How to Process Emailed Patient Registration Forms with AI Extraction

Turn registration and intake form PDFs that customers email or your front desk forwards into clean, structured records your admin team can file, without anyone retyping a single field.

What This Integration Does

Pharmacies, clinics, and health retail front desks collect a steady stream of new-customer registration and intake forms as PDFs: scanned paper, e-signed documents, or forms a customer fills in and emails back. Re-keying the name, contact details, and pickup or communication preferences from each one is slow and error-prone. This Spojit workflow receives the form by email, reads it, and proposes a tidy record for a staff member to confirm before it lands in your admin database. It is strictly an admin intake task: the workflow captures registration and contact details only, and performs no clinical interpretation of anything on the form.

The run model is push-based. A Mailhook trigger gives you a dedicated email address; the moment a form arrives, the workflow starts. An Attachment node pulls the PDF bytes, a Knowledge node embeds that one document into a Transient collection, and a second Knowledge node in Query mode with a Response Schema returns the fields as predictable JSON. A Human approval node pauses the run so a staff member can review the proposed record, and only on approval does the mongodb connector file it with insert-documents. Each email is deduplicated, so the same forwarded form does not create two records, and the Transient collection is discarded automatically when the run ends.

Prerequisites

  • A workflow in the Spojit Designer with the trigger set to Mailhook (no mailbox or OAuth needed).
  • A mongodb connection (added under Connections) pointing at the database where you keep admin intake records, for example a registrations collection.
  • Optionally a slack connection if you want to notify the front-desk channel once a record is filed.
  • A list of approver users, roles, or teams for the registration desk, so the Human node has someone to route to.
  • A sample registration PDF you can email in for testing.
  • Knowledge of which fields you want captured (for example full name, email, phone, postal address, and contact or pickup preferences). Keep the scope to registration and contact details only.

Step 1: Receive the form with a Mailhook trigger

Open the Trigger node and set Trigger Type to Mailhook. Enter an Address prefix such as intake (1 to 24 characters), then click Generate email address to get a unique address like intake-3f9a2c7b1d6e0a4f@mailhook.spojit.com. Copy it and have your front desk forward registration forms there, or set an inbox forwarding rule so any mail to your registrations@ address is forwarded on.

To reduce noise, add a From allowlist (for example your own staff domain) and an optional Subject regex like (?i)registration|intake. The trigger fires whether the address is in To, Cc, or Bcc, and each message is deduplicated, so a re-forwarded email does not run twice. The incoming email is available downstream as {{ input }}, including {{ input.subject }}, {{ input.from }}, {{ input.replyTo }}, and the {{ input.attachments }} references.
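
To sanity-check a Subject regex and allowlist before saving them on the trigger, you can try them locally. The following is a minimal Python sketch, not Spojit's implementation: the domain example-clinic.com and the function passes_filters are hypothetical, and it assumes the regex is applied as a search anywhere in the subject, which you should confirm against real runs.

```python
import re
from email.utils import parseaddr

# Hypothetical values mirroring the trigger settings described above.
ALLOWED_FROM_DOMAINS = {"example-clinic.com"}  # assumed staff domain
SUBJECT_PATTERN = re.compile(r"(?i)registration|intake")


def passes_filters(sender: str, subject: str) -> bool:
    """Return True when the sender's domain is allowlisted and the subject matches."""
    # parseaddr handles both "desk@x.com" and "Front Desk <desk@x.com>".
    _, address = parseaddr(sender)
    domain = address.rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_FROM_DOMAINS:
        return False
    # Assumption: the pattern may match anywhere in the subject line.
    return SUBJECT_PATTERN.search(subject) is not None
```

Running a handful of real subject lines through a check like this shows quickly whether the pattern is too loose (matching unrelated mail) or too strict (missing forwarded forms with a "Fwd:" prefix).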

Step 2: Fetch the PDF with an Attachment node

Add an Attachment node directly after the trigger. The Designer only allows this node when the workflow uses a Mailhook trigger. Set Mode to Single so you get the first matching file as an object, add a Content type filter of application/pdf, and a Filename pattern of *.pdf to ignore signature images or logos in the email body. Turn on Fail if no attachment matches so a form-less email stops cleanly instead of producing an empty record.

The node outputs an object you can reference as the attachment result, with filename, contentType, size, and content (the raw bytes as base64). You will feed content straight into the next step. Attachments are limited to 10 MB each and 25 MB per run by default, which is comfortably above a typical scanned form.
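
The selection rules in this step can be pictured as a small filter over the email's attachments. The Python sketch below only illustrates that logic and is not how the Attachment node is built: pick_first_pdf and pdf_bytes are hypothetical names, the 10 MB limit is assumed to mean 10 * 1024 * 1024 bytes, and filename matching is assumed to be case-insensitive.

```python
import base64
from fnmatch import fnmatch

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024


def pick_first_pdf(attachments: list[dict]) -> dict:
    """Return the first attachment that is a PDF by type and name, or raise."""
    for att in attachments:
        is_pdf_type = att["contentType"].lower() == "application/pdf"
        # Assumption: the *.pdf pattern is matched case-insensitively.
        is_pdf_name = fnmatch(att["filename"].lower(), "*.pdf")
        if is_pdf_type and is_pdf_name and att["size"] <= MAX_ATTACHMENT_BYTES:
            return att
    # Mirrors "Fail if no attachment matches": stop instead of returning nothing.
    raise ValueError("no attachment matches")


def pdf_bytes(att: dict) -> bytes:
    """Decode the base64 content field back into raw PDF bytes."""
    return base64.b64decode(att["content"])
```

An email carrying only a logo image raises here, which is the same outcome you want from the node: the run stops rather than continuing with no document.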

Step 3: Embed the form into a Transient Knowledge collection

Add a Knowledge node set to Embed mode. In the Collection dropdown, pick Transient. A Transient collection is created just for this run, is shared with later nodes in the same run, and is cleaned up automatically when the run finishes, which is exactly what you want for one-off "embed then query then discard" extraction. Because it is transient, no file name or embedding model selection is required.

Set Document Type to PDF. For Document Input, reference the base64 bytes from the previous step, for example {{ attachment.content }} (use the variable name your Attachment node writes to). If your forms are scanned images saved as PDFs, the document is read via OCR so the text still becomes searchable. The node writes a chunk count and metadata to its Output Variable; you do not need to read it, but it confirms the document embedded successfully.

Step 4: Extract structured fields with a Knowledge Query and a Response Schema

Add a second Knowledge node set to Query mode and set its Collection to Transient so it reads the document you embedded moments earlier in the same run. Query mode runs your Prompt against the collection and synthesizes an answer with the chosen Model; pair it with a Response Schema so the output is reliable JSON your database step can map directly. Write a focused prompt that pulls only registration and contact details:

From the registration form in context, extract only the customer's
administrative registration and contact details. Do not interpret,
summarize, or infer anything clinical. If a field is missing, return
null for it.

Attach a Response Schema that pins the shape of the record:

{
  "type": "object",
  "properties": {
    "fullName":   { "type": "string" },
    "email":      { "type": ["string", "null"] },
    "phone":      { "type": ["string", "null"] },
    "address":    { "type": ["string", "null"] },
    "preferences": {
      "type": "object",
      "properties": {
        "contactMethod":  { "type": ["string", "null"] },
        "pickupReminders": { "type": ["boolean", "null"] }
      }
    }
  },
  "required": ["fullName"]
}

Set a low Result Count (the default of 5 chunks is plenty for a single short form) and point the node's Output Variable at a name like extraction. You end up with a clean object such as {{ extraction.fullName }} and {{ extraction.email }} for the next steps. Keep the prompt narrow so the workflow never strays beyond admin intake.
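
If you export sample extractions while tuning the prompt, a quick shape check helps confirm they fit the schema above. This is a hand-rolled Python sketch covering only the fields in that schema, not a general JSON Schema validator; check_extraction is a hypothetical helper, and it is deliberately a little stricter than the schema in that it also rejects a blank fullName.

```python
def check_extraction(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the shape."""
    problems = []

    # fullName is the only required field. Stricter than the schema: blank is rejected too.
    name = record.get("fullName")
    if not isinstance(name, str) or not name.strip():
        problems.append("fullName must be a non-empty string")

    # These fields may be a string or null (None in Python).
    for field in ("email", "phone", "address"):
        value = record.get(field)
        if value is not None and not isinstance(value, str):
            problems.append(f"{field} must be a string or null")

    prefs = record.get("preferences", {})
    if not isinstance(prefs, dict):
        problems.append("preferences must be an object")
        return problems

    method = prefs.get("contactMethod")
    if method is not None and not isinstance(method, str):
        problems.append("preferences.contactMethod must be a string or null")

    reminders = prefs.get("pickupReminders")
    if reminders is not None and not isinstance(reminders, bool):
        problems.append("preferences.pickupReminders must be a boolean or null")

    return problems
```

A record where pickupReminders comes back as the text "yes" rather than true would be flagged, which is the kind of drift a Response Schema is meant to prevent.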

Step 5: Pause for staff confirmation with a Human node

Add a Human node so a front-desk staff member reviews the proposed record before anything is filed. Set a clear Label like Confirm new registration and a Message that surfaces the extracted values, for example:

New registration from {{ input.from }}.
Name: {{ extraction.fullName }}
Email: {{ extraction.email }}
Phone: {{ extraction.phone }}
Preferences: {{ extraction.preferences.contactMethod }}
Approve to file this record.

Add at least one Approval slot and assign your registration-desk users, a role, or a team as atoms; any atom in a slot satisfies it, and approval completes when every slot is satisfied. Set a Timeout in minutes if you want stale requests to lapse, and optionally turn on Email approvers so the first reviewers get an email. Approvers respond in the Approvals inbox. On approval the node outputs { approved: true, outcome: "APPROVED" } and the workflow continues; a rejection or timeout halts the run so nothing is filed.
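
The slot rule (any atom satisfies a slot, and every slot must be satisfied) is easy to misread, so here is a small Python sketch of it. It is a simplified model rather than the Human node's actual logic: atoms are represented as plain strings such as "role:front-desk", the names are hypothetical, and it assumes you already know which atoms have approved.

```python
def slot_satisfied(slot_atoms: set[str], approved_atoms: set[str]) -> bool:
    """A slot is satisfied when at least one of its atoms has approved."""
    return bool(slot_atoms & approved_atoms)


def approval_complete(slots: list[set[str]], approved_atoms: set[str]) -> bool:
    """Approval completes only when there is at least one slot and all are satisfied."""
    return bool(slots) and all(slot_satisfied(s, approved_atoms) for s in slots)
```

With two slots, for example one for the front desk and one for a pharmacy admin team, an approval from the front desk alone leaves the request pending until someone from the second slot responds.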

Step 6: File the record with the mongodb connector

Add a Connector node in Direct mode on the mongodb connector and choose the insert-documents tool. Point it at your intake database and a collection such as registrations, and map the document from the approved extraction plus a little provenance from the email:

{
  "fullName": "{{ extraction.fullName }}",
  "email": "{{ extraction.email }}",
  "phone": "{{ extraction.phone }}",
  "address": "{{ extraction.address }}",
  "preferences": {
    "contactMethod": "{{ extraction.preferences.contactMethod }}",
    "pickupReminders": "{{ extraction.preferences.pickupReminders }}"
  },
  "source": "mailhook-registration",
  "receivedFrom": "{{ input.from }}",
  "receivedAt": "{{ input.receivedAt }}"
}

Storing receivedFrom and receivedAt gives your team an audit trail back to the original email. If you want to skip duplicates, add a Connector node with find-documents that looks up the extracted email address before the Human node, then use a Condition node to route past the insert when a match already exists.
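
The mapping and the duplicate check can be reasoned about with a short Python sketch. It uses an in-memory list as a stand-in for the registrations collection instead of a real database call, and build_document, file_if_new, and the email normalization are illustrative assumptions rather than connector behavior.

```python
from typing import Optional


def normalize_email(email: Optional[str]) -> Optional[str]:
    """Assumption: emails are compared trimmed and lowercased."""
    return email.strip().lower() if email else None


def build_document(extraction: dict, mail: dict) -> dict:
    """Map the approved extraction plus email provenance into one record."""
    prefs = extraction.get("preferences") or {}
    return {
        "fullName": extraction["fullName"],
        "email": normalize_email(extraction.get("email")),
        "phone": extraction.get("phone"),
        "address": extraction.get("address"),
        "preferences": {
            "contactMethod": prefs.get("contactMethod"),
            # Kept as a real boolean (or None), not a string.
            "pickupReminders": prefs.get("pickupReminders"),
        },
        "source": "mailhook-registration",
        "receivedFrom": mail["from"],
        "receivedAt": mail["receivedAt"],
    }


def file_if_new(existing: list[dict], doc: dict) -> bool:
    """Append doc unless a record with the same non-null email already exists."""
    if doc["email"] is not None and any(r.get("email") == doc["email"] for r in existing):
        return False  # duplicate: skip the insert
    existing.append(doc)
    return True
```

Records with no email are always filed in this sketch, since there is nothing to match on; if that is common in your forms, consider a second lookup key such as phone.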

Step 7: Notify the front desk (optional)

To let the team know a record was filed, add a Connector node on the slack connector with the send-message tool, posting to your front-desk channel: New registration filed for {{ extraction.fullName }}. If you would rather reply to the person who sent the form, use a Send Email node addressed to {{ input.replyTo }} confirming the registration was received. Mailhook runs are always asynchronous and send no automatic reply to the sender, so a Send Email step is the way to acknowledge them.

Tips

  • Keep the extraction prompt and Response Schema strictly to registration and contact fields. A tight required list and explicit null handling keep the AI output predictable and on-task.
  • Use the Transient collection rather than a persistent one for single-form extraction: it is auto-created and auto-cleaned per run, so no leftover documents accumulate.
  • For runs where someone forwards several forms in one email, switch the Attachment node to Multiple mode and wrap Steps 3 to 6 in a Loop over {{ attachment.attachments }}.
  • Set the Human node Urgency to Normal for routine intake and reserve High for forms that came in flagged as urgent, so approvers can triage their inbox.

Common Pitfalls

  • The Designer will not save an Attachment node unless the trigger is a Mailhook. If you started from an Email trigger, switch the trigger type first.
  • If extraction comes back empty, the email likely had no PDF or the file exceeded the 10 MB per-attachment limit. Keep Fail if no attachment matches on so these stop cleanly instead of filing blank records.
  • A rejected or timed-out Human approval halts the workflow; there is no "on reject do X" branch. If you need a follow-up on rejection, handle it before the approval or in a separate workflow.
  • Embedding and querying must use the same collection within the run. Always pick Transient in both the Embed and the Query step so the query can see the document embedded moments earlier.
  • Mailhook deduplicates per message, but if your forwarding rule rewrites the message it can look new. Keep an idempotency check with find-documents on email if duplicate forwards are common.

Testing

Before turning this on for live intake, restrict the From allowlist to a test inbox only you control and email one sample registration PDF to the Mailhook address. Watch the run in the execution history: confirm the Attachment node fetched the file, the Knowledge node embedded it, and the extraction produced the expected fields. Approve the Human step yourself and verify a single tidy document landed in your registrations collection with the right receivedFrom and receivedAt. Re-send the same email to confirm it is deduplicated and does not create a second record. Once a few sample forms file cleanly, widen the From allowlist to your real intake sources.
