How to Summarize Long Documents with AI in Your Workflows
Automatically generate concise summaries of long documents, reports, or email threads.
What This Integration Does
Long PDFs, weekly reports, and noisy email threads eat hours of attention. This workflow takes any long-form text and produces a structured summary - a one-line headline, a few key bullets, and a list of action items - then drops it into Slack, email, or your project management tool so people get the value without reading the source.
The workflow accepts documents from any input (uploaded files, FTP drops, mailbox attachments, Knowledge Base hits) and returns a summary object with the same fields on every run. The wording can vary when you re-run on the same document, but the shape stays stable, so downstream automations that consume the summary don't break.
Prerequisites
- An AI Agent enabled in Spojit.
- A source of documents: an ftp connection, an Email trigger, or files indexed in a Knowledge Base.
- A destination connection (e.g. slack, resend, or monday) for the summary.
Step 1: Trigger on Incoming Documents
Add a Trigger node. For mailbox-driven flows pick the Email sub-type and filter by sender or subject. For a shared drop folder use a Schedule trigger and a Connector node calling the ftp connector's list-directory tool, then download-file for new entries.
Step 2: Extract the Text
If the document is a PDF, add a Connector node pointing at the pdf connector and use the extract-text tool. For very large documents, use extract-pages in a Loop node so each chunk fits comfortably in the model's context window.
Step 3: Chunk if Needed
For anything over ~50 pages, add a Transform node that splits the text into chunks of around 4,000 tokens, with a small overlap (a few sentences) so context isn't lost at boundaries. Then iterate the AI step with a Loop node.
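As a rough illustration of what the Transform node needs to do, here is a minimal Python sketch of sentence-based chunking with overlap. The function names, the regex sentence splitter, and the word-count token estimate are all assumptions made for this example; they are not part of Spojit, and a real tokenizer for your model would give more accurate counts.

```python
import re
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_tokens: int = 4000,
    overlap_sentences: int = 2,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Pack whole sentences into chunks of roughly max_tokens, repeating the
    last few sentences of each chunk at the start of the next one."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_tokens + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            current_tokens = sum(count_tokens(s) for s in current)
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than max_tokens still ends up in a chunk of its own, and the overlap can push a chunk slightly over the cap, so leave some headroom below the model's real limit.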
Step 4: Summarize with the AI Agent
Add a Connector node in Agent Mode. Use Structured Output so you get back a consistent shape:
Summarize the document below. Be specific - prefer facts and numbers over
adjectives. Do not invent details not present in the source.
Document:
{{ chunk.text }}
Schema:
{
  "headline": { "type": "string" },
  "summary": { "type": "string", "description": "3-5 sentence executive summary" },
  "keyPoints": { "type": "array", "items": { "type": "string" } },
  "actionItems": { "type": "array", "items": { "type": "string" } },
  "deadlines": { "type": "array", "items": { "type": "string" } }
}
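If a later step consumes this object outside the platform, a defensive shape check can catch a malformed response early. The sketch below is a hypothetical helper written for this example, not a Spojit feature; it only checks the five fields from the schema above.

```python
from typing import Any, Dict, List

STRING_FIELDS = ("headline", "summary")
LIST_FIELDS = ("keyPoints", "actionItems", "deadlines")


def validate_summary(obj: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape is as expected."""
    problems: List[str] = []
    for field in STRING_FIELDS:
        if not isinstance(obj.get(field), str):
            problems.append(f"{field}: expected string")
    for field in LIST_FIELDS:
        value = obj.get(field)
        # A missing field, a non-list, or a list with non-string entries all fail.
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field}: expected array of strings")
    return problems
```

An empty array passes the check, which matters for actionItems and deadlines: a document with no action items should produce an empty list, not a missing field.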
Step 5: Merge Chunked Summaries
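If you split the document, add a second AI Agent step that takes the array of chunk summaries and produces one final summary. This map-reduce pattern avoids truncating the source: every chunk is summarized in full, and only the summaries reach the final step. If the combined chunk summaries are themselves too long for one call, merge them in batches first.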
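Before the second AI step, it can help to collapse the chunk summaries into one compact input so the model is not handed the same bullet several times (overlapping chunks tend to repeat points). The following is a minimal sketch under that assumption; `dedupe` and `build_merge_input` are hypothetical names, and the field names come from the schema in Step 4.

```python
from typing import Any, Dict, List


def dedupe(items: List[str]) -> List[str]:
    """Drop blank and repeated entries, ignoring case and extra whitespace,
    while keeping the first occurrence in its original position."""
    seen = set()
    unique: List[str] = []
    for item in items:
        key = " ".join(item.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(item.strip())
    return unique


def build_merge_input(chunk_summaries: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Gather per-chunk fields into one object for the final summarization step."""
    def collect(field: str) -> List[str]:
        return [v for c in chunk_summaries for v in c.get(field, [])]

    return {
        "summaries": [c["summary"] for c in chunk_summaries if c.get("summary")],
        "keyPoints": dedupe(collect("keyPoints")),
        "actionItems": dedupe(collect("actionItems")),
        "deadlines": dedupe(collect("deadlines")),
    }
```

This only removes exact repeats after normalization. Near-duplicates with different wording are left for the final AI step to reconcile.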
Step 6: Distribute the Summary
Branch with a Parallel node so distribution is fast:
- slack send-message - drop the headline and bullets into a team channel.
- resend send-email - email it to stakeholders.
- monday create-item - create a task for each entry in actionItems using a Loop.
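To make the chat and email branches concrete, here is one way the summary object could be flattened into a plain-text message. This is only a sketch: `format_summary_message` is a hypothetical helper, and the layout (headline, bullets, optional action items) is an assumption rather than a format any connector requires.

```python
from typing import Any, Dict, List


def format_summary_message(summary: Dict[str, Any]) -> str:
    """Render headline, key points and (if present) action items as plain text."""
    lines: List[str] = [summary["headline"], ""]
    lines += [f"- {point}" for point in summary.get("keyPoints", [])]
    actions = summary.get("actionItems", [])
    if actions:
        # Skip the section entirely when there is nothing to act on.
        lines += ["", "Action items:"] + [f"- {action}" for action in actions]
    return "\n".join(lines)
```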
Tips
- Cap chunk size by tokens, not characters - characters vary wildly in token cost across languages.
- Lead the prompt with the role ("You are a chief of staff summarizing for the CEO") to get the tone you want without burning tokens.
- For recurring report types, pre-bake the schema and prompt as a Subworkflow so other workflows can reuse it.
Common Pitfalls
- Hallucinated action items - tell the model explicitly to leave actionItems empty if none are mentioned, otherwise it will invent some.
- Scanned PDFs - extract-text on an image-only PDF returns nothing. Detect empty output and route to an OCR step before summarizing.
- Stale summaries - if you re-summarize the same document, dedupe in your destination (e.g. include a hash of the source text in the Monday item key) so you don't post the same summary twice.
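Two of these checks are easy to express in code. The sketch below shows an empty-extraction test for the scanned-PDF case and a content hash for deduplication. Both function names and the `min_chars` threshold are assumptions chosen for illustration; tune the threshold against your own documents.

```python
import hashlib


def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Treat empty or near-empty extraction output as a sign of an image-only PDF."""
    return len(extracted_text.strip()) < min_chars


def source_hash(text: str) -> str:
    """SHA-256 of the whitespace-normalized text, usable as a dedupe key."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Normalizing whitespace before hashing means the same document extracted twice with slightly different line breaks still maps to the same key.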
Testing
Run the workflow manually against one short document and one long, multi-section one. Confirm the chunked path produces a coherent final summary and not a list of disjointed mini-summaries. Then enable the trigger.