PDF Tools

Extract text and manipulate PDF documents.

Overview

The PDF Tools connector is a built-in utility for working with PDF files inside a workflow. It pulls plain text out of documents (the foundation of any AI-driven invoice or contract pipeline), and provides page-level operations for merging, splitting, extracting, rotating, and inspecting PDFs.

It's most useful in document-processing pipelines: pair extract-text with an AI Transform node to turn invoices into structured JSON, use split to break a multi-document batch into one workflow run per file, or use merge to combine a shipping label and a packing slip into a single document for printing.

What You Can Do

The PDF Tools connector exposes these tools:

extract-text - Pull plain text from a PDF, page by page.
get-info - Read document metadata (page count, title, author, encryption status).
merge - Combine multiple PDFs into one document.
split - Split a PDF into separate documents at given page boundaries.
extract-pages - Pull out a range of pages as a new PDF.
remove-pages - Delete a range of pages from a PDF.
rotate - Rotate one or more pages by 90, 180, or 270 degrees.
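
To make "page boundaries" concrete, here is a minimal sketch of how a list of boundaries could map to output documents. It is plain Python and does not call the connector. The function name split_ranges is hypothetical, and the sketch assumes each boundary is the first page of a new document; check the connector's field specification for the actual convention.

```python
def split_ranges(page_count, boundaries):
    """Return 1-based inclusive (start, end) page ranges, one per output document.

    Assumption (not confirmed by the connector docs): each boundary is the
    first page of a new document.
    """
    # Page 1 always starts the first document; duplicates are dropped.
    starts = sorted({1, *boundaries})
    if starts[0] < 1 or starts[-1] > page_count:
        raise ValueError("boundaries must lie between 1 and page_count")
    # Each document ends on the page before the next one starts.
    ends = [start - 1 for start in starts[1:]] + [page_count]
    return list(zip(starts, ends))


print(split_ranges(10, [4, 8]))  # [(1, 3), (4, 7), (8, 10)]
```

Under that assumption, a 10-page batch with boundaries 4 and 8 yields three documents covering pages 1-3, 4-7, and 8-10.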

Authentication and Setup

No connection or authentication is required. These tools are built into the platform and available in every workflow by default - just drop a Connector node onto the canvas and pick the tool you need.

Using in a Workflow

Add a Connector node, select PDF Tools, and pick a mode:

Direct Mode - Recommended for document pipelines. Call extract-text against a known file, then feed the result into a Transform or AI node.
Agent Mode - Useful when you want an AI agent to decide whether to extract text, split, or merge based on a prose instruction.

For batch processing, place the PDF node inside a Loop so each incoming file (from FTP, Gmail Trigger, or a knowledge collection) gets its own extraction step.

Tips

Always pair extract-text with an AI step for structured extraction. The text is rarely useful on its own, but it's exactly the input AI invoice and contract parsers need.
Use get-info as a guard before processing - skip encrypted PDFs or files over a sensible page limit.
Split very large PDFs before extracting text. Working page by page keeps each prompt inside the AI model's context window.
Merge late, not early - keep intermediate documents separate while you process them, then combine at the very end if a single artifact is needed.
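
Two of the tips above, the get-info guard and keeping prompts small, can be sketched in plain Python, for example inside a Code Runner step. This is an illustration under stated assumptions, not the connector's API: the field names encrypted and page_count, the MAX_PAGES limit, and both function names are hypothetical.

```python
MAX_PAGES = 200  # hypothetical limit; pick one that suits your pipeline


def should_process(info, max_pages=MAX_PAGES):
    """Decide whether a PDF is safe to process, given get-info style metadata.

    `info` is assumed to be a dict with hypothetical keys "encrypted"
    and "page_count". Returns (ok, reason).
    """
    if info.get("encrypted", False):
        return False, "encrypted"
    page_count = info.get("page_count", 0)
    if page_count < 1:
        return False, "no pages"
    if page_count > max_pages:
        return False, "too many pages"
    return True, "ok"


def batch_pages(page_texts, max_chars):
    """Group consecutive page texts so each batch stays within max_chars.

    A single page longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for text in page_texts:
        # Start a new batch when adding this page would exceed the budget.
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


print(should_process({"encrypted": True, "page_count": 3}))  # (False, 'encrypted')
print(batch_pages(["aaaa", "bb", "cccc"], max_chars=6))  # [['aaaa', 'bb'], ['cccc']]
```

Character count is only a rough stand-in for a model's token budget, so leave generous headroom when choosing max_chars.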

Common Pitfalls

Scanned PDFs have no text layer - extract-text returns empty text for image-only pages. OCR isn't included; use an AI vision model or an OCR connector for scans.
Layout-sensitive extraction - Multi-column documents and tables lose their structure in plain text. Use an AI step (Structured Output mode) rather than regex to recover fields.
Encrypted PDFs - Password-protected files can't be read. Decrypt upstream (e.g. in a Code Runner step) or reject them in a Condition node.
Page indexing - Pages are 1-based. Off-by-one errors when calling extract-pages or remove-pages are common.
File size limits - Very large PDFs (hundreds of MB) may exceed the workflow payload limit; split them on the source side or stream from FTP.
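
The scanned-page and page-indexing pitfalls above are easy to guard against in code. The sketch below is plain Python with hypothetical helper names. It assumes extract-text output is available as a list of per-page strings in page order, which is an assumption about the output shape rather than a documented format.

```python
def to_zero_based(first_page, last_page, page_count):
    """Convert a 1-based inclusive page range to a 0-based half-open (start, stop).

    Useful when page numbers from this connector are handed to code that
    slices Python lists, which are 0-based.
    """
    if not 1 <= first_page <= last_page <= page_count:
        raise ValueError("page range must satisfy 1 <= first <= last <= page_count")
    return first_page - 1, last_page


def pages_needing_ocr(page_texts):
    """Return the 1-based numbers of pages whose extracted text is empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not text.strip()
    ]


print(to_zero_based(2, 4, page_count=10))  # (1, 4)
print(pages_needing_ocr(["Invoice 42", "  \n", ""]))  # [2, 3]
```

Pages flagged by pages_needing_ocr are candidates for an AI vision model or an OCR connector. Validating the range up front turns a silent off-by-one into an explicit error.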

For technical API details and field specifications, see the PDF Tools documentation.