HTTP and Knowledge: Web Page Indexing Template

Post a list of URLs to a webhook and Spojit embeds each web page into a searchable knowledge collection within minutes.

What It Builds

A Webhook trigger receives a JSON payload containing a list of page URLs. A Loop node iterates the list, and for each URL a Knowledge node in Embed mode fetches the page using the http connector and stores it in a persistent collection as the Web Page URL document type. The result is a reusable collection your other workflows can query the moment indexing finishes.
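
As a rough illustration of the caller's side, the sketch below builds the POST request that would start a run. The webhook URL and the "urls" payload key are assumptions made for the example, not documented Spojit values; use the URL shown on your own Webhook trigger and whichever key your Loop node iterates.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL shown on your Webhook trigger.
WEBHOOK_URL = "https://example.com/webhooks/your-webhook-id"


def build_index_request(page_urls, webhook_url=WEBHOOK_URL):
    """Build (but do not send) a POST request carrying the list of page URLs."""
    # Assumed payload shape: {"urls": [...]}. Match the key to whatever
    # your Loop node is configured to iterate.
    body = json.dumps({"urls": list(page_urls)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_index_request([
    "https://example.com/docs/getting-started",
    "https://example.com/docs/faq",
])

# To actually trigger the run, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is built separately from sending it so the payload can be inspected or logged before any pages are indexed.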

The Prompt

Paste this into Miraxa and it builds the workflow, connecting the tools for you:

Build a workflow that triggers on a webhook receiving a JSON list of page URLs. Loop over each URL and use a Knowledge node in Embed mode to fetch the web page over HTTP and embed it as the Web Page URL document type into a persistent collection called "site-pages". Return a confirmation with the number of pages indexed.

Connectors Used

Webhook trigger - accepts the posted list of URLs that starts the run.
http - fetches each web page so the Knowledge node can read it.
Knowledge (Embed mode) - stores each page in a persistent, searchable collection.

Customize It

Edit the prompt to fit your setup: change the collection name from site-pages to your own, switch the trigger to a Schedule if you want a recurring crawl of a fixed URL list, or ask for each page to be tagged with its source domain so you can filter queries later.
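
For the source-domain tag, one possible definition is the URL's host with any port and leading "www." removed. The helper below is a hypothetical sketch of that rule using Python's standard library; how Spojit itself derives or stores such a tag is not specified here.

```python
from urllib.parse import urlparse


def source_domain(page_url):
    """Return the lowercase host of a URL, without port or a leading 'www.'."""
    # urlparse().hostname is already lowercased and excludes the port;
    # it is None when the string has no recognizable host.
    host = urlparse(page_url).hostname or ""
    return host[4:] if host.startswith("www.") else host


tags = {
    url: source_domain(url)
    for url in [
        "https://www.example.com/docs/faq",
        "https://blog.example.org:8443/posts/1",
    ]
}
```

Normalizing the host this way keeps pages from www.example.com and example.com under the same filter value.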

Tips

Use a persistent collection (not Transient) so the indexed pages stay searchable across future runs.
The http connector fetches pages without logging in, so they must be publicly reachable. To index private pages, tell the prompt which auth header to send with each request.
Add an HMAC secret to the Webhook trigger so only trusted callers can post URLs.
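
On the calling side, an HMAC-protected webhook generally expects a signature computed over the raw request body with the shared secret. The sketch below assumes HMAC-SHA256, a hex-encoded digest, and a header named X-Signature; all three are illustrative assumptions, so check the Webhook trigger's settings for the scheme and header it actually expects.

```python
import hashlib
import hmac
import json


def sign_body(secret, body):
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid(secret, body, signature):
    """Check a signature in constant time, as a receiver would."""
    return hmac.compare_digest(sign_body(secret, body), signature)


# Sign the exact bytes that will be sent; re-serializing the JSON later
# could change the bytes and invalidate the signature.
body = json.dumps({"urls": ["https://example.com/docs/faq"]}).encode("utf-8")
signature = sign_body("replace-with-your-secret", body)

headers = {
    "Content-Type": "application/json",
    "X-Signature": signature,  # hypothetical header name
}
```

The comparison uses hmac.compare_digest rather than == so that verification time does not reveal how much of a forged signature matched.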