HTTP and Knowledge: Web Page Indexing Template

Post a list of URLs to a webhook and Spojit embeds each web page into a searchable knowledge collection within minutes.

What It Builds

A Webhook trigger receives a JSON payload containing a list of page URLs. A Loop node iterates the list, and for each URL a Knowledge node in Embed mode fetches the page using the http connector and stores it in a persistent collection as the Web Page URL document type. The result is a reusable collection your other workflows can query the moment indexing finishes.
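The caller's side of this flow is an HTTP POST that carries the URL list as JSON. The Python sketch below builds that request using only the standard library. The `{"urls": [...]}` payload shape and the name `build_index_request` are hypothetical, so match the field name to whatever your Loop node reads. Before posting, the sketch drops anything that is not an absolute http(s) URL and removes duplicates, so the Loop does not embed the same page twice.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_index_request(webhook_url: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the URL list as JSON.

    The {"urls": [...]} payload shape is a hypothetical example; adjust the
    key to match what your workflow's Loop node iterates over.
    """
    # Keep only absolute http(s) URLs and drop exact duplicates, preserving order.
    seen: set[str] = set()
    clean: list[str] = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            clean.append(url)

    body = json.dumps({"urls": clean}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)`, using the URL your Webhook trigger exposes as `webhook_url`.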

The Prompt

Paste this into Miraxa and it builds the workflow, connecting the tools for you:

Build a workflow that triggers on a webhook receiving a JSON list of page URLs. Loop over each URL and use a Knowledge node in Embed mode to fetch the web page over HTTP and embed it as the Web Page URL document type into a persistent collection called "site-pages". Return a confirmation with the number of pages indexed.

Connectors Used

  • Webhook trigger - accepts the posted list of URLs that starts the run.
  • http - fetches each web page so the Knowledge node can read it.
  • Knowledge (Embed mode) - stores each page in a persistent, searchable collection.

Customize It

In the prompt, change the collection name from site-pages to your own, switch the trigger to a Schedule if you want a recurring crawl of a fixed URL list, or have it tag each page with its source domain so you can filter queries later.
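If you add domain tagging, the "source domain" is simply the page URL's host name. The sketch below shows one way to derive it. The `domain` field name and both helper names are hypothetical. Inside Spojit you would describe the tagging in the prompt rather than run this code.

```python
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Return the page URL's host name, for use as a filter tag."""
    # .hostname is lowercased and excludes any port number.
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def tag_pages(urls: list[str]) -> list[dict[str, str]]:
    # Pair each URL with a hypothetical "domain" metadata field.
    return [{"url": u, "domain": source_domain(u)} for u in urls]
```

Subdomains such as `docs.example.com` and `blog.example.com` produce different tags here. Strip them yourself if you want one tag per site.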

Tips

  • Use a persistent collection (not Transient) so the indexed pages stay searchable across future runs.
  • Pages should be publicly reachable, because the http connector fetches them without logging in. For pages that require a login, specify the needed auth header in the prompt.
  • Add an HMAC secret to the Webhook trigger so only trusted callers can post URLs.
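With an HMAC secret, the caller computes a keyed hash of the exact request body using the shared secret and sends it alongside the request. The receiver recomputes the hash and rejects the request if the two values do not match. The sketch below assumes HMAC-SHA256 with hex encoding, and the helper names are hypothetical. Check your Webhook trigger's settings for the actual algorithm, encoding, and header name Spojit expects.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the exact bytes being posted (assumed scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received: str) -> bool:
    # compare_digest compares in constant time, so response timing does not
    # reveal how much of a forged signature was correct.
    return hmac.compare_digest(sign_body(secret, body), received)
```

Sign the same bytes you send. Re-serializing the JSON after signing, even with different whitespace, changes the hash and causes verification to fail.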
