How Sant Chat AI reads a WordPress site. The RAG architecture behind the plugin, priced at under a tenth of a cent per page ingested.
19 April 2026 · 11 min read
Ingesting a 50 page WordPress site end to end, summarised, chunked, and embedded ready for retrieval, costs Sant Chat roughly three US cents. A 200 page site, which is the hard cap in the current ingest pipeline, costs roughly eleven US cents. A single chat turn, once the site is ingested, costs a fraction of a cent. These are real numbers from the production pipeline, not estimates.
The numbers are worth stating up front because they shape the architecture. Sant Chat was designed around the assumption that a WordPress site worth reading is a WordPress site worth reading in full, and reading in full has to be cheap enough that cost becomes irrelevant to the pricing conversation. The architecture that gets there is not the standard RAG recipe. The difference is the order of operations.
Most RAG pipelines chunk raw content, embed the chunks, and retrieve at query time. Sant Chat summarises each page first, then chunks the summary, then embeds. The summarise step runs on gpt-4o-mini. The embed step runs on text-embedding-3-small. The store is Supabase pgvector. Retrieval is a Postgres RPC call using cosine distance. This post walks through the decisions behind that pipeline, what they cost, and where they pay back.
Why a sitemap is the right entry point for WordPress
The first question any RAG product has to answer is what to read. Sant Chat reads the sitemap.
Sitemaps are canonical. A reasonably configured WordPress site publishes one, usually at /sitemap.xml, and it reflects what the site owner wants discoverable. Scraping the whole site is the alternative and it is worse in every way. Scraping is aggressive, brittle against theme changes, and it picks up pages the site owner never intended to expose, which is the opposite of what a customer facing chatbot should do.
Using the sitemap is also a respect signal. When a WordPress site marks a page as noindex or excludes it from the sitemap, the site owner has made a decision about visibility. A chatbot that reads the sitemap inherits that decision. A scraper ignores it.
The practical implementation is straightforward. The ingest pipeline accepts a sitemap URL, walks the URL list, and fetches each page. Sant Chat caps the ingest at 200 pages per sync to keep latency and cost predictable. For sites larger than 200 pages, the priority is set by sitemap order, which is typically most recent first.
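The walk described above can be sketched in a few lines. This is illustrative rather than the production ingest code: the function name and the regex based parsing are assumptions, and a real implementation would also handle sitemap index files that point at child sitemaps.

```typescript
// Sketch of the sitemap entry point (illustrative, not the production code).
// Extracts <loc> entries in document order and applies the 200 page cap.
const MAX_PAGES = 200;

function extractSitemapUrls(sitemapXml: string, cap: number = MAX_PAGES): string[] {
  const urls: string[] = [];
  // <loc> values are plain URLs; tolerate whitespace padding inside the tag.
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null && urls.length < cap) {
    urls.push(match[1]);
  }
  return urls;
}
```

Because the cap is applied while walking the list in document order, sites over 200 pages are truncated exactly as the post describes: sitemap order decides priority.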
Summarise before you embed, not after
This is the architectural choice that defines the pipeline.
A WordPress page as rendered includes a lot that is not useful answer material. Navigation bars. Footer links. Related post widgets. Call to action blocks repeated on every page. Template chrome. If you embed the raw content, you embed the chrome alongside the signal, and retrieval has to work through noise to find the actual answer.
The conventional solutions are chunking with small overlaps, careful prompt engineering at query time, and a reranking step that reorders retrieved chunks before generation. Sant Chat takes a different route. Each page is sent through a summarisation pass on gpt-4o-mini before anything is embedded. The summary is a structured fact sheet that captures what the page is about, what the reader would ask about it, and what the useful answers are. The fact sheet is what gets chunked and embedded.
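The shape of that summarisation pass can be sketched as a request builder. The prompt wording, the helper name, and the token budget here are assumptions for illustration, not the production prompt; only the model choice and the fact sheet structure come from the post.

```typescript
// Illustrative sketch of the summarise pass request. The prompt text and
// max_tokens value are assumptions, not the production configuration.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  max_tokens: number;
}

function buildFactSheetRequest(pageUrl: string, pageText: string): ChatRequest {
  return {
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Summarise the page into a compact fact sheet: what the page is about, " +
          "what a visitor would ask about it, and the useful answers. Plain text.",
      },
      { role: "user", content: `URL: ${pageUrl}\n\n${pageText}` },
    ],
    max_tokens: 400, // fact sheets compress hard; ~300 output tokens is typical
  };
}
```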
The cost is where this gets interesting.
For a typical 2,000 word WordPress page, the summarisation pass consumes around 2,500 input tokens and around 300 output tokens on gpt-4o-mini. At the current OpenAI pricing of fifteen cents per million input tokens and sixty cents per million output tokens, that works out to roughly 0.0555 US cents per page. The embedding step on the same page, which happens next, costs another 0.0005 US cents. The total per page ingest cost is under 0.06 US cents.
Extrapolating to a full site: a 50 page WordPress site costs roughly three US cents to fully ingest. The 200 page hard cap costs roughly eleven US cents. A full re ingest of a 200 page site costs less than most other operations Sant Chat performs on its own infrastructure.
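The arithmetic is simple enough to check directly. This back of the envelope sketch uses the token counts and prices quoted above; the embed token count of 250 is an estimate from the fact sheet size, not a measured figure.

```typescript
// Per page ingest cost from the figures quoted in the post.
// Prices are in USD per million tokens.
const PRICE_IN = 0.15;    // gpt-4o-mini input
const PRICE_OUT = 0.60;   // gpt-4o-mini output
const PRICE_EMBED = 0.02; // text-embedding-3-small

function ingestCostUsd(inputTokens: number, outputTokens: number, embedTokens: number): number {
  return (inputTokens * PRICE_IN + outputTokens * PRICE_OUT + embedTokens * PRICE_EMBED) / 1_000_000;
}

// Typical 2,000 word page: ~2,500 summarise input tokens, ~300 output
// tokens, and a fact sheet of roughly 250 tokens to embed (an estimate).
const perPage = ingestCostUsd(2500, 300, 250);
// perPage is about $0.00056, i.e. under 0.06 US cents;
// 200 pages is about $0.112, roughly eleven US cents.
```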
The summarise step compresses hard. The fixture page used to generate these numbers, a 1,922 word agency services page, produced a 962 character fact sheet. That fact sheet is the thing that gets chunked and embedded. For most WordPress pages, the downstream effect is that each page ends up as a single embedded chunk, not many. That sounds like a limitation and it is actually a feature. Retrieval at query time is testing the summary against the query, not a fragment of the raw content. The signal to noise ratio is higher by design.
Chunk the summary, not the raw page
The chunking function sits downstream of the summarisation pass, not upstream.
The function is called semanticChunk. It takes a text blob, a maximum chunk size of 1,000 characters, a minimum chunk size of 100 characters, and it splits first on paragraph boundaries, falling back to sentence boundaries only when a single paragraph exceeds the budget. It produces no overlap between chunks.
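A paragraph first chunker with that behaviour can be sketched as follows. This is an illustration of the described contract, not the production semanticChunk; the sentence regex and the handling of a trailing fragment below the minimum size are assumptions.

```typescript
// Sketch of a paragraph first chunker: 1,000 character budget, 100 character
// minimum, sentence fallback only when a single paragraph exceeds the budget,
// no overlap between chunks. Illustrative, not the production semanticChunk.
function semanticChunk(text: string, maxSize = 1000, minSize = 100): string[] {
  const paragraphs = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);

  // A paragraph is the unit unless it alone exceeds the budget, in which
  // case it is broken into sentences.
  const units: string[] = [];
  for (const p of paragraphs) {
    if (p.length <= maxSize) {
      units.push(p);
    } else {
      const sentences = p.match(/[^.!?]+[.!?]+["')\]]*\s*|[^.!?]+$/g) ?? [p];
      units.push(...sentences.map((s) => s.trim()).filter(Boolean));
    }
  }

  // Greedy packing with no overlap.
  const chunks: string[] = [];
  let current = "";
  for (const unit of units) {
    if (current && current.length + unit.length + 1 > maxSize) {
      chunks.push(current);
      current = unit;
    } else {
      current = current ? current + " " + unit : unit;
    }
  }
  if (current) chunks.push(current);

  // Fold a trailing fragment below the minimum into the previous chunk.
  if (chunks.length > 1 && chunks[chunks.length - 1].length < minSize) {
    const tail = chunks.pop() as string;
    chunks[chunks.length - 1] += " " + tail;
  }
  return chunks;
}
```

For a typical sub 1,000 character fact sheet, this returns a single chunk, which matches the one embedding per page behaviour the post describes.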
No overlap is a deliberate choice. Overlap exists in standard RAG pipelines because raw content chunks lose context at the boundaries, and overlap patches the seams. Summarised content does not have that problem. The summary is already the coherent unit. Chunking it is a pagination step, not a context preservation step.
Because the typical summary fits in a single chunk, most pages produce one embedding each. The rare case is a long form article where the fact sheet exceeds 1,000 characters and gets split into two or three chunks. That is fine. The chunking function handles it. Retrieval treats all chunks from a page as candidates, and the generation step can pull more than one chunk per page when a query calls for it.
The chunk size budget is measured in characters, not tokens. A thousand characters is roughly 250 tokens at English prose averages. That size was chosen to sit well below the retrieval context limits while being large enough to carry a complete idea. Boundaries in tokens would be more precise, but the small amount of slack that characters produce has not caused issues in production.
The vector store is the database
Sant Chat does not use a separate vector database. Embeddings live in the same Postgres instance as the rest of the application data, stored in a vector(1536) column in the site_documents table. The indexing strategy is ivfflat with vector_cosine_ops. Retrieval is a Postgres RPC called match_site_documents, which takes a query embedding and returns the top matching chunks ranked by cosine distance.
This is Supabase pgvector in a fairly standard configuration. The choice to use it rather than Pinecone, Qdrant, or a dedicated vector service is operational. Running one Postgres instance for everything is simpler than running one Postgres and one vector service. Joins across the customer data and the embeddings happen in the same engine. Backups are one system, not two. The operational surface area stays small, which matters when the product is maintained by a small team.
The embedding dimensions are 1536, which is the default for text-embedding-3-small. OpenAI supports dimension reduction on this model, but Sant Chat does not use it. The storage savings at customer data volumes are negligible. The schema is explicit about 1536. Reducing the dimensions later is a migration that can be done if scale ever justifies the work. It does not today.
Retrieval is a single RPC and a cosine distance
When a visitor sends a message, the message is embedded with the same model used for ingest. The embedding is passed to match_site_documents as an RPC call. The RPC returns a ranked list of chunks. The top matches, along with the business context the site owner has configured, are passed to gpt-4o-mini as the context for the generation step.
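Conceptually, the RPC is doing nothing more exotic than a cosine distance sort. The sketch below shows the same ranking in plain TypeScript; in production this runs inside Postgres via pgvector, and the row shape and function signature here are assumptions for illustration.

```typescript
// What match_site_documents does conceptually: rank stored chunk embeddings
// by cosine distance to the query embedding and return the top matches.
// In production this runs inside Postgres via pgvector's cosine operator.
interface ChunkRow {
  id: number;
  content: string;
  embedding: number[];
}

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchSiteDocuments(query: number[], rows: ChunkRow[], matchCount = 5): ChunkRow[] {
  return [...rows]
    .sort((x, y) => cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding))
    .slice(0, matchCount);
}
```

The ivfflat index makes the production version approximate rather than exhaustive, but the ranking criterion is the same.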
Real numbers from production, across the 39 chat turns logged so far, show an average input of 691 tokens and an average output of 28 tokens per chat turn. The output is short because the model is instructed to answer the question, not to extrapolate. The input is lean because the RAG pipeline delivers compressed context. At current gpt-4o-mini pricing, the per turn cost is in the neighbourhood of 0.01 US cents. Per thousand chat turns, that is roughly ten US cents.
Voice chat turns, which run a separate path that includes Whisper for speech to text and a text to speech model for responses, average 302 input tokens and 13 output tokens across the 12 logged voice turns to date. The shorter figures reflect that voice questions are typically shorter than typed questions. The full voice path carries additional cost for the audio pipeline itself, which is a separate story and one this post leaves alone.
Hash based change detection for incremental sync
Full re ingest of a 200 page site is cheap, but doing it on every sync cycle is wasteful. Sant Chat uses hash based change detection to decide which pages to re ingest.
Each page fetched during a sync has its content hashed. The hash is compared against the hash stored from the last sync. If the hash is the same, the page is unchanged and the existing summary and embeddings are kept. If the hash has changed, the page goes through the full pipeline again.
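The comparison step can be sketched as follows. The hash algorithm and the storage shape are assumptions for illustration; the post only specifies that a per page content hash is compared against the previous sync.

```typescript
import { createHash } from "node:crypto";

// Sketch of hash based change detection (hash algorithm and storage shape
// are assumptions). Unchanged pages skip the summarise and embed steps.
function contentHash(pageContent: string): string {
  return createHash("sha256").update(pageContent, "utf8").digest("hex");
}

function pagesToReingest(
  fetched: Map<string, string>, // url -> freshly fetched page content
  stored: Map<string, string>,  // url -> content hash from the last sync
): string[] {
  const changed: string[] = [];
  for (const [url, content] of fetched) {
    // New pages have no stored hash and fall through to a full ingest.
    if (stored.get(url) !== contentHash(content)) changed.push(url);
  }
  return changed;
}
```

A force sync is then just the degenerate case where the stored hashes are discarded and every page falls through to the full pipeline.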
Schedules run hourly, daily, weekly, or monthly, configured per site. For most customers the weekly cadence is the right default. A marketing site does not change hourly. A news site often does. The schedule is a configuration choice rather than a product opinion, and the site owner is trusted to know their own publishing rhythm.
Force sync is a first class option, not a workaround. Sometimes a structural change to the summarisation prompt or the chunking logic means the stored embeddings are no longer comparable to a fresh query. Force sync rebuilds from scratch. The cost is knowable because the per page numbers are knowable.
Sant ships WordPress plugins, websites, and applications through the Launch phase of the Sant framework. The RAG pipeline inside Sant Chat is one expression of how Sant thinks about AI product engineering. The interesting decision is usually the one upstream of where the industry is looking. Most RAG discourse in 2026 focuses on the retrieval step and the generation step. The summarise then embed choice happens earlier, and it is the one that determines whether the downstream steps work well at all.
For agencies commissioning AI products, or founders shipping an AI feature inside a larger product, the lesson generalises. The default architecture is the first thing to question. The second thing to question is whether the cheapest possible ingest is cheap enough to ingest in full. In Sant Chat's case, the answer is yes, and that answer reshapes what is possible downstream.
The companion piece to this post is the open source compliance toolkit Sant shipped alongside Sant Chat. The architecture described above and the compliance workflow are two halves of the same shipping story. The architecture decisions made this pipeline cheap to run. The compliance workflow made the plugin reviewable by WordPress.org and safe to install. Those same decisions matter more in regulated contexts, which deserves its own treatment.
Frequently asked questions
What happens if a WordPress page is very long?
The summarisation step is not length bounded in the way chunking is. A 10,000 word page produces a longer fact sheet than a 2,000 word page, but the fact sheet is still compressed. The chunking step then splits the fact sheet as needed. Most pages, regardless of source length, end up as one or two chunks post summarisation. Very long pages might produce three or four.
Why summarise before embedding? Does the step not lose information?
It compresses information, which is the point. The question is whether compression costs retrieval quality. In production the answer has been no, because WordPress pages carry a lot of non answer material and compression removes that noise before it reaches the embedding space. Retrieval against a summary is retrieval against the useful part of the page. Retrieval against raw chunks is retrieval against the useful part and the noise, with the noise sometimes winning.
Why Supabase pgvector instead of a dedicated vector database?
Operational simplicity. Customer data and embeddings live in the same Postgres instance, backups are one system, and joins across tables work without cross service coordination. Dedicated vector services are the right call at scales Sant Chat is not operating at today. If scale changes that, migrating is possible, and the abstraction layer sits behind one RPC.
Does Sant Chat support sites without a sitemap?
A reasonably configured WordPress site publishes a sitemap. Yoast, Rank Math, All in One SEO, and the core WordPress XML sitemap all generate one. Sites without a sitemap are rare and almost always indicate a larger problem the site owner should fix independently of Sant Chat. The pipeline can accept a manual URL list as a fallback, but the sitemap is the default.
What happens when the site changes?
The sync schedule picks up changes on its next run. Hash based change detection means only changed pages run through the pipeline again. Force sync is available for structural changes that require a full rebuild. Site owners who publish frequently can move to an hourly schedule. Site owners with stable content can sit on weekly or monthly.
Most RAG pipelines spend their architectural attention on the retrieval and generation steps. Sant Chat moves the interesting decision earlier, to the moment where page content becomes embedding material. The payoff is a pipeline that reads a full WordPress site for pennies, returns answers grounded in the published content rather than approximate matches to fragments of it, and stays operationally simple enough for a small team to maintain.
The same decisions that made this architecture workable also made the plugin safe to install and compliant with WordPress.org's review standards. That is a different post. If the full plugin is the ask rather than the architecture, the Sant Launch conversation is the shorter path. If the interest is the toolkit behind the submission, the open source compliance checker is the practical entry point.