What Sant Chat looks like from the inside. Voice, themes, and what would change.
The third post in the Sant Chat trilogy. Widget design, voice, Teach Mode, and the two things Sant would do differently if building from scratch today.
19 April 2026 · 7 min read
The first two posts in this series covered the RAG architecture that lets Sant Chat read a WordPress site before a visitor asks a question, and the six engineering decisions that make it safe to install on a production site. Both posts were about structure. This one is about the surface: what the widget looks like, what the voice interface is for, how Teach Mode works, and what the build would look like if it started today.
A retrospective is a different stance from architecture documentation. The architecture is what was decided; the retrospective is what the decisions cost.
The widget: control over the appearance, not just a colour picker
The design principle behind the Sant Chat widget is that the site owner should control the appearance of the chatbot without needing to write CSS. The execution is two presets, Classic and Playful, each with a set of controls that go well beyond a colour picker.
Classic is the structured option. Clean lines, contained layout, suited to professional services, healthcare, and any site where formality matters. Playful is warmer, with softer shapes and different animation behaviour, suited to retail, hospitality, and sites where the brand tone is more conversational. Both presets include a colour theme system, and within each theme there are more than eleven individual colour controls: header background and text, send button, close button, user message colours, bot message colours, the chat background, the input field, the bubble, and the powered-by line. The bubble itself accepts a custom icon image URL and a transparent toggle.
Beyond colour, the controls that ship today cover a lot of ground. The greeting message supports time-based variants, such as a different message in the morning versus the afternoon. Four quick-reply slots appear before the visitor starts typing. Proactive chat has a configurable delay and custom message text. Lead capture runs in three modes, with configurable trigger thresholds and custom prompt text for the name, email, and phone fields. Header action icons cover phone, email, and conversation history, plus one fully custom header button with its own URL and tooltip. Branding controls include the header title, an optional background image, a panel logo, and the powered-by line with a toggle and custom URL.
There are also global toggles for message timestamps, typing indicators, copy message, unread badge, message rating, action icons, session deletion, and inactivity timeout.
What is not configurable today is worth naming directly. The widget sits bottom right, twenty pixels from each edge. There is no position option, no offset control, no bottom-left variant. The bubble is circular. The shape is not configurable. The panel always starts closed. There is no open-by-default option, and trigger rules beyond the proactive delay, such as scroll depth or exit intent, have not shipped. These features are scheduled. They have not shipped because doing them half-formed would be worse than not doing them, and the current constraint set covers the vast majority of real site configurations. When they arrive, they will arrive with the same level of control the rest of the settings surface has.
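The preset-plus-overrides model described above can be sketched in a few lines. This is an illustrative sketch only: the preset names Classic and Playful come from the post, but the option keys and default values here are hypothetical, not the plugin's actual settings schema.

```python
# Hypothetical sketch of a preset-plus-overrides settings model.
# Field names and values are illustrative, not the plugin's real option keys.
PRESETS = {
    "classic": {"bubble_shape": "circle", "animation": "subtle",
                "header_bg": "#1a1a2e", "header_text": "#ffffff"},
    "playful": {"bubble_shape": "circle", "animation": "bouncy",
                "header_bg": "#ff6b6b", "header_text": "#ffffff"},
}

def resolve_settings(preset: str, overrides: dict) -> dict:
    """Start from a preset's defaults, then apply the site owner's
    per-control overrides (colours, greeting, toggles) on top."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    settings = dict(PRESETS[preset])  # copy so presets stay untouched
    settings.update(overrides)
    return settings
```

The design point is that every owner-facing control is an override of a sane preset default, which is why the widget can offer dozens of knobs without requiring the owner to set any of them.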
Teach Mode: not a document library, not a rule engine
Teach Mode is one of the features most likely to be misunderstood from its name alone. It is not a way to upload documents. It is not a hard rule that overrides the model. It is a high-priority answer layer that sits in front of the standard RAG pipeline.
A site owner writes a question and answer pair in the dashboard. The question can be up to five hundred characters; the answer up to two thousand. When that pair is saved, the question is embedded using the same text-embedding-3-small model the rest of the pipeline uses, and stored in a separate corrections table alongside its embedding.
On every incoming visitor message, the system first searches that corrections table using a cosine-similarity threshold of 0.75, which is strict. Only if nothing in the corrections table matches at that threshold does the message fall through to standard RAG retrieval, which uses a looser threshold of 0.3. When a correction matches, its answer is injected into the LLM context as a verified correct answer, and document retrieval is skipped entirely for that turn.
The model still phrases the response. It can adjust the tone. What it cannot do is contradict the authoritative answer the site owner has written. The practical description is: answers the site owner has written, surfaced by semantic similarity to what the visitor asked. Pro plans allow fifty corrections. Business allows two hundred.
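The two-tier lookup described above can be sketched directly from the thresholds in the post: corrections are checked first at a strict 0.75 cosine similarity, and only on a miss does the query fall through to standard RAG retrieval at 0.3. The function and data shapes below are illustrative assumptions, not the plugin's actual code.

```python
from math import sqrt

CORRECTION_THRESHOLD = 0.75  # strict: a correction must match closely
RAG_THRESHOLD = 0.3          # looser: standard document retrieval

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route_query(query_embedding, corrections, documents):
    """Check owner-written corrections first; only fall through to
    standard RAG retrieval when no correction clears the strict threshold."""
    for c in corrections:
        if cosine_similarity(query_embedding, c["embedding"]) >= CORRECTION_THRESHOLD:
            # Matched: the owner's answer is injected as authoritative,
            # and document retrieval is skipped for this turn.
            return ("correction", c["answer"])
    hits = [d for d in documents
            if cosine_similarity(query_embedding, d["embedding"]) >= RAG_THRESHOLD]
    return ("rag", hits)
```

The asymmetry in the two thresholds is the whole design: corrections only fire when the visitor's question is nearly the question the owner wrote, while ordinary retrieval casts a much wider net.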
The distinction matters for setting expectations correctly. Teach Mode is powerful for precise, high-stakes content: frequently misquoted facts, specific policies, answers where accuracy is non-negotiable. It is not a knowledge base loader, and it does not give the site owner control over every possible response the chatbot produces.
Voice: two years of the same client request
The voice interface in Sant Chat was not a product differentiation decision. It came from a recurring request.
Over two years as head of studio, the same request came up often enough to become a category: clients asking whether a website could answer questions by voice, and not simple questions like opening hours, but in-depth questions. A visitor describing a symptom and getting back the relevant information from the organisation's official clinical guides. A member asking a complex question about eligibility and receiving an accurate answer drawn from the organisation's own policy documents. A non-profit site visitor asking about services in their region and hearing a response sourced from the organisation's own content.
The common thread was that the knowledge already existed in the organisation's documents, and the site was failing to surface it in a form that matched how people actually wanted to ask. Text chat solves part of that problem. Voice solves more of it, especially for visitors who are not confident typists, who are on mobile, or who are asking something where speaking is simply faster than writing.
The voice interface in Sant Chat uses the same RAG pipeline as the text interface. The audio goes through speech-to-text, the resulting query hits the pipeline, and the response is returned as audio. The technical complexity is real: the additional endpoint surface, the base64 validation, the audio handling. It was worth the complexity because the use case that drove it was real, not hypothetical.
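The base64 validation mentioned above is the kind of guard that makes the extra endpoint surface safe. The sketch below shows one plausible shape for it; the function name and size limit are assumptions for illustration, not the plugin's actual implementation.

```python
import base64
import binascii

MAX_AUDIO_BYTES = 10 * 1024 * 1024  # illustrative limit, not the plugin's

def decode_voice_payload(b64_audio: str) -> bytes:
    """Reject malformed or oversized base64 audio before it reaches
    speech-to-text; the decoded bytes then feed the same RAG pipeline
    a typed message would."""
    try:
        # validate=True rejects any character outside the base64 alphabet
        audio = base64.b64decode(b64_audio, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("invalid base64 audio payload") from exc
    if not audio:
        raise ValueError("empty audio payload")
    if len(audio) > MAX_AUDIO_BYTES:
        raise ValueError("audio payload too large")
    return audio
```

Validating before decoding work begins means a hostile or broken client fails fast at the edge, and everything downstream of speech-to-text can trust it is handling real audio bytes.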
What would change in a rebuild
Two things, named specifically rather than vaguely.
The first is scoping for the WordPress.org submission requirements from the beginning. Sant Chat required a rewrite of approximately eight thousand lines of code because the submission standards of the official plugin repository were not treated as a first-class constraint when the original scope was written. They should have been. The WordPress.org reviewer requirements cover security, sanitisation, internationalisation, and code structure, and they are not negotiable if the goal is distribution through the official repository. Building without those constraints in scope and then retrofitting them is the expensive path. The rewrite happened. It produced a better product. It should not have been necessary.
That is precisely the point the Sant methodology makes about scope clarity before a build begins. The cost of unclear constraints is paid during the build, not before it. Eight thousand lines of rewritten code is a specific version of that cost.
The second is shipping a smaller version first. Sant Chat launched with RAG, voice, Teach Mode, lead capture, multiple themes, and proactive chat. Some of those features could have been a second release rather than the first. A skinnier initial release focused on RAG and basic lead capture would have reached the WordPress.org repository faster, accumulated real user feedback earlier, and provided a cleaner base for the features that followed. The impulse to ship the full vision is understandable. The discipline to hold features back until the core is validated is harder, and it is the right call more often than it feels like in the moment.
Both observations sit inside the same principle: constraints that are not named upfront become costs that are paid during the build.
Where the trilogy lands
Three posts have covered the RAG architecture, the security engineering, and the surface design and retrospective of Sant Chat. The through-line across all three is the same set of decisions that Sant applies to client work: name the constraints before the build starts, draw the data boundary clearly, treat inbound surfaces as inbound surfaces, and ship the right scope rather than the most ambitious one.
Sant Chat is a product Sant Limited built for its own use and released publicly. The decisions it embeds, and the costs it paid when it deviated from them, are the same decisions and costs that show up in client projects. That is the point of building in public rather than just talking about methodology.
If you run a WordPress site and want a chatbot that reads your content before it answers questions, install Sant Chat AI from the WordPress.org plugin directory. If you are building a product and want the same scope discipline applied before the first line of code is written, the Sant Launch service is where that work begins.