LUMA · WHATSAPP DEMO

TALK TO LUMA.

In Lesotho, community health workers text a ministry-licensed WhatsApp number from their phones — voice memo or text, in Sesotho or English. luma grounds every answer in the official protocols and logs the conversation to the dashboards. This page exposes the same backend through the browser so you can try it without a phone.

WHAT YOU'RE LOOKING AT

This is the same backend that powers a real CHW's WhatsApp.

Every message you send hits the same pipeline as a real worker texting from a clinic in Maseru: safety filter → RAG retrieval over ministry protocols → Claude → post-LLM safety check → ministry log. The only thing missing is Twilio — you're going through plain HTTP; they go through the WhatsApp Business API.

Watch the Audit Log while you chat — your conversations show up there in real time, tagged as demo entries. Open the Ministry view to see how those interactions aggregate into operational dashboards. Or hit the API directly if you'd rather skip the browser.

CHANNEL
WhatsApp Business API, routed through Twilio. In production the number is registered to the partner ministry's account, not luma's.
CORPUS
11 ministry protocol documents indexed via OpenAI text-embedding-3-small, retrieved by cosine similarity. ART, TB, MNCH, PrEP, PMTCT, HIV testing, family planning, immunization, malnutrition, STI screening.
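Retrieval here is just vector math: embed the query, score every chunk by cosine similarity, keep the top 3. A minimal TypeScript sketch, with toy vectors standing in for real text-embedding-3-small output (function names are illustrative, not luma's code):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank corpus chunks against a query embedding and keep the top k.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k = 3,
) {
  return chunks
    .map(c => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Real embeddings from text-embedding-3-small are 1536-dimensional; the math is the same.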
MODEL
Claude Sonnet 4.6 with a system prompt that grounds responses in retrieved chunks and refuses dosing / diagnostic / prescription requests regardless of phrasing.
VOICE
Real CHWs send voice memos in Sesotho. OpenAI Whisper transcribes; the rest of the pipeline is identical. (This demo is text-only.)
SAFETY
Two-layer filter: a regex pre-check refuses anything that looks like a dosing or diagnostic request before the LLM is invoked; a post-LLM scrubber catches anything unsafe that may have slipped through.
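The pre-LLM layer can be sketched as a list of regexes checked before any model call. The patterns below are illustrative, not the production rule set:

```typescript
// Illustrative restricted patterns; the real rule set is broader.
const BLOCKED_PATTERNS: RegExp[] = [
  /\b(dos(e|age|ing)|how (much|many) (mg|ml|tablets?))\b/i, // dosing
  /\bdiagnos(e|is|tic)\b/i,                                  // diagnosis
  /\bprescri(be|ption)\b/i,                                  // prescription
];

// Returns a refusal decision before the LLM is ever invoked.
function preLlmFilter(message: string): { allowed: boolean; reason?: string } {
  for (const pattern of BLOCKED_PATTERNS) {
    if (pattern.test(message)) {
      return { allowed: false, reason: "matched restricted pattern" };
    }
  }
  return { allowed: true };
}
```

Running the filter before the model call means a blocked request costs no tokens at all.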
LOGGING
Every conversation lands in the conversations and case_tags tables. Severity, district, topic, and condition are extracted by a second Claude call (Haiku) and shown in the Ministry and Public Health views.
RATE LIMIT
Demo capped at 8 messages per minute per IP so the page stays cheap to run. Real CHWs aren't rate-limited.
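A cap like this is a few lines of fixed-window counting per IP. A minimal in-memory sketch, assuming the limits in the copy above (the real middleware may differ):

```typescript
const WINDOW_MS = 60_000;  // one minute
const MAX_MESSAGES = 8;    // demo cap per IP

const windows = new Map<string, { start: number; count: number }>();

// Returns true if this IP may send another message in the current window.
function allowMessage(ip: string, now: number = Date.now()): boolean {
  const w = windows.get(ip);
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(ip, { start: now, count: 1 }); // new window
    return true;
  }
  if (w.count >= MAX_MESSAGES) return false;   // cap hit
  w.count += 1;
  return true;
}
```

A fixed window is the cheapest variant; a sliding window or token bucket would smooth bursts at the window boundary, but for a demo cap this is enough.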
WHAT HAPPENS BETWEEN SEND AND REPLY
  1. POST /api/demo-chat — your message hits Express
  2. Pre-LLM safety filter checks for dosing/diagnostic patterns
  3. OpenAI embeds your query, cosine-similarity over the corpus
  4. Top-3 chunks bundled into Claude's context
  5. Claude answers, grounded — or routes to a "general knowledge" fallback if the corpus didn't cover it
  6. Post-LLM scrubber catches any unsafe output
  7. Response logged with sources; severity tags extracted asynchronously
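The seven steps above can be sketched as a single handler. Every name below (preFilterOk, embed, retrieveTopChunks, askClaude, postScrub, logConversation) is a hypothetical stand-in, not luma's actual code; the stubs exist only so the flow runs end to end:

```typescript
type Chunk = { id: string; text: string };

// Hypothetical stubs; the real versions call OpenAI, Claude, and the ministry log.
const preFilterOk = (m: string) => !/\bdos(e|age|ing)\b/i.test(m); // step 2
const embed = async (m: string): Promise<number[]> => [m.length];  // step 3, toy vector
const retrieveTopChunks = (_v: number[], k: number): Chunk[] =>
  [{ id: "art-protocol", text: "..." }].slice(0, k);               // steps 3-4
const askClaude = async (_m: string, chunks: Chunk[]) =>
  `Grounded answer citing ${chunks.length} chunk(s).`;             // step 5
const postScrub = (draft: string) => draft;                        // step 6
const logConversation = async (..._args: unknown[]) => {};         // step 7

async function handleDemoChat(message: string) {
  if (!preFilterOk(message)) {
    // Step 2: refuse before any model call.
    return { reply: "I can't answer dosing or diagnostic questions.", sources: [] as string[] };
  }
  const queryVec = await embed(message);            // step 3
  const chunks = retrieveTopChunks(queryVec, 3);    // steps 3-4
  const draft = await askClaude(message, chunks);   // step 5
  const reply = postScrub(draft);                   // step 6
  void logConversation(message, reply, chunks);     // step 7, fire-and-forget
  return { reply, sources: chunks.map(c => c.id) };
}
```

Logging is deliberately fire-and-forget in this sketch: the reply goes back to the CHW without waiting on the database or the async tag extraction.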