LUMA · METHODOLOGY

HOW THE PROJECTIONS WORK.

luma's customer-facing dashboards report district- and country-level health burden estimates. This page documents the math, the priors, and the honest limits of what the current build can claim.

What we do

luma's CHW interactions generate primary data: real questions, real referrals, real protocol matches. That data is sparse on its own. To produce useful district- and country-level estimates, we combine it with published priors — UNAIDS/WHO/PEPFAR estimates for Lesotho — using a Bayesian update.

The result is a posterior estimate that:

Honest caveat. At current scale (early Lesotho deployment, a modest number of weekly CHW interactions), the posteriors are dominated by the priors. The customer dashboards show this through the "confidence" indicator on every projection card. The framework is in place; the intervals narrow, and the estimates start to reflect primary data, as the CHW network expands toward national coverage.

The math (Beta-Binomial)

For a proportion (e.g., HIV prevalence among adults), we use a Beta-Binomial conjugate update. This is the standard Bayesian textbook approach.

Prior

The published Lesotho HIV prevalence is 22.8% (UNAIDS 2024). We turn this into a Beta(α, β) prior with a chosen "effective sample size" that controls how strongly the prior anchors the posterior:

α_prior = prior_mean × effective_n
β_prior = (1 - prior_mean) × effective_n

For HIV we use effective_n = 200 — a strong prior, since UNAIDS estimates are based on census-scale household surveys. We don't want one CHW conversation to swing a country-level estimate.
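The prior construction above can be sketched in a few lines. This is a minimal illustration, not the code in src/projections.js; the helper name `betaPrior` and its input checks are assumptions made for the example.

```javascript
// Sketch: turn a published mean and an effective sample size into
// Beta(alpha, beta) parameters. `betaPrior` is a hypothetical helper name.
function betaPrior(priorMean, effectiveN) {
  if (!(priorMean > 0 && priorMean < 1)) {
    throw new RangeError("priorMean must be strictly between 0 and 1");
  }
  if (!(effectiveN > 0)) {
    throw new RangeError("effectiveN must be positive");
  }
  return {
    alpha: priorMean * effectiveN,       // α_prior = prior_mean × effective_n
    beta: (1 - priorMean) * effectiveN,  // β_prior = (1 - prior_mean) × effective_n
  };
}

// HIV adult prevalence prior: 22.8% with effective_n = 200.
const hivPrior = betaPrior(0.228, 200);
console.log(hivPrior); // alpha ≈ 45.6, beta ≈ 154.4
```

Read this way, the prior behaves like roughly 46 HIV-related interactions out of 200 observed before any CHW data arrives.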

Likelihood

Each CHW interaction gets tagged with a topic (HIV, TB, MNCH, etc.) by an LLM-based extractor. We treat the count of HIV-related interactions (k) over total interactions (n) as a Binomial sample.
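As a rough sketch of how k and n could be derived once interactions are tagged: the record shape (`{ topic: "HIV" }`), the function name `countTopic`, and the sample records are all assumptions for illustration, not luma's actual schema or data.

```javascript
// Sketch: derive the Binomial counts (k, n) from tagged interactions.
// The record shape { topic: "..." } is assumed for illustration only.
function countTopic(interactions, topic) {
  const n = interactions.length; // total interactions
  const k = interactions.filter((i) => i.topic === topic).length; // matching topic
  return { k, n };
}

// Made-up records, standing in for the extractor's output.
const sample = [
  { topic: "HIV" },
  { topic: "TB" },
  { topic: "MNCH" },
  { topic: "HIV" },
];

const counts = countTopic(sample, "HIV");
console.log(counts); // { k: 2, n: 4 }
```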

Posterior

α_post = α_prior + k
β_post = β_prior + (n - k)
posterior_mean = α_post / (α_post + β_post)
variance = (α_post × β_post) / ((α_post + β_post)² × (α_post + β_post + 1))
sd = √variance
95% CI ≈ posterior_mean ± 1.96 × sd

We use a normal approximation to the Beta interval, which is reasonable for moderate sample sizes when the mean is not close to 0 or 1. When the posterior rests on very few observations, we widen the interval intentionally.
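The update and the normal-approximation interval can be sketched as follows. This is an illustrative version, not the production code. The function name `betaBinomialPosterior`, the clamping of the interval to [0, 1], and the `priorWeight` field are assumptions added for the example, the counts k = 3 and n = 20 are made up, and the intentional widening for small samples is left out because its rule is not specified here.

```javascript
// Sketch: Beta-Binomial conjugate update with a normal-approximation 95% interval.
function betaBinomialPosterior(priorMean, effectiveN, k, n) {
  if (!(Number.isInteger(k) && Number.isInteger(n) && k >= 0 && n >= k)) {
    throw new RangeError("k and n must be integers with 0 <= k <= n");
  }
  const alphaPost = priorMean * effectiveN + k;            // α_prior + k
  const betaPost = (1 - priorMean) * effectiveN + (n - k); // β_prior + (n - k)
  const total = alphaPost + betaPost;                      // equals effective_n + n
  const mean = alphaPost / total;
  const variance = (alphaPost * betaPost) / (total * total * (total + 1));
  const sd = Math.sqrt(variance);
  return {
    mean,
    sd,
    // Clamping to [0, 1] is an assumption of this sketch, not stated in the method.
    ciLow: Math.max(0, mean - 1.96 * sd),
    ciHigh: Math.min(1, mean + 1.96 * sd),
    // Share of the posterior's pseudo-observations that comes from the prior.
    priorWeight: effectiveN / total,
  };
}

// Illustrative counts only: 3 HIV-tagged interactions out of 20.
const post = betaBinomialPosterior(0.228, 200, 3, 20);
console.log(post.mean.toFixed(3));        // "0.221"
console.log(post.priorWeight.toFixed(3)); // "0.909"
```

With these made-up counts the prior supplies 200 of the 220 pseudo-observations, so the posterior mean stays close to the published 22.8%. That is the prior-dominated behaviour described in the caveat for the current scale.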

For incidence rates (e.g., TB)

Rates per 100,000 don't fit cleanly into a Beta-Binomial. We use a simpler weighted average:

posterior_rate = (1 - λ) × prior_rate + λ × observed_rate
λ = weight on the observed rate, between 0 and 1 (intended to be a function of sample size: small n → λ near 0)

Currently we fix λ = 0.3 regardless of n. This becomes sample-size-adaptive as primary data accumulates from active districts.
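A minimal sketch of the blend, using the fixed λ = 0.3 as the default. The names `blendRate` and `adaptiveLambda` are hypothetical, the observed rate of 500 is made up, and the n / (n + n0) form is only one possible way to make λ grow with sample size, not a form this page commits to.

```javascript
// Sketch: weighted average of a prior rate and an observed rate.
function blendRate(priorRate, observedRate, lambda = 0.3) {
  if (!(lambda >= 0 && lambda <= 1)) {
    throw new RangeError("lambda must be between 0 and 1");
  }
  return (1 - lambda) * priorRate + lambda * observedRate;
}

// Hypothetical adaptive weight: near 0 for small n, approaching 1 as n grows.
// n0 (the sample size at which lambda reaches 0.5) is an assumed tuning constant.
function adaptiveLambda(n, n0 = 100) {
  return n / (n + n0);
}

// TB prior of 650 per 100k per year blended with an illustrative observed rate of 500.
const tbRate = blendRate(650, 500);
console.log(Math.round(tbRate)); // 605
```

With λ fixed at 0.3 the prior always carries 70% of the weight, however many interactions have been observed. An adaptive λ removes that limitation.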

Priors used

Source years are documented in src/projections.js. Replace the values below as new data is published.

Topic | Prior | Source
HIV adult prevalence | 22.8% | UNAIDS 2024
TB incidence | 650 / 100k / yr | WHO Global TB Report 2024
Maternal mortality | 540 / 100k live births | WHO GHO 2024
Under-5 mortality | 91 / 1000 | WHO GHO 2024
Stunting (under-5) | 32% | UNICEF JME 2024
Wasting (under-5) | 2.7% | UNICEF JME 2024
Modern contraceptive use | 62% | UN Population Division 2024

What the current build CAN and CANNOT support

Can

Cannot (yet)

Where this goes (production scale)

Once luma reaches full operational scale (national CHW coverage across multiple districts), the same framework produces:

None of those rely on the current build's small dataset. They rely on the framework being correct.