Rare Disease Diagnosis: The Knowledge That's Already There

Seven thousand rare diseases. Four hundred million people affected. An average of 4.7 years to diagnosis. The diagnostic capability exists—databases, algorithms, genetic testing. It's not reaching patients because of phenotype capture failure, system fragmentation, and a genetic testing bottleneck. This is a map of what's stuck and where a small actor might find leverage.

Courtney's son Alex had been in pain for three years. It started during COVID — a toothache that wouldn't go away. Then came the meltdowns, the chewing on everything, the growth that just... stopped. By age four, he'd seen seventeen doctors.

The dentist found no cavities. The pediatrician blamed "pandemic effects" and recommended PT. The orthodontist fitted a palate expander. The neurologist diagnosed migraines. Nobody connected the dots.

Here's the thing that gets me: Alex couldn't sit cross-legged. To most doctors, that's a minor behavioral quirk. To someone who knows about spinal cord tethering, it's a red flag — the position puts tension on the neural tube.

Courtney had been keeping a binder. Thirty-four months of symptoms, test results, observations. One night in 2023, frustrated and desperate, she did something that would have been impossible two years earlier: she opened ChatGPT and started typing.

She went line by line through Alex's MRI notes. She added every symptom — the growth arrest, the headaches, the gait issues, the inability to sit cross-legged. She asked the AI what could explain all of this together.

ChatGPT suggested tethered cord syndrome.

Courtney joined a Facebook group for tethered cord families. The stories matched. She brought the hypothesis to a new neurosurgeon. The doctor pulled up Alex's MRI — the same MRI that had been sitting in his file — and pointed: "Here's spina bifida occulta, and here's where the spine is tethered."

Two weeks later, Alex had surgery. He's thriving now.

The knowledge existed. The MRI existed. The pattern was in papers. Seventeen doctors saw pieces of the puzzle. The system had no mechanism to connect "this specific constellation of features" with "this rare condition" — until a desperate mother, a symptom binder, and a chatbot did what the healthcare system couldn't.

Alex got lucky. His mother was persistent, technically capable, and arrived at exactly the moment when a consumer AI could synthesise medical information. For every Alex, thousands remain undiagnosed — cycling through specialists, accumulating tests, burning through years. Here's why the system failed Courtney, and why it didn't have to.


The Numbers

📚Diagnostic Odyssey Statistics
  • 7,000+ rare diseases identified
  • 400 million people affected globally (rare diseases are collectively common)
  • 4.7 years mean time to diagnosis
  • 73% received at least one incorrect diagnosis
  • 60% had symptoms initially dismissed as psychological
  • 4-8 specialists seen during diagnostic odyssey

The cruel mathematics: each rare disease is rare, but a GP will encounter patients with some rare disease regularly. They just won't recognise them, because no individual condition appears often enough to learn.
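This arithmetic is easy to check with a back-of-envelope sketch. The figures below are illustrative assumptions (roughly 8 billion people worldwide, a notional GP panel of 2,000 patients, a crude uniform split across diseases), not numbers from this article's sources:

```python
# Back-of-envelope: rare diseases are individually rare but collectively common.
# Assumed numbers: ~8 billion people worldwide, a notional GP panel of 2,000.
panel_size = 2_000
collective_prevalence = 400e6 / 8e9   # ~5% of people have some rare disease
n_diseases = 7_000

with_some_rare_disease = panel_size * collective_prevalence
per_specific_disease = with_some_rare_disease / n_diseases  # crude uniform split

print(f"~{with_some_rare_disease:.0f} panel patients with some rare disease")
print(f"~{per_specific_disease:.3f} expected patients per individual disease")
```

On these assumptions, a GP's panel contains around a hundred rare-disease patients, but the expected count for any single disease is about 0.014 — which is why the pattern recognition never gets trained.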


This Pattern Isn't New

Courtney's story happened in 2023. The same pattern played out a decade earlier — and what happened next reveals something important about where solutions might come from.

In 2007, Bertrand Might was born with symptoms that "landed him in the empty set." Jiggly, unable to control his movements, hitting none of his developmental milestones. No known condition matched. His parents — Matt Might (a computer scientist) and Cristina Might — spent four years seeing specialists at Duke, NIH, and Cleveland Clinic before enrolling him in a pilot whole-exome sequencing study.

The diagnosis: a mutation in the NGLY1 gene. Bertrand was the first human ever diagnosed with this condition.

Then Matt did something that changed what was possible. He published a blog post titled "Hunting Down My Son's Killer." It was explicitly designed as "a Google dragnet" — optimised to be found by other parents searching for the same constellation of symptoms.

Within 24 hours, the post went viral. Within 13 months, nine more NGLY1 patients appeared worldwide — families who had been searching alone, now connected by a blog post indexed by Google.

Matt Might is now a professor at Harvard Medical School. The family founded NGLY1.org and recruited researchers to study the condition. The mutation that had existed for years, undiagnosed because no one had ever connected the symptoms to a cause, now has a patient community, a research program, and ongoing clinical trials.

The pattern: The infrastructure the medical system couldn't build — connecting "this specific phenotype" to "this specific mutation" — was built by a desperate parent with a blog. The system failed the Mights in 2011 the same way it failed Courtney in 2023; only the discovery tool changed, from Google's index to a chatbot. What hasn't changed is the core problem: the knowledge exists, but nothing connects it to the patient at the moment it matters.


Why This Is Overhang, Not Impossibility

The diagnostic gap isn't about missing science. It's about knowledge distribution.

What exists:

  • Comprehensive databases of rare disease phenotypes
  • Symptom-to-diagnosis matching tools
  • Genetic testing that can identify thousands of conditions from a single sample
  • Facial analysis AI that can spot dysmorphic features
  • Case report literature documenting nearly every presentation

What's stuck:

  • These tools aren't integrated into primary care workflows
  • GPs don't know they exist, don't trust them, or don't have time to use them
  • Genetic testing is ordered late, if at all
  • The patient's complete symptom picture is scattered across multiple specialists' notes

This is classic overhang: the capability to dramatically shorten diagnostic odysseys exists. It's not being deployed where patients first present.

This kind of overhang has been solved before.

Historical Precedent: The Gram Stain

Before 1884, identifying bacteria required growing cultures — days of waiting while patients deteriorated. Hans Christian Gram's staining technique allowed immediate classification of bacteria into two major groups. Treatment decisions that took days could now be made in minutes.

The pattern: A simple test that doesn't identify everything, but narrows the search space dramatically and can be done at point of care.

Historical Precedent: Newborn Screening

Newborn screening started with one condition (PKU) in the 1960s. Today, most developed countries screen for 35 core conditions from a single blood spot, before symptoms appear.

The pattern: What was once diagnosed late and badly is now caught universally and early — because the test was integrated into a routine touchpoint that every baby passes through.

These precedents suggest a question: what would a "Gram stain for rare disease" look like? What test could narrow the search space at the moment patients first present? The answer isn't obvious — and understanding why requires looking at where current solutions fail.


Why Solutions Fail

The research reveals three failure modes, but they're not equal. Phenotype capture failure is the root cause; everything else is consequence.

1. The Phenotype Capture Problem (Root Cause)

Diagnostic algorithms need accurate input. But the patient's complete phenotype — the full constellation of symptoms, features, and history — is never assembled.

The dentist used ICD-10 codes for "unspecified dental pain." There is no code for "cannot sit cross-legged due to neurological tension." And so the pattern dissolved into the gaps between billing categories.

The patient's phenotype is:

  • Scattered across multiple providers' notes
  • Documented in inconsistent terminology (patient language → ICD codes → HPO terms)
  • Missing features the patient didn't mention or the doctor didn't ask about
  • Lost in translation at every referral handoff

The Human Phenotype Ontology (HPO) has 18,000+ terms for precise clinical features. It's rarely used outside genetics departments. Meanwhile, the algorithms that could match symptoms to rare diseases — FindZebra, Isabel Healthcare, Face2Gene — sit idle because their input is garbage.

The pattern: Garbage in, garbage out. Even accurate algorithms fail on incomplete symptom lists. And no algorithm can work on symptoms that were never recorded.
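A toy sketch shows what "capturing the phenotype in the right vocabulary" means in practice. The HPO identifiers below are real, but the tiny phrase lexicon and substring matching are illustrative stand-ins; a real tool would load the full ontology file and use fuzzy or NLP-based concept recognition:

```python
# Illustrative mapper from plain-language symptom phrases to HPO terms.
# The HPO IDs are real; the lexicon and substring matching are a toy stand-in
# for loading the full ontology (hp.obo) plus NLP concept recognition.
HPO_LEXICON = {
    "stopped growing": ("HP:0001510", "Growth delay"),
    "headache":        ("HP:0002315", "Headache"),
    "walks oddly":     ("HP:0001288", "Gait disturbance"),
    "seizure":         ("HP:0001250", "Seizure"),
    "floppy":          ("HP:0001290", "Generalized hypotonia"),
}

def capture_phenotype(patient_text: str) -> list[tuple[str, str]]:
    """Return (HPO id, label) pairs whose trigger phrase appears in the text."""
    text = patient_text.lower()
    return [term for phrase, term in HPO_LEXICON.items() if phrase in text]

features = capture_phenotype(
    "He stopped growing, has daily headaches, and walks oddly."
)
for hpo_id, label in features:
    print(f"{hpo_id}  {label}")
```

The point is the translation step itself: the parent's sentence becomes three coded features that tools like FindZebra or Exomiser can actually consume, instead of dissolving into billing categories.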

2. System Fragmentation (Why Good Tools Fail Anyway)

Even when the phenotype is captured, the system fragments it.

Specialists see slices: The neurologist rules out neurological causes. The cardiologist rules out cardiac causes. Each evaluates for conditions in their domain and refers elsewhere. Nobody holds the whole picture. The rare disease that spans multiple systems falls through the gaps between specialties.

Clinicians ignore alerts: Isabel Healthcare represents everything a clinical decision support tool should be — accurate, integrated, well-funded. A UK NHS pilot found 16 uses across 7 practices in 3 months. Clinicians override decision support alerts constantly — not because they're arrogant, but because they're drowning. Primary care clinicians receive 56+ alerts per day. Any rare disease prompt competes with medication interactions, documentation reminders, and billing flags.

AI threatens identity: A computer saying "consider Fabry disease" triggers defensiveness, not curiosity. Tools positioned as "AI diagnosis" threaten professional identity. Tools positioned as "literature search that already happened" might not — but even then, who has time to read the output?

The pattern: Tools that add steps fail. Tools that require leaving the EHR fail. Tools that generate alerts get ignored. The system is optimised for throughput, not synthesis.

3. The Genetic Testing Bottleneck

Whole exome/genome sequencing can identify thousands of conditions, but it comes too late:

  • It's ordered after years of negative workup, not early
  • Results take weeks to months
  • Interpretation requires expertise that's scarce
  • "Variant of uncertain significance" results create new uncertainty
  • Even with testing, ~50% never receive a definitive molecular diagnosis

The cost is dropping — from $10,000 to under $500 — but the interpretation bottleneck remains. And sequencing can only find what's in the patient; it can't compensate for a phenotype that was never properly captured.


What's Actually Working

Undiagnosed Diseases Programs

The NIH Undiagnosed Diseases Program (UDP) and its international network take patients who've exhausted conventional diagnostics. They apply comprehensive phenotyping, advanced genomics, and multidisciplinary review.

Results: ~25-35% diagnosis rate for patients who've been searching for years.

The limitation: Capacity. These programs demonstrate what's possible but can't scale to meet demand.

Face2Gene and Facial Analysis

FDNA's Face2Gene uses facial analysis to suggest genetic syndromes based on photographs. Clinicians upload a photo; the system suggests conditions to consider.

What's notable: Studies show it suggests the correct diagnosis in its top-10 list for many dysmorphic syndromes. It's being used by geneticists.

The limitation: Only helps for conditions with facial features. Requires the clinician to think "this might be genetic" and seek out the tool.

Patient Communities and Crowdsourced Diagnosis

Platforms like RareConnect and disease-specific Facebook groups let patients compare notes. Sometimes patients find their own diagnosis by recognising their symptoms in others' stories.

What's notable: Desperate patients are doing distributed pattern matching that the healthcare system won't do. Matt Might's blog-as-search-engine (discussed above) is the most dramatic example, but it's not unique — patient communities routinely surface diagnostic leads that specialists miss. Courtney found confirmation for Alex's diagnosis in a Facebook group before she found a surgeon who agreed.


Two Models for Patient-Driven Diagnosis

The failure analysis points away from clinician-facing tools and toward patient empowerment. But "patient empowerment" could mean two very different things:

The Tool Model: Build software that captures symptoms in plain language, maps them to HPO terms, and generates structured summaries for clinicians. The patient arrives with a "phenotype passport" that speaks the same language as diagnostic algorithms.

The Network Model: Do what Matt Might did — create content designed to be found by other desperate patients, build community, connect families who share rare phenotypes, and let the pattern-matching happen through human connection rather than algorithm.

These aren't mutually exclusive. Matt Might's blog worked because it was SEO-optimised content; a software tool could include community features. But they require different skills. The tool model needs engineering. The network model needs writing, curation, and community management.

The honest assessment: Matt Might's blog accomplished more for NGLY1 patients than any phenotype tool has accomplished for anyone. But blogs don't scale the way software does. Both approaches deserve serious consideration.
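To make the tool model concrete, here is one possible shape for a "phenotype passport" payload. The schema is invented for illustration (a production tool would more likely follow the GA4GH Phenopackets standard), and the patient data is fictional:

```python
# Sketch of a "phenotype passport": HPO-coded features plus onset dates,
# serialisable for a clinician or a matching algorithm. Invented schema;
# a real tool would likely follow GA4GH Phenopackets.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Feature:
    hpo_id: str          # standard HPO identifier, e.g. "HP:0001510"
    label: str           # the HPO term's clinical label
    onset: str           # when the family first noticed it
    patient_words: str   # the plain-language description, preserved verbatim

@dataclass
class PhenotypePassport:
    patient_ref: str                                   # de-identified reference
    features: list[Feature] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

passport = PhenotypePassport(
    patient_ref="anon-001",
    features=[
        Feature("HP:0001510", "Growth delay", "2021-03", "he just stopped growing"),
        Feature("HP:0002650", "Scoliosis", "2022-01", "his back looks curved"),
    ],
)
print(passport.to_json())
```

One design choice worth noting: the patient's own words travel alongside the codes. The codes feed diagnostic algorithms; the verbatim phrasing preserves whatever nuance the mapping lost.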

Phenotype Passport

Patient-prepared symptom documentation

💻 Software

Patients know their symptoms but can't communicate them in terms that match diagnostic databases. The complete phenotype is scattered across providers, documented in inconsistent terminology, and loses fidelity at every referral. Algorithms fail because their input is garbage.

Diagnostic Journey Timeline

Structured history for pattern recognition

💻 Software

Patients who've been searching for years have seen dozens of providers, had hundreds of tests, and accumulated thousands of pages of medical records. No single view exists. Each new specialist starts from scratch.

⚠️What Won't Work (And Why)

Rare Flag (EHR Extension): The skeleton draft proposed a CDS Hooks extension that would trigger rare disease consideration when specific symptom combinations appear. The research invalidates this approach.

  • Clinicians override 98% of CDS alerts (Brigham study)
  • Isabel Healthcare failed despite 87% accuracy and full EHR integration
  • Alert fatigue is the binding constraint, not algorithm quality

Genetic Interpretation Tools: Requires regulatory compliance, clinical expertise, and liability frameworks that are out of reach for solo development. And there's a reason for that: genetic misdiagnosis has consequences. The history of direct-to-consumer BRCA testing includes cases of prophylactic mastectomies based on false positives. "Appropriately gated" is frustrating for technologists, but it's not irrational. The regulations exist because people were harmed.


Reality Checks

These are the people or experiences that could quickly validate or invalidate this analysis:

  • A genetic counselor at a major academic medical center — How do current diagnostic tools actually perform in practice? What's the real bottleneck?

  • A parent who completed a diagnostic odyssey — Does the failure taxonomy match lived experience? What actually helped?

  • A GP or pediatrician in community practice — Would patient-prepared phenotype summaries help or create noise?

  • Someone from NORD or Global Genes — What interventions have been tried? Why haven't they scaled?

  • A researcher at the NIH Undiagnosed Diseases Program — What do they know about why cases go undiagnosed that isn't in the literature?

If you fit any of these profiles and think I've got something wrong, I'd like to know.


Resources

📚Case Study Sources

Alex Hofmann / Tethered Cord Syndrome:

  • Radiology Business coverage — Original reporting with direct quotes from Courtney (mother)
  • Story widely reported across medical AI conferences and healthcare media (2023)

Bertrand Might / NGLY1 Deficiency:

Face2Gene / Keegan Battavio:

🔗Key Organizations

Patient Advocacy:

  • NORD — National Organization for Rare Disorders (US)
  • EURORDIS — European Organisation for Rare Diseases
  • Global Genes — Rare disease patient advocacy

Research Programs:

🛠️Databases & Tools

Phenotype Databases:

  • OMIM — Online Mendelian Inheritance in Man (16,000+ genes, 8,000+ phenotypes)
  • Orphanet — 6,000+ rare diseases with clinical descriptions
  • HPO — Human Phenotype Ontology (18,000+ standardised terms)
  • GARD — NIH Genetic and Rare Diseases Information Center

Diagnostic Tools:

  • Face2Gene — FDNA's facial analysis AI (90% top-10 accuracy for dysmorphic syndromes)
  • FindZebra — Rare disease search engine (87% accuracy in research settings)
  • Isabel Healthcare — Diagnostic decision support (25+ years development)
  • Exomiser — Phenotype-driven variant prioritisation

📊Statistics Sources
  • Diagnostic odyssey statistics: EURORDIS Rare Barometer 2024
  • Face2Gene accuracy: FDNA clinical validation studies; 70% of world's geneticists use Face2Gene across 2,000 clinical sites in 130 countries
  • Isabel Healthcare workflow failure: UK NHS pilot data; Brigham and Women's Hospital study on CDS alert override rates (98%)

What This Doesn't Address

  • Treatment availability: Diagnosis is only valuable if treatment exists. Many rare diseases have no approved therapy.
  • False hope: A diagnosis isn't always good news. Some rare diseases are progressive and fatal.
  • Healthcare system variation: This analysis assumes a developed-world context. The bottlenecks are different where basic healthcare is unavailable.

What I'm Doing Next

I'm mapping multiple problem spaces before committing to building anything. This is reconnaissance — understanding where the gaps are and what might address them.

This avenue looks tractable. The failure analysis points toward a specific intervention: patient-facing phenotype capture that bypasses the workflow constraints killing clinician-facing tools. The Phenotype Passport concept has clear first steps and doesn't require institutional buy-in.

The contrast with hearing aids is instructive. Both are healthcare access problems. Both have technological overhang. But the binding constraint differs: hearing aids need distribution infrastructure and human touchpoints; rare disease diagnosis needs better input to algorithms that already exist. One is a logistics problem. One is an information problem. Software can help with information problems.

Next steps are to validate through reality checks — particularly talking to genetic counselors, parents who've completed diagnostic odysseys, and patient advocacy organizations. If the analysis holds up, the Phenotype Passport becomes a candidate for actual development.


This is reconnaissance, not a promise. The diagnostic capability exists — databases, algorithms, genetic testing. The gap is getting it to patients at the moment it matters. If the analysis holds, patient-facing phenotype capture is where a small actor might find leverage. If you can prove it wrong, please do.


This is part of the Avenues of Investigation series—mapping technological overhangs where motivated individuals might find leverage.
