BravenNow
An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
USA | technology | ✓ Verified - arxiv.org

#Rare Disease Phenotyping #Large Language Models #RARE-PHENIX #Medical AI #Clinical Notes #Human Phenotype Ontology #Undiagnosed Diseases Network #Vanderbilt University Medical Center

📌 Key Takeaways

  • RARE-PHENIX is an end-to-end AI framework for rare disease phenotyping using large language models
  • The system outperformed a state-of-the-art deep learning baseline in end-to-end evaluation (ontology-based similarity of 0.70 vs. 0.58)
  • It integrates extraction, standardization, and prioritization of phenotypes into a complete workflow
  • It was trained on data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites and externally validated on 16,357 clinical notes from Vanderbilt University Medical Center
  • The approach could support human-in-the-loop rare disease diagnosis in real-world settings

📖 Full Retelling

Cathy Shyr and 10 co-authors have developed RARE-PHENIX, an end-to-end artificial intelligence framework that uses large language models to phenotype rare diseases from clinical notes. The work was submitted to arXiv on February 23, 2026. It targets a known bottleneck in rare disease diagnosis: manually curating structured phenotypes from clinical notes is labor-intensive and difficult to scale.

The team trained the system on data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. RARE-PHENIX integrates three components: large language model-based phenotype extraction, ontology-grounded standardization to Human Phenotype Ontology (HPO) terms, and supervised ranking of diagnostically informative phenotypes. Existing AI approaches typically optimize individual components of phenotyping. RARE-PHENIX instead operationalizes the full clinical workflow and produces structured, ranked phenotypes that are more concordant with clinician curation.

With clinician-curated HPO terms as the gold standard, the system consistently outperformed a state-of-the-art deep learning baseline across ontology-based similarity and precision-recall-F1 metrics, reaching an ontology-based similarity of 0.70 versus 0.58 for the baseline. Ablation analyses showed that performance improved as each module (extraction, standardization, and prioritization) was added, which supports modeling the complete clinical phenotyping workflow rather than treating it as a single extraction task.
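
The three-stage workflow described above (extraction, standardization, prioritization) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the keyword lookup stands in for the LLM extraction step, the four-term table stands in for the full Human Phenotype Ontology, and the fixed scores stand in for a trained ranking model. All names (`TOY_LEXICON`, `TOY_HPO`, `TOY_INFORMATIVENESS`, `phenotype_note`) are hypothetical, and the HPO identifiers should be checked against a current HPO release before any reuse.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: a keyword lookup standing in for LLM-based extraction.
# It ignores negation ("no seizures") and context, which a real system must handle.
TOY_LEXICON = {
    "seizures": "seizure",
    "seizure": "seizure",
    "small head": "microcephaly",
    "developmental delay": "global developmental delay",
    "fever": "fever",
}

# ASSUMPTION: a tiny illustrative slice of HPO (label -> HPO identifier).
TOY_HPO = {
    "seizure": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
    "fever": "HP:0001945",
}

# ASSUMPTION: made-up scores standing in for a supervised ranking model.
TOY_INFORMATIVENESS = {
    "HP:0000252": 0.9,
    "HP:0001263": 0.8,
    "HP:0001250": 0.7,
    "HP:0001945": 0.1,
}


@dataclass(frozen=True)
class RankedPhenotype:
    hpo_id: str
    label: str
    score: float


def extract_mentions(note: str) -> list[str]:
    """Stage 1: pull candidate phenotype labels out of free text."""
    text = note.lower()
    found: list[str] = []
    for phrase, label in TOY_LEXICON.items():
        if phrase in text and label not in found:
            found.append(label)
    return found


def standardize(mentions: list[str]) -> dict[str, str]:
    """Stage 2: map labels to HPO identifiers, dropping unmapped labels."""
    return {TOY_HPO[m]: m for m in mentions if m in TOY_HPO}


def prioritize(terms: dict[str, str], top_k: int = 3) -> list[RankedPhenotype]:
    """Stage 3: rank HPO terms by (toy) diagnostic informativeness."""
    ranked = sorted(
        (
            RankedPhenotype(hpo_id, label, TOY_INFORMATIVENESS.get(hpo_id, 0.0))
            for hpo_id, label in terms.items()
        ),
        key=lambda p: p.score,
        reverse=True,
    )
    return ranked[:top_k]


def phenotype_note(note: str, top_k: int = 3) -> list[RankedPhenotype]:
    """Run the three stages end to end on a single clinical note."""
    return prioritize(standardize(extract_mentions(note)), top_k)


if __name__ == "__main__":
    note = "Infant with small head, recurrent seizures, and fever. Developmental delay noted."
    for phenotype in phenotype_note(note):
        print(phenotype.hpo_id, phenotype.label, phenotype.score)
```

Keeping each stage as a separate function mirrors the ablation setup the article mentions: a stage can be swapped out or disabled to see how much it contributes to the final ranked list.
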
The approach could support human-in-the-loop rare disease diagnosis in real-world medical settings and shorten the lengthy diagnostic odysseys that patients with rare conditions often endure. It does so by using large language models to convert unstructured clinical text into standardized, prioritized phenotypic information.

🏷️ Themes

Artificial Intelligence, Rare Disease Diagnosis, Medical Technology, Clinical Workflow

📚 Related People & Topics

Human Phenotype Ontology

The Human Phenotype Ontology (HPO) is a formal ontology of human phenotypes. Developed as part of the Monarch Initiative in collaboration with members of the Open Biomedical Ontologies Foundry, HPO currently contains over 13,000 terms and over 156,000 annotations to hereditary diseases. Data from O...

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

Original Source
arXiv:2602.20324 | Computer Science > Artificial Intelligence | Submitted on 23 Feb 2026

Title: An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models

Authors: Cathy Shyr, Yan Hu, Rory J. Tinker, Thomas A. Cassini, Kevin W. Byram, Rizwan Hamid, Daniel V. Fabbri, Adam Wright, Josh F. Peterson, Lisa Bastarache, Hua Xu

Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates large language model-based phenotype extraction, ontology-grounded standardization to HPO terms, and supervised ranking of diagnostically informative phenotypes. We trained RARE-PHENIX using data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites, and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. Using clinician-curated HPO terms as the gold standard, RARE-PHENIX consistently outperformed a state-of-the-art deep learning baseline across ontology-based similarity and precision-recall-F1 metrics in end-to-end evaluation (i.e., ontology-based similarity of 0.70 vs. 0.58).
Ablation analyses demonstrated performance improvements with the addition of each module in RARE-PHENIX (extraction, standardization, and prioritization), supporti...
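
The abstract reports precision-recall-F1 metrics against clinician-curated HPO terms. As a rough illustration of what such a comparison involves, the sketch below scores a predicted set of HPO identifiers against a gold-standard set using exact identifier matching. The exact-match assumption, the function name `precision_recall_f1`, and the example term sets are all hypothetical; the paper may compute these metrics differently, and the ontology-based similarity score it reports is a separate measure not shown here.

```python
from __future__ import annotations


def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 over HPO identifiers.

    ASSUMPTION: a prediction counts as correct only if its identifier
    matches a gold-standard identifier exactly.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up term sets for illustration only.
    predicted = {"HP:0001250", "HP:0000252", "HP:0001945"}
    gold = {"HP:0001250", "HP:0000252", "HP:0001263", "HP:0001252"}
    print(precision_recall_f1(predicted, gold))
```

Exact matching is strict: a prediction that is a close ontological neighbor of a gold term scores zero, which is one reason ontology-aware similarity measures are often reported alongside precision, recall, and F1.
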
Source

arxiv.org
