A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

#CACTUS #FeatureStability #ExplainableAI #MissingData #Haematuria #BladderCancer #BiomedicalMachineLearning #RandomForest #GradientBoosting #ClinicalDecisionSupport #Interpretability

📌 Key Takeaways

  • CACTUS integrates feature abstraction, interpretable classification, and systematic feature‑stability analysis.
  • Benchmarking against Random Forests and Gradient Boosting showed CACTUS matched or exceeded predictive performance while keeping top features more stable under increasing missingness.
  • Stability analysis was performed across whole cohort and sex‑stratified subgroups.
  • The framework highlights that feature stability is a complementary metric to conventional accuracy, essential for trustworthy biomedical models.
  • The study uses a clinically relevant dataset of 568 haematuria patients, demonstrating the framework's applicability to small, heterogeneous, incomplete clinical datasets.

📖 Full Retelling

Justyna Andrys‑Olek, Paulina Tworek, Luca Gherardini, Mark W. Ruddock, Mary Jo Kurt, Peter Fitzgerald, and Jose Sousa introduced CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), an explainable machine‑learning framework designed to address the lack of robustness, interpretability and feature stability in high‑stakes biomedical applications. The study was conducted using a real‑world haematuria cohort of 568 patients evaluated for bladder cancer, and the preprint was submitted to arXiv on 19 February 2026. The authors aim to provide a trustworthy decision‑support tool that remains reliable even when clinical data are incomplete, by explicitly quantifying how consistently informative features persist as missingness increases.
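The stability analysis described above can be illustrated with a minimal sketch: rank features on the complete data, re-rank them after randomly deleting a growing fraction of values, and measure how much the top-k set overlaps with the baseline. This is not the paper's actual CACTUS pipeline; the model (a random forest), the median imputer, the value of k, and the Jaccard overlap metric are all illustrative assumptions.

```python
# Illustrative feature-stability check: does the top-k feature set survive
# increasing levels of randomly introduced (MCAR) missingness?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
# Synthetic stand-in for a small clinical cohort (568 patients, as in the study).
X, y = make_classification(n_samples=568, n_features=20,
                           n_informative=5, random_state=0)

def top_k_features(X, y, k=5):
    """Impute, fit a forest, and return the indices of the k most important features."""
    X_imp = SimpleImputer(strategy="median").fit_transform(X)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_imp, y)
    return set(np.argsort(model.feature_importances_)[-k:])

baseline = top_k_features(X, y)
for frac in (0.1, 0.2, 0.3):
    X_miss = X.copy()
    mask = rng.random(X.shape) < frac          # MCAR: each value dropped independently
    X_miss[mask] = np.nan
    top = top_k_features(X_miss, y)
    jaccard = len(top & baseline) / len(top | baseline)
    print(f"{int(frac * 100)}% missing: top-5 Jaccard vs. baseline = {jaccard:.2f}")
```

A model whose Jaccard overlap stays near 1.0 as missingness rises is "feature-stable" in the sense the paper argues for; a model whose overlap collapses may still score well on accuracy while offering clinicians shifting, untrustworthy explanations.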

🏷️ Themes

Trustworthy machine learning in healthcare, Explainability and feature stability, Robustness to missing data, Clinical decision support, Evaluation of small, heterogeneous datasets


Deep Analysis

Why It Matters

CACTUS addresses a key barrier to clinical adoption of machine learning by ensuring that predictive models remain stable even when patient data is incomplete, which is common in real-world settings. By quantifying feature stability, the framework builds trust among clinicians and supports reproducible decision making.

Context & Background

  • Machine learning model performance often degrades when data are missing
  • Feature instability undermines trust in high-stakes medical decisions
  • CACTUS integrates feature abstraction, interpretability, and stability analysis

What Happens Next

Researchers will likely extend CACTUS to other disease domains and larger datasets, testing its robustness across varied missingness patterns. The framework may also be integrated into clinical decision support systems, enabling real-time, trustworthy risk assessments.

Frequently Asked Questions

What makes CACTUS different from standard models?

It explicitly measures how stable the most important features are when data quality changes, which standard models do not provide.

Can CACTUS handle non-random missing data?

The current study used random missingness, but the stability analysis can be adapted to other missingness mechanisms with further validation.
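The distinction in this answer can be made concrete with a small sketch contrasting random (MCAR) missingness, as used in the study, with a value-dependent (MNAR-style) pattern, which would need separate validation. The scenario and all names here are hypothetical, not from the paper.

```python
# MCAR vs. a value-dependent (MNAR-style) missingness pattern.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)                  # a hypothetical biomarker measurement

# MCAR: every value has the same 20% chance of being dropped.
mcar_mask = rng.random(x.size) < 0.2

# MNAR-style: higher values are more likely to be missing,
# e.g. a biomarker that saturates the assay at high concentrations.
p = 1 / (1 + np.exp(-(x - 1)))             # drop probability rises with the value
mnar_mask = rng.random(x.size) < p

print("mean of MCAR-dropped values:", x[mcar_mask].mean())  # near the overall mean
print("mean of MNAR-dropped values:", x[mnar_mask].mean())  # biased toward high values
```

Under MCAR the removed values look like the retained ones, so stability results transfer cleanly; under MNAR the missingness itself carries signal, which is why the authors' stability analysis would need re-validation before being trusted on such data.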

Original Source
Computer Science > Machine Learning
arXiv:2602.17364 [Submitted on 19 Feb 2026]

Title: A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

Authors: Justyna Andrys-Olek, Paulina Tworek, Luca Gherardini, Mark W. Ruddock, Mary Jo Kurt, Peter Fitzgerald, Jose Sousa

Abstract: Machine learning models are increasingly applied to biomedical data, yet their adoption in high stakes domains remains limited by poor robustness, limited interpretability, and instability of learned features under realistic data perturbations, such as missingness. In particular, models that achieve high predictive performance may still fail to inspire trust if their key features fluctuate when data completeness changes, undermining reproducibility and downstream decision-making. Here, we present CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), an explainable machine learning framework explicitly designed to address these challenges in small, heterogeneous, and incomplete clinical datasets. CACTUS integrates feature abstraction, interpretable classification, and systematic feature stability analysis to quantify how consistently informative features are preserved as data quality degrades. Using a real-world haematuria cohort comprising 568 patients evaluated for bladder cancer, we benchmark CACTUS against widely used machine learning approaches, including random forests and gradient boosting methods, under controlled levels of randomly introduced missing data. We demonstrate that CACTUS achieves competitive or superior predictive performance while maintaining markedly higher stability o...

Source

arxiv.org
