SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
| USA | technology | ✓ Verified - arxiv.org


#SC-Arena #Large Language Models #Single-cell biology #Evaluation benchmark #Knowledge-augmented evaluation #Virtual cell #Natural language tasks #Biological reasoning

📌 Key Takeaways

  • SC-Arena introduces a unified evaluation framework for LLMs in single-cell biology
  • The framework uses a 'virtual cell' abstraction to represent cellular attributes and interactions
  • Five natural language tasks probe core reasoning capabilities in cellular biology
  • Knowledge-augmented evaluation incorporates external biological knowledge for more accurate assessment

📖 Full Retelling

On February 26, 2026, researchers led by Jiahao Zhao, together with seven collaborators, introduced SC-Arena, a comprehensive framework for evaluating large language models in single-cell biology. It addresses significant gaps in current assessment practice, which suffers from fragmented benchmarks, unrealistic task formats, and non-interpretable metrics.

At the core of SC-Arena is a novel 'virtual cell' abstraction that unifies evaluation targets by representing both a cell's intrinsic attributes and its gene-level interactions. Within this abstraction, the framework formalizes five natural language tasks that probe core reasoning capabilities in cellular biology: cell type annotation, captioning, generation, perturbation prediction, and scientific question answering. Unlike existing benchmarks, which often rely on simplified multiple-choice formats, these tasks more closely mirror how language models are actually used in biological research.

A second key innovation is SC-Arena's knowledge-augmented evaluation methodology, which overcomes the limitations of conventional string-matching metrics by incorporating external ontologies, marker databases, and scientific literature. This grounding yields biologically faithful judgments along with interpretable, evidence-based rationales.
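The 'virtual cell' idea can be pictured as one structured record from which each of the five tasks reads a different slice. Below is a minimal Python sketch under that reading; the field and method names are invented for illustration and are not SC-Arena's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch (not the paper's schema): a single record bundling a
# cell's intrinsic attributes with gene-level information, from which a
# natural-language task prompt can be rendered.
@dataclass
class VirtualCell:
    cell_type: str                        # intrinsic attribute (annotation target)
    tissue: str                           # intrinsic attribute
    top_genes: List[str] = field(default_factory=list)  # highly expressed genes
    perturbation: Optional[str] = None    # gene-level intervention, if any

    def annotation_prompt(self) -> str:
        """Render the cell as a natural-language cell type annotation query."""
        genes = ", ".join(self.top_genes)
        return f"A cell from {self.tissue} highly expressing {genes}. What is its cell type?"

cell = VirtualCell(cell_type="T cell", tissue="blood",
                   top_genes=["CD3D", "CD3E", "CD2"])
prompt = cell.annotation_prompt()
```

Captioning, generation, and perturbation prediction would each read or write different fields of the same record, which is what makes the abstraction a unifying evaluation target.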

🏷️ Themes

Artificial Intelligence, Scientific Research, Evaluation Methods

📚 Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...


Cellular model

A cellular model or virtual cell is a computational model of aspects of a biological cell, for the purposes of in silico research. Developing such models has been a task of systems biology and mathematical biology. It involves developing efficient algorithms, data structures, visualization and commu...


Entity Intersection Graph

Connections for Large language model:

🌐 Educational technology (4 shared)
🌐 Reinforcement learning (3 shared)
🌐 Machine learning (2 shared)
🌐 Artificial intelligence (2 shared)
🌐 Benchmark (2 shared)
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.23199 [Submitted on 26 Feb 2026]

Title: SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Authors: Jiahao Zhao, Feng Jiang, Shaowei Qin, Zhonghui Zhang, Junhao Liu, Guibing Guo, Hamid Alinejad-Rokny, Min Yang

Abstract: Large language models are increasingly applied in scientific research, offering new capabilities for knowledge discovery and reasoning. In single-cell biology, however, evaluation practices for both general and specialized LLMs remain inadequate: existing benchmarks are fragmented across tasks, adopt formats such as multiple-choice classification that diverge from real-world usage, and rely on metrics lacking interpretability and biological grounding. We present SC-ARENA, a natural language evaluation framework tailored to single-cell foundation models. SC-ARENA formalizes a virtual cell abstraction that unifies evaluation targets by representing both intrinsic attributes and gene-level interactions. Within this paradigm, we define five natural language tasks (cell type annotation, captioning, generation, perturbation prediction, and scientific QA) that probe core reasoning capabilities in cellular biology. To overcome the limitations of brittle string-matching metrics, we introduce knowledge-augmented evaluation, which incorporates external ontologies, marker databases, and scientific literature to support biologically faithful and interpretable judgments. Experiments and analysis across both general-purpose and domain-specialized LLMs demonstrate that under the Virtual Cell unified evaluation paradigm, (i) current models achieve uneven performance on biologically complex tasks, particularly those demanding mechanistic or causal understanding; (ii) our knowledge-a...
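The abstract's contrast between brittle string matching and knowledge-augmented judging can be made concrete with a toy example. This sketch assumes a tiny in-memory synonym ontology and marker table; the real framework queries external ontologies, marker databases, and literature, and every name below is invented:

```python
# Toy stand-ins for external knowledge sources (invented data).
ONTOLOGY_SYNONYMS = {
    "t cell": {"t cell", "t lymphocyte", "t-cell"},
    "b cell": {"b cell", "b lymphocyte"},
}
MARKER_DB = {
    "t cell": {"CD3D", "CD3E", "CD2"},
    "b cell": {"CD19", "MS4A1", "CD79A"},
}

def string_match(pred: str, gold: str) -> bool:
    """Brittle baseline: exact (case-insensitive) string equality."""
    return pred.strip().lower() == gold.strip().lower()

def knowledge_augmented_judge(pred, gold, observed_markers):
    """Accept a prediction if it is an ontology synonym of the gold label,
    and return the observed markers supporting the gold type as rationale."""
    p, g = pred.strip().lower(), gold.strip().lower()
    correct = p in ONTOLOGY_SYNONYMS.get(g, {g})
    evidence = sorted(MARKER_DB.get(g, set()) & set(observed_markers))
    return correct, evidence

# "T lymphocyte" fails string matching but passes the synonym-aware judge,
# with matching marker genes as an interpretable, evidence-based rationale.
ok, why = knowledge_augmented_judge("T lymphocyte", "T cell",
                                    ["CD3D", "CD2", "ACTB"])
```

The returned evidence list is what makes the judgment interpretable: a reviewer can check which markers grounded the verdict rather than trusting an opaque score.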
Read full article at source

Source

arxiv.org
