This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
#large language models #synthetic participants #causal inference #exploratory analysis #confirmatory analysis #simulation validity #social science experiments #cost‑effective research #instantaneous responses
📌 Key Takeaways
- LLMs can function as synthetic participants in social science experiments
- It is currently unclear when LLM simulations yield valid inferences about human behavior
- The study contrasts two causal‑estimation strategies for simulation data
- It delineates the assumptions required for exploratory versus confirmatory analyses
- The methods aim to provide a cost‑effective and near‑instantaneous data‑collection alternative to human subjects
🏷️ Themes
Synthetic sampling, AI‑augmented research design, Causal inference, Methodological guidance, Social science experimentation
Deep Analysis
Why It Matters
Large language models can replace costly human experiments, but only if their outputs reliably reflect real behavior. This study clarifies when LLM simulations can support causal inference, guiding researchers on valid use of synthetic data.
Context & Background
- LLMs are increasingly used as synthetic participants in social science experiments
- Current methods lack clear guidance on validity of causal estimates from LLM data
- The study compares two strategies for obtaining valid causal effects from LLM simulations
What Happens Next
Researchers will adopt the recommended strategy to design experiments that yield trustworthy causal conclusions. Future work will test the approach across diverse behavioral domains and refine assumptions.
Frequently Asked Questions
What are the two strategies the paper compares?
The paper contrasts a strategy that treats LLM outputs as direct proxies for human responses with a strategy that calibrates LLM predictions against a limited human sample to correct for bias.
Can LLM simulations be used for exploratory research?
Yes, when the goal is to generate hypotheses, provided the assumptions about the similarity between LLM and human behavior are made explicit.
Can they support confirmatory research?
Only after those assumptions have been validated and the causal estimates have been calibrated to match human data.
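The calibration strategy described above can be illustrated with a minimal sketch in the style of prediction-powered inference: estimate the treatment effect from the cheap, plentiful LLM responses, then debias it with a rectifier computed on a small subsample that also has human responses. The function name, the per-arm rectifier, and the synthetic data are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def calibrated_effect(llm_y, treat, human_sub, llm_sub, treat_sub):
    """Difference-in-means treatment effect estimated from LLM responses,
    debiased with a small paired human subsample (hypothetical sketch)."""
    # Effect estimated purely from the full set of LLM responses
    tau_llm = llm_y[treat == 1].mean() - llm_y[treat == 0].mean()
    # Rectifier: human-minus-LLM discrepancy on the paired subsample,
    # computed per arm so arm-specific bias cancels out of the effect
    bias = ((human_sub[treat_sub == 1] - llm_sub[treat_sub == 1]).mean()
            - (human_sub[treat_sub == 0] - llm_sub[treat_sub == 0]).mean())
    return tau_llm + bias

# Synthetic illustration: humans show a true effect of 0.5, while the
# (hypothetical) LLM exaggerates it to 0.8
rng = np.random.default_rng(0)
n = 1000
treat = rng.integers(0, 2, n)
human_y = 0.5 * treat + rng.normal(0, 1, n)
llm_y = 0.8 * treat + rng.normal(0, 1, n)

# Suppose only the first 100 units were also run with human participants
sub = slice(0, 100)
tau = calibrated_effect(llm_y, treat, human_y[sub], llm_y[sub], treat[sub])
```

Using the LLM responses alone would recover the exaggerated effect; the rectifier pulls the estimate back toward the human effect, at the cost of extra variance from the small paired sample.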