Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts


#GPT-5 #Large Language Models #Citation Context Analysis #Prompt Sensitivity #Interpretative Analysis #Academic Research #ArXiv #Text Grounded Readings

πŸ“Œ Key Takeaways

  • GPT-5 shows promise for interpretative citation context analysis through 'thick' readings rather than typological labels
  • Prompt scaffolding and framing significantly influence the model's interpretative outputs
  • The study identified 21 recurring interpretative moves in GPT-5's reconstructions
  • GPT-5 consistently classified the probe citation as 'supplementary' but resolved its interpretation differently from the human analysis it was compared against
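The study's headline counts, reported in the retelling below, follow from simple arithmetic: a balanced 2x3 design crosses two scaffolding levels with three framings for six prompt conditions; 90 reconstructions split evenly over six conditions is 15 per cell (an inference from the stated totals, not a figure given here); and 450 hypotheses across 90 reconstructions is five per run. A quick sketch (the condition labels are illustrative, not from the paper):

```python
from itertools import product

# Balanced 2x3 prompt design: two scaffolding levels crossed with three framings.
# The specific level labels below are invented for illustration.
scaffolding = ["minimal", "rich"]
framing = ["neutral", "exampled", "directive"]
conditions = list(product(scaffolding, framing))

reconstructions = 90   # total reported in the study
hypotheses = 450       # total reported in the study

print(len(conditions))                     # 6 prompt conditions
print(reconstructions // len(conditions))  # 15 reconstructions per cell, if balanced
print(hypotheses // reconstructions)       # 5 hypotheses per reconstruction
```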

πŸ“– Full Retelling

In a research paper posted to arXiv on February 25, 2026, Arno Simons tests whether large language models such as GPT-5 can support interpretative citation context analysis through 'thick', text-grounded readings of a single hard case, rather than through typological labels. The study foregrounds prompt sensitivity as a methodological issue, varying prompt scaffolding and framing in a balanced 2x3 design.

The probe case is footnote 6 in Chubin and Moitra's 1975 paper, read against Gilbert's 1977 reconstruction of it. Simons implements a two-stage GPT-5 pipeline: first a surface pass that classifies the citation and states expectations from the citation text alone, then a cross-document interpretative reconstruction that draws on the full texts of both the citing and the cited papers. Across 90 reconstructions, the model produced 450 distinct hypotheses. Through close reading and inductive coding, Simons identified 21 recurring interpretative moves and used linear probability models to estimate how prompt choices shifted their frequencies and lexical repertoire.

The surface pass proved highly stable: GPT-5 consistently classified the citation as 'supplementary'. In the reconstruction stage, however, the model generated a structured space of plausible alternatives, with scaffolding and examples redistributing its attention and vocabulary, sometimes toward strained readings. Compared with Gilbert, GPT-5 detected the same textual hinges but more often resolved them as lineage and positioning rather than as admonishment. Simons closes by outlining the opportunities and risks of using LLMs as guided co-analysts for inspectable, contestable interpretative citation context analysis, showing that prompt scaffolding and framing systematically tilt which plausible readings the model foregrounds.
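The 'linear probability models' mentioned above are ordinary least squares regressions of a 0/1 outcome (does interpretative move k appear in this reconstruction?) on dummies for the prompt conditions, so the coefficients read directly as percentage-point shifts in the probability of a move. A minimal sketch on synthetic data (the condition coding, cell sizes, and effect sizes are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x3 design: 2 scaffolding levels x 3 framing variants,
# 15 reconstructions per cell -> 90 reconstructions, matching the study's total.
scaffold = np.repeat([0, 1], 45)                # 0 = minimal, 1 = rich scaffolding
framing = np.tile(np.repeat([0, 1, 2], 15), 2)  # three framing variants

# Synthetic binary outcome: does a given interpretative move appear?
p = 0.2 + 0.3 * scaffold + 0.1 * (framing == 2)
y = rng.binomial(1, p)

# Design matrix: intercept, scaffolding dummy, framing dummies (baseline = framing 0)
X = np.column_stack([
    np.ones(90),
    scaffold,
    (framing == 1).astype(float),
    (framing == 2).astype(float),
])

# Linear probability model = OLS on the 0/1 outcome
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "scaffold", "framing_1", "framing_2"], beta.round(3))))
```

Each coefficient estimates how much that prompt choice raises or lowers the probability that the move appears, relative to the baseline condition.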

🏷️ Themes

Artificial Intelligence, Academic Research, Prompt Engineering, Citation Analysis

πŸ“š Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs).

Original Source
Computer Science > Computation and Language
arXiv:2602.22359 [Submitted on 25 Feb 2026]

Title: Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts
Authors: Arno Simons

Abstract: This paper tests whether large language models can support interpretative citation context analysis by scaling in thick, text-grounded readings of a single hard case rather than scaling up typological labels. It foregrounds prompt-sensitivity analysis as a methodological issue by varying prompt scaffolding and framing in a balanced 2x3 design. Using footnote 6 in Chubin and Moitra (1975) and Gilbert's (1977) reconstruction as a probe, I implement a two-stage GPT-5 pipeline: a citation-text-only surface classification and expectation pass, followed by cross-document interpretative reconstruction using the citing and cited full texts. Across 90 reconstructions, the model produces 450 distinct hypotheses. Close reading and inductive coding identify 21 recurring interpretative moves, and linear probability models estimate how prompt choices shift their frequencies and lexical repertoire. GPT-5's surface pass is highly stable, consistently classifying the citation as "supplementary". In reconstruction, the model generates a structured space of plausible alternatives, but scaffolding and examples redistribute attention and vocabulary, sometimes toward strained readings. Relative to Gilbert, GPT-5 detects the same textual hinges yet more often resolves them as lineage and positioning than as admonishment. The study outlines opportunities and risks of using LLMs as guided co-analysts for inspectable, contestable interpretative CCA, and it shows that prompt scaffolding and framing systematically tilt which plausible readings and vocabularies the model foregrounds.

Comments: 26 pages, 1 figure, 3 tables (plus 17 pa...

Source

arxiv.org
