Weber's Law in Transformer Magnitude Representations: Efficient Coding, Representational Geometry, and Psychophysical Laws in Language Models


#Weber's Law #transformer models #efficient coding #representational geometry #psychophysics #language models #magnitude representation

📌 Key Takeaways

  • Weber's Law, a psychophysical principle, is observed in transformer-based language models' magnitude representations.
  • The study explores efficient coding mechanisms within transformer architectures for numerical data.
  • Representational geometry analysis reveals how models encode magnitude information in high-dimensional spaces.
  • Findings suggest language models develop internal representations that mirror human perceptual scaling laws.

📖 Full Retelling

arXiv:2603.20642v1 Announce Type: cross Abstract: How do transformer language models represent magnitude? Recent work disagrees: some find logarithmic spacing, others linear encoding, others per-digit circular representations. We apply the formal tools of psychophysics to resolve this. Using four converging paradigms (representational similarity analysis, behavioural discrimination, precision gradients, causal intervention) across three magnitude domains in three 7-9B instruction-tuned models s
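The first of the four paradigms, representational similarity analysis, can be illustrated with a minimal sketch; the embeddings, number range, and distance metrics below are placeholders rather than the paper's actual setup, and only NumPy and SciPy are assumed.

```python
# Minimal RSA sketch: compare a model's representational geometry for numbers
# against linear vs. logarithmic spacing hypotheses (illustrative only).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
numbers = np.arange(1, 101)

# Placeholder "model embeddings": one vector per number. In practice these
# would be hidden states extracted from a language model for numeral tokens.
embeddings = rng.normal(size=(numbers.size, 64)) + np.log(numbers)[:, None]

# Representational dissimilarity matrices (condensed form) for the model
# and for the two competing hypotheses.
model_rdm = pdist(embeddings, metric="correlation")
linear_rdm = pdist(numbers[:, None].astype(float), metric="euclidean")
log_rdm = pdist(np.log(numbers)[:, None], metric="euclidean")

# Rank-correlate the model RDM with each hypothesis RDM; the better-matching
# hypothesis wins. Logarithmic spacing is what Weber's Law would predict.
for name, hyp in [("linear", linear_rdm), ("logarithmic", log_rdm)]:
    rho, _ = spearmanr(model_rdm, hyp)
    print(f"{name}: Spearman rho = {rho:.3f}")
```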

🏷️ Themes

AI Psychology, Neural Networks


Deep Analysis

Why It Matters

This research matters because it bridges cognitive psychology and artificial intelligence, showing that language models develop human-like perceptual representations of magnitude. It affects AI researchers developing more human-aligned models, cognitive scientists studying perception, and developers creating AI systems that need to understand numerical concepts intuitively. The findings suggest that efficient coding principles in biological systems also emerge in artificial neural networks, potentially leading to more interpretable and psychologically plausible AI models.

Context & Background

  • Weber's Law is a fundamental psychophysical principle stating that the just-noticeable difference between two stimuli is proportional to their magnitude, first described by Ernst Heinrich Weber in the 19th century (a minimal numeric sketch follows this list)
  • Transformers have become the dominant architecture in natural language processing since the 2017 'Attention Is All You Need' paper, powering models like GPT and BERT
  • Efficient coding theory proposes that neural systems optimize information transmission given biological constraints, explaining many perceptual phenomena including Weber's Law
  • Previous research has shown neural networks can develop human-like number representations, but this specifically examines Weber's Law in transformer language models
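As a concrete illustration of the first bullet above, the following sketch computes just-noticeable differences under Weber's Law for an assumed Weber fraction of 0.1; the fraction and baseline intensities are arbitrary values chosen for illustration, not figures from the paper.

```python
# Weber's Law: the just-noticeable difference (JND) scales with the baseline
# stimulus intensity I, i.e. delta_I = k * I for a constant Weber fraction k.
WEBER_FRACTION = 0.1  # assumed value, for illustration only

def jnd(intensity: float, k: float = WEBER_FRACTION) -> float:
    """Smallest change in intensity that is reliably noticeable."""
    return k * intensity

for intensity in (10, 100, 1000):
    print(f"baseline {intensity:>5}: JND = {jnd(intensity):.1f}")
# The JND grows tenfold each time the baseline grows tenfold, so the
# *ratio* delta_I / I stays constant -- the signature of Weber's Law.
```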

What Happens Next

Researchers will likely investigate whether other psychophysical laws emerge in language models, potentially examining Stevens' Power Law or Fechner's Law. Further work may explore how these representations affect downstream tasks like mathematical reasoning or quantity estimation. The findings could inspire new model architectures that explicitly incorporate psychophysical principles, with publications expected in cognitive science and AI conferences within the next year.
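For reference, the three laws mentioned above have standard textbook forms (generic notation, not taken from the paper):

```latex
% Weber's Law -- the just-noticeable difference is a constant fraction of intensity
\frac{\Delta I}{I} = k
% Fechner's Law -- perceived sensation grows logarithmically with intensity
S = k \,\ln\!\left(\frac{I}{I_0}\right)
% Stevens' Power Law -- sensation is a power function of intensity
S = k \, I^{a}
```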

Frequently Asked Questions

What is Weber's Law and why is it important?

Weber's Law describes how humans perceive differences between stimuli: the just-noticeable difference grows in proportion to stimulus magnitude. It's important because it's one of the oldest quantitative laws in psychology and explains many perceptual phenomena across senses including vision, hearing, and touch.

How do transformers develop these magnitude representations?

Transformers likely develop these representations through their self-attention mechanisms and training on large text corpora. The models appear to learn efficient coding strategies similar to biological systems when processing numerical information embedded in language.
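One way to probe that claim is to extract hidden states for numeral strings and ask whether a linear or a logarithmic code predicts the number better. The sketch below assumes a HuggingFace-style model; the model name ('gpt2'), layer choice, pooling, and number range are simplifying placeholders rather than the paper's protocol.

```python
# Probe sketch: extract hidden states for numeral strings and compare how well
# a linear vs. logarithmic target is decodable from the representation.
# Assumes the `transformers` and `scikit-learn` packages are installed.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

model_name = "gpt2"  # placeholder; the paper studies 7-9B instruction-tuned models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

numbers = list(range(1, 201))
reps = []
with torch.no_grad():
    for n in numbers:
        ids = tokenizer(str(n), return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        # Mean-pool the last hidden layer over the numeral's tokens.
        reps.append(out.hidden_states[-1][0].mean(dim=0).numpy())
X = np.stack(reps)

# Higher cross-validated R^2 for log-transformed targets would indicate a
# compressed (Weber-like) magnitude code; higher for raw targets, a linear one.
for label, y in [("linear", np.array(numbers, dtype=float)),
                 ("logarithmic", np.log(numbers))]:
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()
    print(f"{label} target: mean R^2 = {r2:.3f}")
```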

What does this mean for AI development?

This suggests AI systems can develop human-like perceptual representations without explicit programming. It could lead to more interpretable models and better integration of cognitive science principles into AI design, potentially improving how AI handles numerical reasoning tasks.

Are there practical applications of this research?

Yes, applications include developing AI systems with more intuitive understanding of quantities for education, data visualization, or scientific applications. It could also improve how AI models handle tasks requiring magnitude estimation or proportional reasoning in real-world scenarios.

How does this relate to efficient coding theory?

Efficient coding theory explains how biological systems optimize information processing. This research shows transformers develop similar optimization strategies, suggesting common principles govern information representation in both biological and artificial neural systems.
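One standard way this is formalised in the efficient-coding literature is that a resource-limited encoder should allocate representational space according to the cumulative distribution of the stimulus prior; under a heavy-tailed, roughly 1/x prior over magnitudes, that allocation is approximately logarithmic, which produces Weber-like discrimination. The toy sketch below assumes such a prior and is illustrative only, not the paper's analysis.

```python
# Efficient-coding toy demo: map magnitudes through the CDF of an assumed
# 1/x-like prior. Equal steps in the encoded space then correspond to
# multiplicative (Weber-like) steps in the stimulus space.
import numpy as np

lo, hi = 1.0, 1000.0
x = np.linspace(lo, hi, 100_000)
prior = 1.0 / x  # assumed heavy-tailed prior over magnitudes

# Cumulative distribution of the prior (trapezoid rule), normalised to [0, 1].
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (prior[1:] + prior[:-1]) * np.diff(x))))
cdf /= cdf[-1]

def encode(stimulus: float) -> float:
    """CDF transform: allocate encoding resolution where the prior has mass."""
    return float(np.interp(stimulus, x, cdf))

# Stimuli spaced by a constant *ratio* land at (roughly) evenly spaced
# positions in the encoded space, i.e. the encoder is effectively logarithmic.
for s in (1, 10, 100, 1000):
    print(f"stimulus {s:>4}: encoded position = {encode(s):.3f}")
```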


Source

arxiv.org
