In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks


#in-context learning #linear attention #quadratic attention #transformer #linear regression #model depth #convergence #generalization #arXiv #cs.LG

📌 Key Takeaways

  • Three researchers—Ayush Goel, Arjun Kohli, and Sarvagya Somvanshi—authored the study.
  • The work compares linear attention models with quadratic (transformer) attention on linear‑regression tasks.
  • Evaluation metrics include learning quality, convergence speed, and generalization ability.
  • The authors analyze the impact of model depth on ICL performance.
  • Findings highlight both similarities and constraints of linear attention in relation to quadratic attention.

📖 Full Retelling

In an arXiv submission of 19 Feb 2026 to the machine-learning repository (cs.LG), researchers Ayush Goel, Arjun Kohli, and Sarvagya Somvanshi present an empirical study titled "In‑Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks." The paper examines how linear and quadratic attention mechanisms differ in their in‑context learning (ICL) behavior on the canonical linear‑regression benchmark of Garg et al. The authors evaluate learning quality, convergence, and generalization, and analyze how increasing model depth affects ICL performance. Their results illustrate both the similarities and the limitations of linear attention relative to quadratic attention in this setting.

🏷️ Themes

In‑Context Learning (ICL), Linear vs. Quadratic Attention, Regression Tasks, Model Depth, Machine Learning Research

Entity Intersection Graph

No entity connections available yet for this article.

Deep Analysis

Why It Matters

The study compares linear and quadratic attention models on in‑context learning for linear regression, revealing how model depth and attention type influence learning quality and generalization. These insights help researchers choose architectures for efficient few‑shot learning in practical applications.

Context & Background

  • In‑context learning enables models to adapt to new tasks without gradient updates
  • Linear attention offers computational efficiency compared to quadratic attention
  • Prior work showed transformers can learn linear regression in‑context
  • This paper evaluates convergence and generalization across model depths
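The canonical in-context regression setup of Garg et al., which the paper builds on, can be sketched as follows. This is an illustrative data generator, not the authors' code: each prompt draws a hidden task vector w, interleaves inputs x_i with labels y_i = w·x_i, and asks the model to predict the label of a fresh query without any gradient updates.

```python
import numpy as np

def make_icl_regression_prompt(n_examples=8, dim=4, noise=0.0, seed=0):
    """Sample one in-context linear-regression prompt.

    A task vector w is drawn per prompt; the model sees the labeled
    demonstrations (x_1, y_1, ..., x_k, y_k) followed by x_query and
    should output y_query = w . x_query purely from context.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                    # hidden per-prompt task vector
    X = rng.normal(size=(n_examples + 1, dim))  # context inputs plus one query
    y = X @ w + noise * rng.normal(size=n_examples + 1)
    context = list(zip(X[:-1], y[:-1]))         # labeled demonstrations
    query, target = X[-1], y[-1]                # held-out query and its label
    return context, query, target

context, query, target = make_icl_regression_prompt()
# Least squares on the context recovers w exactly in the noiseless case,
# which is the baseline an ideal in-context learner would match.
Xc = np.stack([x for x, _ in context])
yc = np.array([y for _, y in context])
w_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
print(abs(w_hat @ query - target) < 1e-8)  # True
```

With 8 noiseless examples in 4 dimensions the system is overdetermined, so ordinary least squares recovers the task vector exactly; this is the reference solution against which ICL performance is typically measured.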

What Happens Next

Future work may extend the comparison to non‑linear regression tasks and explore hybrid attention mechanisms. The findings could guide the design of lightweight models for deployment on edge devices.

Frequently Asked Questions

What is the main difference between linear and quadratic attention?

Quadratic (softmax) attention computes all pairwise token interactions, so its cost grows quadratically with sequence length; linear attention replaces the softmax with a kernel feature map, reducing the cost to linear in sequence length.
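The difference can be sketched with a toy implementation. This is illustrative only, not the paper's code; the elu(x)+1 feature map is one common choice for linear attention and is an assumption here:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Quadratic attention: materializes the full (n, n) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # O(n^2 d) time and memory

def linear_attention(Q, K, V):
    """Linear attention: associativity lets us form phi(K)^T V first."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                    # (d, d) running summary, O(n d^2)
    Z = Qf @ Kf.sum(axis=0)          # per-query normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]    # never builds an (n, n) matrix

rng = np.random.default_rng(0)
n, d = 16, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape)  # (16, 4)
print(linear_attention(Q, K, V).shape)   # (16, 4)
```

The key point is the reassociation (Qf Kfᵀ)V = Qf(Kfᵀ V): the (d, d) summary can be accumulated in a single pass, which is why linear attention admits a recurrent, constant-memory formulation.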

How does model depth affect in‑context learning?

Increasing depth generally improves convergence but can also lead to diminishing returns for linear attention.

Can these results be applied to other domains?

Yes, the methodology can be adapted to other simple function classes, though performance may vary with task complexity.

Where can I find the code?

The paper references associated code repositories on platforms such as Hugging Face and Papers with Code; the exact links are provided in the supplementary materials.

Original Source
Computer Science > Machine Learning
arXiv:2602.17171 [cs.LG] (submitted on 19 Feb 2026)

Title: In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
Authors: Ayush Goel, Arjun Kohli, Sarvagya Somvanshi

Abstract: Recent work has demonstrated that transformers and linear attention models can perform in-context learning on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality, convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: https://doi.org/10.48550/arXiv.2602.17171 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Thu, 19 Feb 2026 08:38:20 UTC (1,244 KB), from Ayush Goel

Source

arxiv.org
