Sub‑bit compression targets storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes the main obstacle to further storage reduction.
Across Transformers, CNNs, and MLPs, learned sign matrices are spectrally indistinguishable from i.i.d. Rademacher matrices and resist low‑rank approximation.
Most weight signs remain unchanged from initialization; flips mainly occur through rare crossings near zero, implying sign randomness is largely inherited from the initial condition.
Sign lock‑in theory models this phenomenon as a stopping‑time problem under SGD noise, yielding a geometric tail for the number of effective sign flips.
A gap‑based initialization and an outward‑drift regularizer reduce the effective flip rate to approximately 10⁻³ with only about a one‑point increase in perplexity.
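As a toy illustration of the lock‑in mechanism (not the authors' experiment), SGD updates can be modeled as a bounded random walk on each weight. The snippet below compares a standard small‑scale Gaussian init against a hypothetical gap‑based init that keeps initial magnitudes away from zero; all parameter values are illustrative, not taken from the paper.

```python
import random

def flip_fraction(init, n_weights=2000, steps=400, noise=0.005, seed=0):
    """Toy random-walk model of SGD updates: each weight starts at
    init(rng) and takes bounded uniform steps; return the fraction of
    weights whose final sign differs from the initial sign."""
    rng = random.Random(seed)
    flipped = 0
    for _ in range(n_weights):
        w0 = init(rng)
        w = w0
        for _ in range(steps):
            w += rng.uniform(-noise, noise)  # bounded update, as in the theory
        if w0 * w < 0:
            flipped += 1
    return flipped / n_weights

# Standard small-scale Gaussian init vs. a gap init keeping |w0| >= 0.2.
std_frac = flip_fraction(lambda r: r.gauss(0.0, 0.1))
gap_frac = flip_fraction(lambda r: (1 if r.random() < 0.5 else -1) * r.uniform(0.2, 0.3))
print(f"flip fraction: standard init {std_frac:.3f}, gap init {gap_frac:.4f}")
```

Under these made‑up settings, weights initialized inside the noise‑scale neighborhood of zero flip occasionally, while the gap init makes boundary crossings rare, mirroring the qualitative claim above.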
📖 Full Retelling
WHO: Akira Sakai and Yuma Ichikawa. WHAT: They examine why weight signs in neural networks persist during aggressive sub‑bit compression and introduce the sign lock‑in theory along with a gap‑based initialization and an outward‑drift regularizer. WHERE: The work was submitted to arXiv under the computer science – machine learning category (cs.LG). WHEN: 19 February 2026. WHY: To explain the bottleneck imposed by the sign bit in sub‑bit models and to propose lightweight methods that substantially reduce sign flips while only marginally affecting model performance.
🏷️ Themes
Model compression, Weight sign dynamics in deep learning, Statistical analysis of stochastic training, Optimization techniques for sub‑bit quantization, Theoretical foundations of neural network sparsity
Deep Analysis
Why It Matters
The paper shows that weight sign patterns in deep networks are largely inherited from initialization and rarely change, creating a bottleneck for sub-bit compression. Understanding this behavior enables new techniques that reduce sign flips and improve storage efficiency without hurting accuracy.
Context & Background
Sub-bit compression seeks to store weights below one bit per weight, making the sign bit a fixed-cost bottleneck.
Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and appear random.
The authors formalize sign lock-in theory and propose a gap-based initialization and outward-drift regularizer to reduce flips.
What Happens Next
Researchers will likely integrate the gap-based initialization and regularizer into mainstream training pipelines to achieve sub-bit compression with minimal accuracy loss. Further studies may explore hardware designs that exploit the predictable sign patterns for energy-efficient inference.
Frequently Asked Questions
What is sign lock-in?
Sign lock-in refers to the phenomenon where weight signs remain largely unchanged from their initial random values throughout training, limiting the effectiveness of sign-based compression.
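In code, sign lock‑in can be quantified with a simple retention metric, the fraction of weights whose sign matches initialization (an illustrative helper, not taken from the paper):

```python
import numpy as np

def sign_retention(w_init, w_final):
    """Fraction of weights whose sign matches initialization;
    a value near 1 indicates strong sign lock-in."""
    w_init = np.asarray(w_init, dtype=float)
    w_final = np.asarray(w_final, dtype=float)
    return float(np.mean(np.sign(w_init) == np.sign(w_final)))

# Third weight crossed zero; the other three kept their initial sign.
print(sign_retention([0.3, -0.2, 0.1, -0.5], [0.4, -0.1, -0.05, -0.6]))  # → 0.75
```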
How does the new regularizer reduce sign flips?
The lightweight outward-drift regularizer encourages weights to move away from the zero boundary, making it harder for stochastic gradient updates to cross the sign threshold and thus lowering the flip rate.
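One plausible form of such a penalty, sketched here as an assumption since the paper's exact formulation is not reproduced in this summary, is a hinge term that activates only inside a small neighborhood of zero, so its gradient pushes near‑zero weights outward:

```python
import numpy as np

def outward_drift_step(w, lr=0.1, gap=0.05, strength=0.01):
    """One SGD step on a hinge-style penalty sum(max(0, gap - |w|)).
    Hypothetical sketch of an outward-drift regularizer: weights with
    |w| < gap are pushed away from zero; larger weights are untouched."""
    # d/dw max(0, gap - |w|) = -sign(w) when |w| < gap, else 0
    grad = np.where(np.abs(w) < gap, -np.sign(w) * strength, 0.0)
    return w - lr * grad

w = np.array([0.01, -0.02, 0.30, -0.40])
w_new = outward_drift_step(w)
print(w_new)  # small-magnitude weights drift outward; large ones unchanged
```

Because the update never changes a weight's sign and only widens the margin to the zero boundary, it raises the effective barrier that SGD noise must cross to flip a sign.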
Original Source
Computer Science > Machine Learning — arXiv:2602.17063 [Submitted on 19 Feb 2026]
Title: Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
Authors: Akira Sakai, Yuma Ichikawa
Abstract: Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer, reducing the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2602.17063 [cs.LG] (or arXiv:2602.17063v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.17063 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Thu, 19 Feb 2026 04:10:05 UTC (584 KB), submitted by Akira Sakai