Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity
#LLM #Mode Collapse #Selective Layer Restoration #Post-training #Neural Networks #Generation Diversity #arXiv
📌 Key Takeaways
- Post-training stages such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) improve instruction-following but cause 'mode collapse.'
- Mode collapse makes AI-generated outputs repetitive and less diverse on open-ended tasks.
- The research suggests that this loss of diversity is localized to specific layers within a large language model (LLM).
- Restoring a chosen range of layers to their pre-trained weights can recover output diversity without sacrificing helpfulness (see the sketch after this list).
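A minimal sketch of what "restoring a range of layers to pre-trained weights" could look like in practice, assuming a Hugging Face transformers-style LLaMA checkpoint whose decoder blocks are exposed as `model.model.layers`. The model names and the restored layer range are illustrative only; the paper's exact procedure and layer choice may differ.

```python
# Sketch: copy a contiguous range of decoder layers from the pre-trained (base)
# model back into the post-trained (instruct) model, keeping everything else
# (embeddings, LM head, remaining layers) post-trained.
import torch
from transformers import AutoModelForCausalLM

BASE = "meta-llama/Llama-3.1-8B"            # pre-trained checkpoint (illustrative)
TUNED = "meta-llama/Llama-3.1-8B-Instruct"  # post-trained checkpoint (illustrative)

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained(TUNED, torch_dtype=torch.bfloat16)

RESTORE_RANGE = range(8, 16)  # hypothetical layer indices, not the paper's choice

for i in RESTORE_RANGE:
    # Overwrite the tuned layer's parameters with the pre-trained layer's parameters.
    tuned.model.layers[i].load_state_dict(base.model.layers[i].state_dict())

tuned.save_pretrained("llama-instruct-layers-8-16-restored")
```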
📖 Full Retelling
🏷️ Themes
Artificial Intelligence, Machine Learning, Technology
📚 Related People & Topics
Neural network
Structure in biology and artificial intelligence
A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either biological cells or mathematical models. While individual neurons are simple, many of them together in a network can perform complex tasks.
Large language model
Type of machine learning model
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...
🔗 Entity Intersection Graph
Connections for Neural network:
- 🌐 Deep learning (4 shared articles)
- 🌐 Reinforcement learning (2 shared articles)
- 🌐 Machine learning (2 shared articles)
- 🌐 Censorship (1 shared article)
- 🌐 CSI (1 shared article)
- 🌐 Mechanistic interpretability (1 shared article)
- 🌐 Batch normalization (1 shared article)
- 🌐 PPO (1 shared article)
- 🌐 Global workspace theory (1 shared article)
- 🌐 Cognitive neuroscience (1 shared article)
- 🌐 Robustness (1 shared article)
- 🌐 Homeostasis (1 shared article)
📄 Original Source Content
arXiv:2602.06665v1 Announce Type: cross
Abstract: Post-training improves instruction-following and helpfulness of large language models (LLMs) but often reduces generation diversity, which leads to repetitive outputs in open-ended settings, a phenomenon known as mode collapse. Motivated by evidence that LLM layers play distinct functional roles, we hypothesize that mode collapse can be localized to specific layers and that restoring a carefully chosen range of layers to their pre-trained weight...
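To make "reduced generation diversity" concrete, one common way to quantify it is distinct-n: the fraction of unique n-grams across a set of sampled outputs. This is a generic metric offered here as a sketch, not necessarily the one used in the paper.

```python
# Sketch: distinct-n diversity over a batch of sampled generations.
# A collapsed model repeats phrasing across samples, so distinct-n drops.
from typing import List


def distinct_n(outputs: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all outputs."""
    total = 0
    unique = set()
    for text in outputs:
        tokens = text.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


# Example: repeated phrasing lowers the score.
samples = ["The cat sat on the mat.", "The cat sat on the mat.", "A dog ran by."]
print(f"distinct-2 = {distinct_n(samples, n=2):.2f}")
```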