BravenNow
Rethinking Representativeness and Diversity in Dynamic Data Selection
USA | technology | ✓ Verified - arxiv.org

#representativeness #diversity #dynamic data #data selection #machine learning #inclusivity #data curation

📌 Key Takeaways

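  • The paper redefines representativeness in dynamic data selection as coverage of dataset-level common (high-frequency) feature factors, rather than local geometric centrality.
  • It redefines diversity at the process level: over the course of training, the selection trajectory should gradually include complementary rare factors, rather than maximizing dispersion within a single subset.
  • The proposed framework scores samples in a sparse-autoencoder feature space, combines rare-factor sampling with a Usage-Frequency Penalty that encourages sample rotation, and uses a smooth scheduler that shifts selection from core patterns to rare factors.
  • On five vision and text benchmarks, the authors report matching or exceeding full-data accuracy with over 2x training acceleration.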

📖 Full Retelling

arXiv:2603.04981v1 (Announce Type: new). Abstract: Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric centrality, we define representativeness as coverage of dataset-level common or high-frequency feature factors. Instead of within-subset dispersion, we define diversity at the process level, requiring the selection trajectory to gradually include complementary rare factors over training.
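To make "coverage of frequent factors" concrete, here is a minimal sketch. It assumes the per-sample factor activations are already available (the paper obtains them from a sparse autoencoder; no autoencoder is trained here), with one row per sample and one column per sparse unit. Each factor's dataset-level frequency is estimated as the fraction of samples that activate it, and a sample is scored by the activation-weighted average frequency of its active factors. The function names, the activation threshold, and this scoring formula are hypothetical illustrations, not the authors' definitions.

```python
from typing import List, Sequence

Activations = Sequence[Sequence[float]]  # rows = samples, columns = sparse units (factors)


def factor_frequencies(acts: Activations, threshold: float = 0.0) -> List[float]:
    """Fraction of samples in which each unit is active (activation > threshold)."""
    n_samples = len(acts)
    counts = [0] * len(acts[0])
    for row in acts:
        for j, a in enumerate(row):
            if a > threshold:
                counts[j] += 1
    return [c / n_samples for c in counts]


def representativeness(acts: Activations, freqs: Sequence[float],
                       threshold: float = 0.0) -> List[float]:
    """Activation-weighted mean frequency of the factors each sample activates.

    Samples built from common factors score higher; samples whose activations
    fall on rare factors score lower. (Hypothetical scoring formula.)
    """
    scores = []
    for row in acts:
        active = [(j, a) for j, a in enumerate(row) if a > threshold]
        total = sum(a for _, a in active)
        if total == 0.0:
            scores.append(0.0)  # no active factor: treat as uninformative
            continue
        scores.append(sum(a * freqs[j] for j, a in active) / total)
    return scores


# Toy activations: unit 0 is a common factor, units 1 and 2 are rare.
acts = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
freqs = factor_frequencies(acts)          # [0.75, 0.25, 0.25]
scores = representativeness(acts, freqs)  # [0.75, 0.5, 0.25, 0.75]
print(freqs, scores)
```

The sample that only activates the rare unit 2 gets the lowest score, while samples dominated by the common unit 0 get the highest, which is the ordering the redefinition asks for.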

🏷️ Themes

Data Science, Machine Learning

Original Source
Computer Science > Artificial Intelligence
arXiv:2603.04981 [Submitted on 5 Mar 2026]
Title: Rethinking Representativeness and Diversity in Dynamic Data Selection
Authors: Yuzhe Zhou, Zhenglin Hua, Haiyun Guo, Yuheng Jia

Abstract: Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric centrality, we define representativeness as coverage of dataset-level common or high-frequency feature factors. Instead of within-subset dispersion, we define diversity at the process level, requiring the selection trajectory to gradually include complementary rare factors over training.

Based on this view, we propose a dynamic selection framework with three components. First, we score representativeness in a plug-in feature space to prioritize samples covering frequent factors. We instantiate this with a sparse autoencoder trained on the target dataset, using sparse unit activations to summarize both individual samples and dataset-wide factor statistics. Second, we realize process-level diversity by combining rare-factor sampling with a Usage-Frequency Penalty that promotes sample rotation, provably discourages monopoly, and reduces gradient bias. Third, we couple the two-dimensional scoring with a smooth scheduler that transitions selection from core-pattern consolidation to rare-factor exploration, without extra gradients, influence estimates, or second-order computations on the training model.

Extensive experiments on five benchmarks across vision and text tasks demonstrate improved accuracy-efficiency trade-offs across models. Our method matches or exceeds full-data accuracy with over 2x training acceleration. Code will be released.
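The second and third components can be illustrated together in one selection step. This is a minimal sketch, not the authors' implementation, and several pieces are assumptions: each sample is assumed to already have a representativeness score `rep` and a rare-factor score `rare` in [0, 1]; a cosine ramp stands in for the paper's smooth scheduler; the Usage-Frequency Penalty is modeled as `lam` times the fraction of past epochs in which the sample was selected; and samples are picked greedily by top-k instead of sampled, so the output is deterministic. The names `schedule_weight`, `select_subset`, and `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def schedule_weight(epoch: int, total_epochs: int) -> float:
    """Cosine ramp: 1.0 at the first epoch (favor frequent factors),
    0.0 at the last epoch (favor rare factors). Assumed schedule shape."""
    if total_epochs <= 1:
        return 1.0
    progress = epoch / (total_epochs - 1)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def select_subset(rep: Sequence[float], rare: Sequence[float], usage: List[int],
                  epoch: int, total_epochs: int, k: int,
                  lam: float = 1.0) -> List[int]:
    """Pick k sample indices for this epoch and update `usage` in place."""
    w = schedule_weight(epoch, total_epochs)
    scores = []
    for i in range(len(rep)):
        # Blend the two scores according to the training-progress schedule.
        base = w * rep[i] + (1.0 - w) * rare[i]
        # Usage-frequency penalty: fraction of past epochs that already used sample i.
        penalty = lam * usage[i] / max(epoch, 1)
        scores.append(base - penalty)
    # Highest score first; ties broken by lower index for determinism.
    chosen = sorted(range(len(rep)), key=lambda i: (-scores[i], i))[:k]
    for i in chosen:
        usage[i] += 1
    return sorted(chosen)


rep = [1.0, 0.75, 0.5, 0.25, 0.0]    # sample 0 is the most "core" sample
rare = [0.0, 0.25, 0.125, 0.5, 1.0]  # sample 4 carries the rarest factors
usage = [0] * len(rep)
history = [select_subset(rep, rare, usage, epoch, total_epochs=3, k=2)
           for epoch in range(3)]
print(history, usage)  # [[0, 1], [3, 4], [2, 4]] [1, 1, 1, 1, 2]
```

In this toy run every sample is visited within three epochs: the penalty pushes the first epoch's core samples (0 and 1) out of epoch 1, and the schedule shifts the emphasis toward sample 4's rare factors by the last epoch. Nothing in the step touches the training model's gradients, which mirrors the abstract's point that the scheduler needs no extra gradients or second-order computations.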
Subjects: Artific...

Source

arxiv.org