IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
#data augmentation #optimal sample size #information theory #industrial scenarios #model performance
📌 Key Takeaways
- Industrial data augmentation lacks a theoretical basis for determining the optimal sample size (OSS).
- No established metric exists to evaluate the accuracy or deviation of an estimated OSS from the ground truth.
- The authors propose an information-theoretic framework to estimate OSS.
- The goal is to improve model performance in industrial scenarios, where augmentation is effective but not uniformly beneficial.
📖 Full Retelling
The paper is listed on arXiv as arXiv:2602.15878v1, published in February 2026. It is aimed at researchers and practitioners working with industrial machine learning systems who rely on data augmentation to boost model performance. The authors point out that, although augmentation is effective in practice, its effect is not uniformly positive, and there is neither a theoretical framework nor an established method for determining the optimal sample size (OSS) for augmentation. There is also no metric for assessing how close an estimated OSS is to the true optimum. In response, they propose an information-theoretic approach for estimating OSS and evaluating the accuracy of that estimate in industrial contexts.
🏷️ Themes
Data Augmentation, Optimal Sample Size, Information Theory, Industrial Machine Learning
Original Source
arXiv:2602.15878v1 Announce Type: cross
Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its benefits are not unidirectionally beneficial. There is no theoretical research or established estimation for the optimal sample size (OSS) in augmentation, nor is there an established metric to evaluate the accuracy of OSS or its deviation from the ground truth. To address these issues, we propose an information-theoretic optimal sample