Online GPU Energy Optimization with Switching-Aware Bandits
#GPU #energy optimization #bandits #online learning #switching‑aware #HPC #energy consumption #heterogeneous computing #supercomputer #wearable devices #offline training
📌 Key Takeaways
- GPUs are the main contributors to power consumption in modern heterogeneous HPC systems
- Existing energy‑management techniques largely target CPUs, and many rely on purely offline or hybrid offline‑online training
- The study proposes a switching‑aware bandit algorithm for online GPU energy optimization
- The method dynamically adjusts GPU settings to balance energy savings and performance
- Energy consumption is framed as a bottleneck across computing architectures, from wearable devices to leadership‑class supercomputers
- The paper highlights the impracticality of offline training for real‑time energy control
📖 Full Retelling
The paper, posted on arXiv (ID 2410.11855v2) in October 2024, investigates how to reduce energy consumption in heterogeneous high‑performance computing (HPC) systems by targeting GPUs, which now dominate power draw in those systems. Unlike prior work, which largely targets CPUs and often depends on purely offline or hybrid offline‑online training, the study proposes an online learning framework, specifically a switching‑aware bandit algorithm, that dynamically adjusts GPU settings to lower energy draw while maintaining performance. The authors note that energy consumption has become a bottleneck for computing architectures ranging from wearable devices to leadership‑class supercomputers, and they argue that offline training is impractical, which motivates a fully online, adaptive approach.
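To make the idea of a switching‑aware bandit concrete, here is a minimal Python sketch. It is not the authors' algorithm: it is a standard UCB1 rule with one added assumption, namely that changing the GPU setting has an overhead, modeled as a fixed penalty on every arm other than the one currently in use. The frequency list, the synthetic reward means, the noise level, and the names `SwitchingAwareUCB`, `simulate`, and `switch_cost` are all hypothetical and chosen only for illustration.

```python
import math
import random


class SwitchingAwareUCB:
    """UCB1 with a fixed penalty for leaving the currently selected arm.

    Illustrative only; not the algorithm from the paper.
    """

    def __init__(self, n_arms, switch_cost=0.05, explore=1.0):
        self.n_arms = n_arms
        self.switch_cost = switch_cost  # assumed overhead of changing settings
        self.explore = explore          # scales the exploration bonus
        self.counts = [0] * n_arms      # pulls per arm
        self.means = [0.0] * n_arms     # running mean reward per arm
        self.current = None             # arm currently applied
        self.switches = 0               # number of setting changes so far

    def select(self):
        # Play each arm once before trusting the confidence bounds.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return self._commit(arm)
        total = sum(self.counts)

        def index(arm):
            bonus = self.explore * math.sqrt(2.0 * math.log(total) / self.counts[arm])
            # Only arms that would require a switch pay the penalty.
            penalty = 0.0 if arm == self.current else self.switch_cost
            return self.means[arm] + bonus - penalty

        return self._commit(max(range(self.n_arms), key=index))

    def _commit(self, arm):
        if self.current is not None and arm != self.current:
            self.switches += 1
        self.current = arm
        return arm

    def update(self, arm, reward):
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical GPU core-frequency settings (MHz) and synthetic mean rewards,
# where reward stands in for "1 - normalized energy per unit of work".
FREQS_MHZ = [600, 900, 1200, 1500]
TRUE_MEAN_REWARD = [0.55, 0.80, 0.65, 0.40]


def simulate(steps=2000, seed=0, switch_cost=0.05):
    """Run the bandit against the synthetic reward model and return the agent."""
    rng = random.Random(seed)
    agent = SwitchingAwareUCB(len(FREQS_MHZ), switch_cost=switch_cost)
    for _ in range(steps):
        arm = agent.select()
        reward = rng.gauss(TRUE_MEAN_REWARD[arm], 0.05)  # noisy measurement
        agent.update(arm, reward)
    return agent


if __name__ == "__main__":
    agent = simulate()
    best = max(range(agent.n_arms), key=lambda a: agent.counts[a])
    print("most used setting (MHz):", FREQS_MHZ[best])
    print("setting changes:", agent.switches)
```

In a real deployment the reward would come from measured power and throughput rather than a synthetic table, and the penalty would reflect the actual cost of reconfiguring the GPU. The sketch only shows how a switching term can be folded into an otherwise ordinary online bandit loop with no offline training phase.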
🏷️ Themes
Energy Efficiency, GPU Computing, Bandit Algorithms, Online Learning, High‑Performance Computing, Heterogeneous Systems
Original Source
arXiv:2410.11855v2
Abstract: Energy consumption has become a bottleneck for future computing architectures, from wearable devices to leadership-class supercomputers. Existing energy management techniques largely target CPUs, even though GPUs now dominate power draw in heterogeneous high performance computing (HPC) systems. Moreover, many prior methods rely on either purely offline or hybrid offline and online training, which is impractical and results in energy inef