Towards Efficient and Practical GPU Multitasking in the Era of LLM
Preprint 2025
Authors
Jiarong Xing
Yifan Qiao
Simon Mo
Abstract
GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet the demands of modern AI workloads. In this work, we highlight the key requirements for GPU multitasking, examine prior efforts, and discuss why they fall short. To advance toward efficient and practical GPU multitasking, we envision a resource management layer, analogous to a CPU operating system, to handle various aspects of GPU resource management and sharing. We outline the challenges and potential solutions, and hope this paper inspires broader community efforts to build the next-generation GPU compute paradigm grounded in multitasking.
Frank Sifei Luan, Ziming Mao, R. Wang, Chi‐Wei Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang