OSGym: Scalable Distributed Data Engine for Generalizable Computer Agents

Zengyi Qin; Jinyuan Chen; Yunze Man; Shengcao Cao; Zhanqi Pang; Zhuoyuan Wang; Xin Sun; Gen Lin; Fang Han; Ling Zhu; Z. P. Xie; Zenghui Wei; Tianshu Ran; Haoran Geng; Xizheng Wu; Zachary Bright; Q. N. Sun; Rui Wang; Yuyang Cai; Song Wang; Jingwei Zhao; Han Cao; Y. Z. Zhou; Tianrui Liu; Renfu Pan; Chang Yang; Xiang Ren; Jie Zhang; Yaxin Ban; Jitendra Malik; Brian Anthony

doi:10.48550/arxiv.2511.11672

Preprint 2025

Abstract

We introduce OSGym, a scalable distributed data engine for training agents across diverse computer-use tasks. OSGym scales efficiently to more than a thousand operating system (OS) replicas, which serve as agent runtime environments, within a budget affordable to academic labs. OSGym has three advantages: 1) Scalability: Although running OS replicas is resource-intensive, OSGym can run a thousand of them in parallel while remaining efficient under limited resources. This parallelism enables large-scale data generation (1420 multi-turn trajectories per minute). 2) Generality and Customizability: OSGym supports any task that runs on an operating system, including functional tool use, browser interaction, software engineering, and office applications. It also allows easy and flexible customization of model training algorithms. 3) Economic Viability for Academic Use: OSGym costs only 0.2 to 0.3 USD per day per OS replica on readily available on-demand compute providers. Our experiments demonstrate that OSGym is effective for implementing comprehensive pipelines for data collection, supervised fine-tuning, and reinforcement learning for computer-use agents. We believe OSGym will advance the scalability and generality of future agent research.
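To put the quoted figures in perspective, the short Python sketch below works through the arithmetic. It is only a back-of-the-envelope estimate: the replica count of exactly 1000 and the assumption that the quoted throughput is sustained around the clock are ours, not statements from the abstract, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
TRAJECTORIES_PER_MINUTE = 1420
COST_PER_REPLICA_PER_DAY_USD = (0.2, 0.3)  # low and high ends of the quoted range

# Assumptions for illustration only (not stated in the abstract):
# exactly 1000 replicas, and throughput sustained for a full 24 hours.
NUM_REPLICAS = 1000
MINUTES_PER_DAY = 24 * 60


def daily_budget_usd(num_replicas=NUM_REPLICAS):
    """Return the (low, high) daily cost in USD for the given replica count."""
    low, high = COST_PER_REPLICA_PER_DAY_USD
    return num_replicas * low, num_replicas * high


def daily_trajectories(per_minute=TRAJECTORIES_PER_MINUTE):
    """Return the number of trajectories per day at a sustained per-minute rate."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    low, high = daily_budget_usd()
    print(f"Daily budget for {NUM_REPLICAS} replicas: {low:.0f} to {high:.0f} USD")
    print(f"Trajectories per day at sustained rate: {daily_trajectories():,}")
```

Under these assumptions, a thousand replicas would cost roughly 200 to 300 USD per day, and a sustained rate of 1420 trajectories per minute would yield about 2.04 million trajectories per day. Whether the quoted throughput corresponds to exactly this replica count is not specified in the abstract.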
