📃 Paper | 📊 OfficeWorld Leaderboard
Read this in 中文.
This repository contains the code for ComputerRL, a modified and simplified fork of the OSWorld repository. ComputerRL focuses on end-to-end online reinforcement learning methods for training computer use agents; the trimmed-down OSWorld benchmark environment provides a more focused and efficient experimental platform for computer use research.

We introduce an API-GUI action paradigm that seamlessly integrates automatically constructed APIs with GUI actions to improve agent efficiency and effectiveness. A large-scale parallel desktop environment with 1,000+ real-world instances, combined with an asynchronous RL framework, enables efficient sampling and robust agent training.

*Figure: Success rates of agents on OSWorld.*
We recommend running the VM with KVM support for better performance. To check if your system supports KVM, run the following command:

```bash
egrep -c '(vmx|svm)' /proc/cpuinfo
```

If the output is greater than 0, your system supports KVM. ✅
Please refer to the Docker Installation Guide to install Docker on your machine.
Download the official image from ubuntu_osworld.
```bash
pip install -r requirements.txt
```

All experiment files will be cached under `./cache`. Obtain the experiment files through the following methods:
- OSWorld: Refer to the OSWorld Official to download all the cached files.
- OfficeWorld: Download the experiment files from ModelScope/OfficeWorld-Cache and extract them into the `./cache` directory.
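If you prefer to script the extraction step, the sketch below unpacks a downloaded cache archive into `./cache`. It is a minimal example, assuming the download is a single zip file; the function name and the archive filename are hypothetical, so adjust them to whatever ModelScope actually serves.

```python
import zipfile
from pathlib import Path

def extract_cache(archive_path: str, cache_dir: str = "./cache") -> int:
    """Extract a downloaded cache archive into the cache directory.

    Returns the number of entries extracted. Assumes a zip archive;
    use tarfile instead if the download is a .tar.gz.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(cache)
        return len(zf.namelist())

# Example (hypothetical filename):
# extract_cache("OfficeWorld-Cache.zip")
```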
Two model variants are now available as open source. Download a model and select it by setting the `--model` parameter:
- Text-Only: ModelScope/ComputerRL
- Multimodal: ModelScope/ComputerRL-V
```bash
pip install "sglang[all]"  # if not installed
python -m sglang.launch_server \
    --model zai-org/autoglm-os-9b \
    --host 0.0.0.0 --port 30000 --served-model-name autoglm-os
```

Reproduce the results on OSWorld by running the following scripts.
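Before launching a full run, it can help to confirm the server is reachable. sglang exposes an OpenAI-compatible endpoint, so a chat completion request against `/v1/chat/completions` works as a sanity check. The sketch below only builds the request with the standard library (the host, port, and served model name match the launch command above); uncomment the last line to actually send it once the server is up.

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:30000/v1",
                 model: str = "autoglm-os") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the sglang server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Requires the server launched above to be running:
# print(urllib.request.urlopen(chat_request("Hello")).read().decode())
```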
```bash
# Set up your API
export OPENAI_BASE_URL="https://api-gateway.glm.ai/v1"
export OPENAI_API_KEY="API-KEY"
```

```bash
# If using a multimodal model, please use run_autoglm_v.py
python run_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_nogdrive.json
```

```bash
# If using a multimodal model, please use run_multienv_autoglm_v.py
python run_multienv_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --num_workers 20 \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_nogdrive.json
```

Result files are cached under `./results`. Run the following script to view the scores:
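For intuition, score aggregation over the results directory can be sketched as below. This is a minimal illustration, assuming each task directory leaves a `result.txt` containing a single numeric score (e.g. `0.0` or `1.0`); the actual layout used by `show_result.py` may differ.

```python
from pathlib import Path

def summarize_results(results_dir: str = "./results") -> float:
    """Average the per-task scores found in result.txt files.

    Assumes each finished task wrote a result.txt holding one number;
    returns 0.0 when no result files are found.
    """
    scores = [
        float(p.read_text().strip())
        for p in Path(results_dir).rglob("result.txt")
    ]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)
```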
```bash
python show_result.py
```

After finishing your experiments, you may have leftover Docker containers. Clean them up with:
```bash
docker stop $(docker ps -q) && docker rm $(docker ps -a -q)
```

Note that this stops and removes all containers on the host, not only those from the experiments.

The OfficeWorld benchmark is built upon SpreadsheetBench, PPTC, and in-house developed Writer domain tasks.
Tasks are adapted as needed to fit into the OSWorld framework, enabling systematic evaluation of agent capabilities in office-oriented scenarios.
Run the following command to evaluate your agent on the OfficeWorld benchmark:
```bash
python run_multienv_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --num_workers 20 \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_office.json
```

Check out the leaderboard here! 🚀
If you would like to add your results to the leaderboard, please email hanyullai@outlook.com.
```bibtex
@misc{lai2025computerrl,
      title={ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents},
      author={Hanyu Lai and Xiao Liu and Yanxiao Zhao and Han Xu and Hanchen Zhang and Bohao Jing and Yanyu Ren and Shuntian Yao and Yuxiao Dong and Jie Tang},
      year={2025},
      eprint={2508.14040},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.14040},
}
```