
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

📃 Paper  |  📊 OfficeWorld Leaderboard

Read this in 中文.

This is the code repository for ComputerRL, built on a modified and simplified version of the OSWorld repository. ComputerRL focuses on end-to-end online reinforcement learning methods for training computer use agents. The project streamlines the OSWorld benchmark environment to provide a more focused and efficient experimental platform for computer use research.

Framework Overview
We introduce an API-GUI action paradigm that seamlessly integrates automatically constructed APIs with GUI actions to improve agent efficiency and effectiveness. A large-scale parallel desktop environment with 1,000+ real-world instances, combined with an asynchronous RL framework, enables efficient sampling and robust agent training.

Main Results
The success rates of agents on OSWorld.

🌱 Environment

🖥️ Check for KVM Support

We recommend running the VM with KVM support for better performance. To check if your system supports KVM, run the following command:

egrep -c '(vmx|svm)' /proc/cpuinfo

If the output is greater than 0, your system supports KVM. ✅
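As an additional sanity check, you can confirm that the KVM device node is present (this assumes a standard Linux host with the KVM kernel module loaded):

# /dev/kvm exists when the KVM kernel module is loaded
ls -l /dev/kvm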

🐳 Install Docker

Please refer to the Docker Installation Guide to install Docker on your machine.
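To confirm Docker is installed and working, the standard smoke test is:

docker --version
docker run --rm hello-world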

📥 Download the Image

Download the official image from ubuntu_osworld.


🧪 Experiments

📦 Install Dependencies

pip install -r requirements.txt
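Optionally, install the dependencies inside an isolated virtual environment (a standard Python setup, not a project requirement):

# Optional: isolate dependencies in a virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt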

📂 Download Experiment Files

All experiment files will be cached under ./cache. Obtain experiment files through the following methods:

🤖 Deploy the Model

Two open-source models are now available. Download a model and select it with the --model parameter:

pip install "sglang[all]"  # if not installed

python -m sglang.launch_server \
  --model zai-org/autoglm-os-9b \
  --host 0.0.0.0 --port 30000 --served-model-name autoglm-os
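Once the server is up, you can sanity-check it through sglang's OpenAI-compatible endpoint (assuming the default host and port from the command above):

# Should list the served model name (autoglm-os)
curl http://localhost:30000/v1/models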

🚀 Run Experiments

Reproduce the results on OSWorld by running the following scripts.

🔐 Environment Variables

# Set up your API
export OPENAI_BASE_URL="https://api-gateway.glm.ai/v1"
export OPENAI_API_KEY="API-KEY"

🔄 Single-Process Test

# If using a multimodal model, please use run_autoglm_v.py
python run_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_nogdrive.json

⚡ Parallel Test

# If using a multimodal model, please use run_multienv_autoglm_v.py
python run_multienv_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --num_workers 20 \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_nogdrive.json

📊 View Experiment Results

Result files are cached under ./results. Run the following script to view the scores:

python show_result.py

🧹 Clean Up Docker Containers and Images

After finishing your experiments, you may have leftover Docker containers. Stop and remove them with:

docker stop $(docker ps -q) && docker rm $(docker ps -a -q)
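If you also want to reclaim disk space from unused images, Docker's standard prune command applies (note that this removes all images not used by any container):

docker image prune -a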

🏢 OfficeWorld Benchmark

The OfficeWorld benchmark is built upon SpreadsheetBench, PPTC, and in-house developed Writer domain tasks.
Tasks are adapted as needed to fit into the OSWorld framework, enabling systematic evaluation of agent capabilities in office-oriented scenarios.

▶️ Running the OfficeWorld Benchmark

Run the following command to evaluate your agent on the OfficeWorld benchmark:

python run_multienv_autoglm.py \
    --provider_name docker \
    --path_to_vm Ubuntu/Ubuntu.vmx \
    --headless \
    --num_workers 20 \
    --max_steps 15 \
    --test_all_meta_path ./evaluation_examples/test_office.json

🏆 Leaderboard

Check out the leaderboard here! 🚀

If you would like to add your results to the leaderboard, please email hanyullai@outlook.com.

📄 Citation

@misc{lai2025computerrl,
    title={ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents}, 
    author={Hanyu Lai and Xiao Liu and Yanxiao Zhao and Han Xu and Hanchen Zhang and Bohao Jing and Yanyu Ren and Shuntian Yao and Yuxiao Dong and Jie Tang},
    year={2025},
    eprint={2508.14040},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2508.14040}, 
}
