
English | δΈ­ζ–‡

RM-Gallery: A One-Stop Reward Model Platform


πŸ—‚οΈ Table of Contents


πŸ“’ News

  • [2025-07-09] We have released RM-Gallery v0.1.0, which is also available on PyPI!

🌟 Why RM-Gallery?

RM-Gallery is a one-stop platform for training, building and applying reward models. It provides a comprehensive solution for implementing reward models at both task-level and atomic-level, with high-throughput and fault-tolerant capabilities.

Figure: RM-Gallery framework overview

πŸ‹οΈβ€β™‚οΈ Training RM

  • Integrated RM Training Pipeline: Provides an RL-based pipeline for training reasoning reward models, compatible with popular training frameworks (e.g., verl), and offers examples for integrating RM-Gallery into them.

Figure: RM training accuracy curve — the RM training pipeline improves accuracy on RM-Bench

The figure demonstrates the effectiveness of the RM training pipeline: on RM-Bench, after more than 80 training steps, accuracy improved from around 55.8% with the baseline model (Qwen2.5-14B) to approximately 62.5%.

πŸ—οΈ Building RM

  • Unified Reward Model Architecture: Flexible implementation of reward models through standardized interfaces, supporting various architectures (model-based / model-free), reward formats (scalar / critique), and scoring patterns (pointwise / listwise / pairwise).

  • Comprehensive RM Gallery: Provides a rich collection of ready-to-use Reward Model instances for diverse tasks (e.g., math, coding, preference alignment) at both the task level (RMComposition) and the component level (RewardModel). Users can apply an RMComposition or RewardModel directly for a specific task, or assemble a custom RMComposition from component-level RewardModels.

  • Rubric-Critic-Score Paradigm: Adopts the Rubric+Critic+Score-based reasoning Reward Model paradigm, offering best practices to help users generate rubrics with limited preference data.

Benchmark figures in the repository show that after applying the Rubric+Critic+Score paradigm and adding 1–3 rubrics to the base model (Qwen3-32B), there are significant improvements on both RewardBench2 and RMB-pairwise.
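For orientation, the control flow of the Rubric+Critic+Score paradigm can be written as a short, library-agnostic sketch. The call_llm callable, the prompt wording, and the 0–10 scale below are illustrative assumptions, not RM-Gallery APIs:

from typing import Callable

# Illustrative sketch of the Rubric+Critic+Score flow (not the RM-Gallery API).
# `call_llm` is any function that sends a prompt to a judge LLM and returns its text reply.
def rubric_critic_score(
    question: str,
    answer: str,
    rubrics: list[str],
    call_llm: Callable[[str], str],
) -> float:
    rubric_text = "\n".join(f"- {r}" for r in rubrics)
    # 1. Critic: critique the answer against the rubrics
    critique = call_llm(
        f"Question: {question}\nAnswer: {answer}\n"
        f"Evaluate the answer against these rubrics:\n{rubric_text}\n"
        "Write a short critique."
    )
    # 2. Score: turn the critique into a scalar reward
    score_text = call_llm(
        f"Critique: {critique}\n"
        "Based on this critique, rate the answer from 0 to 10. Reply with a single number."
    )
    return float(score_text.strip())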

πŸ› οΈ Applying RM

  • Multiple Usage Scenarios: Covers multiple Reward Model (RM) usage scenarios with detailed best practices, including Training with Rewards (e.g., post-training) and Inference with Rewards (e.g., Best-of-N, data correction); a Best-of-N sketch follows this list.

  • High-Performance RM Serving: Leverages the New API platform to deliver high-throughput, fault-tolerant reward model serving, enhancing feedback efficiency.
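As an illustration of the Inference-with-Rewards scenario above, Best-of-N selection only needs a per-response score. The score_fn argument below is a placeholder for whatever reward model call you use, not a specific RM-Gallery interface:

from typing import Callable

# Minimal Best-of-N sketch: keep the candidate response with the highest reward score.
def best_of_n(prompt: str, candidates: list[str], score_fn: Callable[[str, str], float]) -> str:
    scored = [(score_fn(prompt, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]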

πŸ“₯ Installation

RM-Gallery requires Python >= 3.10 and < 3.13.

πŸ“¦ Install from source

# Pull the source code from GitHub
git clone https://github.com/modelscope/RM-Gallery.git

# Install the package
pip install .

πŸ“¦ Install from PyPI

pip install rm-gallery

πŸš€ Quick Start

Your First Reward Model

from rm_gallery.core.reward.registry import RewardRegistry

# 1. Choose a pre-built reward model
rm = RewardRegistry.get("safety_listwise_reward")

# 2. Prepare your data
from rm_gallery.core.data.schema import DataSample
sample = DataSample(...)  # See docs for details

# 3. Evaluate
result = rm.evaluate(sample)
print(result)

That's it! πŸŽ‰

πŸ‘‰ 5-Minute Quickstart Guide - Get started in minutes

πŸ‘‰ Interactive Notebooks - Try it hands-on

πŸ“– Key Features

πŸ—οΈ Building Reward Models

Choose from 35+ pre-built reward models or create your own:

# Use pre-built models (RewardRegistry import shown in the Quick Start above)
rm = RewardRegistry.get("math_correctness_reward")
rm = RewardRegistry.get("code_quality_reward")
rm = RewardRegistry.get("helpfulness_listwise_reward")

# Or build custom models by subclassing a base reward class
# (import BasePointWiseReward and RewardResult from rm_gallery; see the builder docs for exact paths)
class CustomReward(BasePointWiseReward):
    def _evaluate(self, sample, **kwargs):
        # Your custom logic here
        return RewardResult(...)
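The custom logic inside _evaluate can be any computation over the sample. As a standalone illustration (plain Python, independent of the BasePointWiseReward/RewardResult types), a simple length-penalty heuristic might look like this:

# Hypothetical scoring heuristic of the kind that could back a custom pointwise reward.
def length_penalty_score(answer: str, target_words: int = 200) -> float:
    """Return a score in [0, 1]: 1.0 up to target_words, decaying linearly beyond it."""
    n_words = len(answer.split())
    if n_words <= target_words:
        return 1.0
    return max(0.0, 1.0 - (n_words - target_words) / target_words)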

πŸ“š See all available reward models β†’

πŸ‹οΈβ€β™‚οΈ Training Reward Models

Train your own reward models with the verl framework:

# Prepare data and launch training
cd examples/train/pointwise
./run_pointwise.sh

πŸ“š Training guide β†’

πŸ§ͺ Evaluating on Benchmarks

Test your models on standard benchmarks:

  • RewardBench2 - Latest reward model benchmark
  • RM-Bench - Comprehensive evaluation
  • Conflict Detector - Detect evaluation conflicts
  • JudgeBench - Judge capability evaluation

πŸ“š Evaluation guide β†’

πŸ› οΈ Real-World Applications

  • Best-of-N Selection - Choose the best from multiple responses
  • Data Refinement - Improve data quality with reward feedback (see the sketch after this list)
  • Post Training (RLHF) - Integrate with reinforcement learning
  • High-Performance Serving - Deploy as scalable service
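For example, reward-driven data refinement can be as simple as filtering a dataset by score. The sketch below reuses the RewardRegistry/evaluate usage shown in the Quick Start; the extract_score helper is a hypothetical placeholder, since the exact structure of the evaluation result is described in the docs:

from typing import Callable

# Sketch: keep only samples whose reward score clears a threshold.
# `rm.evaluate(sample)` follows the Quick Start usage; `extract_score` is a placeholder
# for however you pull a scalar score out of the evaluation result.
def filter_by_reward(samples, rm, extract_score: Callable, threshold: float = 0.5):
    kept = []
    for sample in samples:
        result = rm.evaluate(sample)
        if extract_score(result) >= threshold:
            kept.append(sample)
    return kept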

πŸ“š Application guides β†’

πŸ“š Documentation

πŸ“– Complete Documentation - Full documentation site

Quick Links

🀝 Contribute

Contributions are always encouraged!

We highly recommend installing the pre-commit hooks in this repo before submitting pull requests. These hooks are small housekeeping scripts executed every time you make a git commit; they take care of formatting and linting automatically.

pip install -e .
pre-commit install

Please refer to our Contribution Guide for more details.

πŸ“ Citation

If you use RM-Gallery in a paper, please cite:

@software{rm_gallery,
  title  = {RM-Gallery: A One-Stop Reward Model Platform},
  author = {The RM-Gallery Team},
  url    = {https://github.com/modelscope/RM-Gallery},
  month  = {07},
  year   = {2025}
}
