TruthRL is a simple yet effective truthfulness-driven reinforcement learning (RL) method that significantly reduces hallucinations in large language models (LLMs) by enabling proper abstention while preserving accuracy.
Factual accuracy alone does NOT necessarily guarantee truthfulness!
A model that answers fewer questions correctly while reliably abstaining when uncertain is far more trustworthy than a higher-accuracy model that frequently fabricates plausible but incorrect answers.
In vanilla supervised fine-tuning (SFT) or RL, the model is optimized solely for accuracy, which implicitly rewards hallucinations over abstentions and pushes the model to always attempt an answer or guess, ultimately compromising truthfulness. In contrast, TruthRL not only rewards correct answers but also explicitly penalizes hallucinations and treats abstentions neutrally, thereby leading to greater truthfulness.
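Concretely, the reward can be thought of as a simple ternary signal. Below is a minimal sketch, assuming the verifier emits one of three verdict labels; the exact reward values and labels used in this repo may differ.

```python
# Minimal sketch of the ternary reward scheme described above (illustrative
# only; the verdict labels and reward magnitudes are assumptions, not
# necessarily what the training code uses).
def truthfulness_reward(verdict: str) -> float:
    """Map a verifier verdict ("correct", "abstain", or "hallucination") to a reward."""
    if verdict == "correct":
        return 1.0   # correct answers are rewarded
    if verdict == "abstain":
        return 0.0   # abstentions are treated neutrally, not penalized
    return -1.0      # hallucinated (incorrect) answers are explicitly penalized
```

Because abstaining scores strictly higher than hallucinating under this scheme, the policy is no longer incentivized to guess when it is uncertain.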
Run the following script to create a Python virtual environment for TruthRL training.
```bash
conda create -n truthrl-verl python=3.10 -y
conda activate truthrl-verl
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
cd training/verl
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install numpy==1.26.1 opentelemetry-sdk==1.26.0 click==8.2.1 tensordict==0.8.1
pip install --no-deps -e .
huggingface-cli login
wandb login
```

The training requires an LLM verifier to judge whether the predicted answer aligns with the reference answer and to produce reward signals. If you are not hosting the verifier locally, please change `OPENAI_API_BASE` in `train_grpo.sh` to the base URL where you host the verifier model. By default, the training script is set up for 8 x H100 80GB GPUs; please adjust `N_GPUS` based on your compute resources.
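For intuition, the sketch below shows what such a verifier call could look like against an OpenAI-compatible endpoint. The helper name, prompt wording, and model name are assumptions for illustration only, not the actual code used by `train_grpo.sh`.

```python
# Hypothetical sketch of a verifier call against an OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("OPENAI_API_BASE", "http://localhost:8000/v1"),  # where the verifier is hosted
    api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),
)

def judge(question: str, prediction: str, reference: str, model: str = "your-verifier-model") -> str:
    """Ask the verifier whether the prediction matches the reference answer."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with exactly one word: correct, abstain, or hallucination."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip().lower()
```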
```bash
conda activate truthrl-verl
bash train_grpo.sh
```

Run the following script to create a Python virtual environment for TruthRL evaluation.
```bash
conda create -n truthrl-eval python=3.10 -y
conda activate truthrl-eval
cd evaluation
pip install -r requirements.txt
```

Use the following script to evaluate the model. Note that the evaluation also requires an LLM to judge whether the predicted answer aligns with the reference answer. If you are not hosting it locally, please change `api_url` in `evaluate.py` to the base URL where you host the verifier model.
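For intuition, the sketch below shows one way per-example verdicts could be aggregated into accuracy, hallucination, and abstention rates. The metric names and formulas here are illustrative assumptions and may not match what `evaluate.py` reports.

```python
# Hypothetical aggregation of per-example verifier verdicts into summary metrics.
from collections import Counter

def summarize(verdicts: list[str]) -> dict[str, float]:
    counts = Counter(verdicts)          # e.g. {"correct": 70, "hallucination": 10, "abstain": 20}
    total = max(len(verdicts), 1)
    accuracy = counts["correct"] / total
    hallucination_rate = counts["hallucination"] / total
    abstention_rate = counts["abstain"] / total
    # One natural truthfulness score: credit correct answers, penalize
    # hallucinations, treat abstentions as neutral (mirrors the ternary reward above).
    truthfulness = accuracy - hallucination_rate
    return {
        "accuracy": accuracy,
        "hallucination_rate": hallucination_rate,
        "abstention_rate": abstention_rate,
        "truthfulness": truthfulness,
    }

print(summarize(["correct", "abstain", "hallucination", "correct"]))
```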
```bash
conda activate truthrl-eval
python evaluate.py
```

If you have any questions about the code or the paper, feel free to email Zhepei (zhepei.wei@virginia.edu). If you encounter any problems when using the code, or want to report a bug, feel free to open an issue! Please describe the problem in detail so we can help you better and quicker!
Please cite our paper if you find the repo helpful in your work:
```bibtex
@article{wei2025truthrl,
  title={Truth{RL}: Incentivizing Truthful {LLMs} via Reinforcement Learning},
  author={Wei, Zhepei and Yang, Xiao and Sun, Kai and Wang, Jiaqi and Shao, Rulin and Chen, Sean and Kachuee, Mohammad and Gollapudi, Teja and Liao, Tony and Scheffer, Nicolas and Wanga, Rakesh and Kumar, Anuj and Meng, Yu and Yih, Wen-tau and Dong, Xin Luna},
  journal={arXiv preprint arXiv:2509.25760},
  year={2025},
}
```

TruthRL is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License, as found in the LICENSE file.