LLM Alignment Framework
OpenLMLab/MOSS-RLHF
An open-source framework providing code, models, and insights for stable Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models, with a focus on the PPO algorithm and reward modeling.
Core Features
Open-source PPO-max algorithm for stable RLHF training.
Pre-trained Chinese and English reward models.
Annotated HH-RLHF dataset with preference strength.
Released SFT and RLHF-aligned policy models.
Comprehensive technical reports on RLHF and Reward Modeling.
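The reward models listed above are trained on pairwise preference data such as the annotated HH-RLHF set. As a rough, self-contained illustration (not the project's actual training code; the function name here is hypothetical), the standard Bradley-Terry pairwise loss pushes the reward model to score the human-preferred response above the rejected one:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model scores the chosen
    (human-preferred) response above the rejected one, and grows
    as the margin moves in the wrong direction.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correct-direction margin lowers the loss relative to a tie;
# a wrong-direction margin raises it.
print(pairwise_reward_loss(2.0, 0.0) < pairwise_reward_loss(0.0, 0.0))  # True
print(pairwise_reward_loss(0.0, 2.0) > pairwise_reward_loss(0.0, 0.0))  # True
```

Annotating preference strength, as this dataset does, makes it possible to weight or margin-scale this loss per example rather than treating all preferences as equally confident.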
Detailed Introduction
MOSS-RLHF addresses key challenges in applying Reinforcement Learning from Human Feedback (RLHF) to Large Language Models, such as the complexity of reward design and the instability of PPO training. It aims to lower the barrier to entry for AI researchers by providing a robust, reproducible framework. The project introduces the PPO-max algorithm for stable training, provides competitive pre-trained reward models in both Chinese and English, and releases annotated preference datasets, enabling better human alignment for LLMs.
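The clipped PPO surrogate at the core of such training can be sketched as follows. This is a minimal single-action illustration under assumed names, not the PPO-max implementation itself; PPO-max layers additional stabilizing techniques on top of standard PPO, for which the project's technical reports are the authoritative source:

```python
import math

def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for a single token/action.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] keeps each update close to the old
    policy, one of the main sources of PPO's training stability.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Take the pessimistic (minimum) surrogate, as in standard PPO.
    return min(ratio * advantage, clipped * advantage)

def kl_shaped_reward(reward: float, logp_policy: float,
                     logp_ref: float, beta: float = 0.1) -> float:
    """RLHF-style reward shaping (a common stabilizer, assumed here):
    penalize per-token drift of the policy away from the SFT/reference
    model via a KL term subtracted from the reward-model score."""
    return reward - beta * (logp_policy - logp_ref)
```

For example, with `advantage = 1.0` and a log-probability increase of 1.0, the raw ratio `e ≈ 2.72` is clipped to `1.2`, capping how much that sample can move the policy in one step.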