LLM Alignment Framework

OpenLMLab/MOSS-RLHF

An open-source framework providing code, models, and insights for stable Reinforcement Learning from Human Feedback (RLHF) training in Large Language Models, focusing on the PPO algorithm and reward modeling.

Core Features

Open-source PPO-max algorithm for stable RLHF training.
Pre-trained Chinese and English reward models.
Annotated HH-RLHF dataset with preference strength.
Released SFT and RLHF-aligned policy models.
Comprehensive technical reports on RLHF and Reward Modeling.

Detailed Introduction

MOSS-RLHF addresses key challenges in applying Reinforcement Learning from Human Feedback (RLHF) to Large Language Models, such as the complexity of reward design and the instability of PPO training. It aims to lower the barrier for AI researchers by providing a robust, reproducible framework. The project introduces the PPO-max algorithm for stable training, offers competitive pre-trained reward models in both Chinese and English, and releases annotated preference datasets, enabling better human alignment for LLMs.
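At the core of PPO-based RLHF is the clipped surrogate objective, which constrains each policy update to stay close to the previous policy. The sketch below illustrates that objective in plain Python; the function name and values are illustrative only and are not part of MOSS-RLHF's actual API.

```python
import math

def ppo_clip_loss(logprob_new, logprob_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate loss for a single token/action.

    ratio = pi_new / pi_old; clipping the ratio keeps updates near the
    old policy, which is central to the training stability that
    PPO-style RLHF methods aim for.
    """
    ratio = math.exp(logprob_new - logprob_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # PPO maximizes the minimum of the two terms; negate to get a loss.
    return -min(unclipped, clipped)

# Example: the new policy slightly prefers this action (higher logprob)
# and the advantage (e.g. from a reward-model score) is positive.
loss = ppo_clip_loss(logprob_new=-1.0, logprob_old=-1.2, advantage=0.5)
```

In a full RLHF loop, the advantage would be derived from reward-model scores (often with a KL penalty against the SFT policy), and the loss would be averaged over sampled responses.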
