AI/ML Research Framework

PKU-Alignment/safe-rlhf

A modular open-source framework for training constrained, value-aligned Large Language Models (LLMs) with Safe Reinforcement Learning from Human Feedback (Safe RLHF).

Core Features

Supports SFT, RLHF, and Safe RLHF training for popular LLMs (e.g., LLaMA, Baichuan).
Provides large human-labeled preference datasets (up to 1M pairs) for alignment research; see the loading sketch after this list.
Offers pre-trained Reward and Cost Models, and supports their training.
Enables customization of parameters and datasets for various training stages.
Includes multi-scale evaluation benchmarks (e.g., BIG-bench, GPT-4-based evaluation) for safety verification.
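
As a concrete starting point, the released preference data can be pulled directly from the Hugging Face Hub. A minimal sketch, assuming the public PKU-Alignment/PKU-SafeRLHF dataset id and its documented field names (verify against the dataset card, as fields may differ across versions):

```python
# Minimal sketch: loading the PKU-SafeRLHF preference data from the
# Hugging Face Hub. Field names below are assumptions taken from the
# public dataset card and should be checked against the version used.
from datasets import load_dataset

data = load_dataset("PKU-Alignment/PKU-SafeRLHF", split="train")

example = data[0]
print(example["prompt"])              # the user prompt
print(example["response_0"])          # first candidate response
print(example["response_1"])          # second candidate response
print(example["better_response_id"])  # which response annotators found more helpful
print(example["safer_response_id"])   # which response annotators found safer
```

Each record pairs two responses to the same prompt with separate helpfulness and safety labels, which is what allows reward and cost models to be trained from the same data.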

Detailed Introduction

Beaver, developed by the PKU-Alignment team, is a highly modular open-source RLHF framework designed to advance research on constrained LLM alignment through Safe RLHF methods. It provides a comprehensive, reproducible code pipeline along with extensive human-labeled datasets to support the development of LLMs that are both helpful and harmless. The framework covers SFT, RLHF, and Safe RLHF training for a range of pre-trained models, aligning model behavior with human values under explicit safety constraints.
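
Concretely, Safe RLHF casts alignment as constrained policy optimization: maximize the reward model's helpfulness score while keeping the cost model's harmfulness score below a threshold, typically via a Lagrangian relaxation. The sketch below illustrates the multiplier update with simplified, hypothetical names (log_lambda, cost_limit, shaped_reward); it is an illustration of the technique under those assumptions, not the repository's actual code.

```python
# Illustrative Lagrangian update for constrained RLHF. All names here are
# hypothetical simplifications, not the safe-rlhf API.
import torch

log_lambda = torch.zeros(1, requires_grad=True)  # log-parameterization keeps lambda > 0
lambda_optimizer = torch.optim.SGD([log_lambda], lr=1e-2)

cost_limit = 0.0  # assumed threshold on the expected cost (harmfulness) score


def shaped_reward(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Fold the cost penalty into the RL signal: (r - lambda * c) / (1 + lambda)."""
    lam = log_lambda.exp().detach()  # lambda is a constant w.r.t. the policy update
    return (reward - lam * cost) / (1.0 + lam)


def update_multiplier(mean_batch_cost: float) -> None:
    """Gradient ascent on lambda: it grows while the cost constraint is violated."""
    lambda_loss = -log_lambda.exp() * (mean_batch_cost - cost_limit)
    lambda_optimizer.zero_grad()
    lambda_loss.backward()
    lambda_optimizer.step()
```

In this scheme the policy is trained with standard PPO on the shaped reward, while the multiplier is updated once per batch from the average cost-model score, so the penalty automatically tightens when generations become unsafe and relaxes once the constraint is satisfied.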
