LLM Alignment Toolkit
5.6k 2026-04-18
huggingface/alignment-handbook
Provides robust recipes and training code to align language models with human and AI preferences, enhancing helpfulness and safety.
Core Features
Offers end-to-end training recipes covering the full LLM alignment pipeline.
Supports diverse alignment techniques including SFT, DPO, ORPO, RLAIF, and Constitutional AI.
Includes scripts for continued pretraining, supervised fine-tuning, and preference alignment.
Facilitates distributed training with DeepSpeed ZeRO-3 and parameter-efficient fine-tuning with LoRA/QLoRA.
Provides reproducible recipes for state-of-the-art aligned models like Zephyr and SmolLM.
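Among the techniques listed above, DPO (Direct Preference Optimization) is a good one to see concretely. A minimal sketch of the DPO loss for a single preference pair, written from the published formula rather than the handbook's own training code (function name and inputs are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and
    rejected completions under the policy and under the frozen
    reference model; beta controls deviation from the reference.
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable form
    return math.log1p(math.exp(-logits))

# Loss falls below log(2) once the policy prefers the chosen answer
# more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

In a real training run these log-probabilities come from batched forward passes of the policy and reference models, and the loss is averaged over the preference dataset.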
Detailed Introduction
Following the success of models like ChatGPT and Llama, the machine learning community recognized that aligning language models with human and AI preferences, beyond basic supervised fine-tuning, is critical for helpfulness and safety. The Alignment Handbook addresses the scarcity of public resources in this domain with a series of robust, end-to-end training recipes. It covers the entire pipeline, from data collection through model training to evaluation, making advanced LLM alignment techniques accessible to developers and researchers.
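The parameter-efficient fine-tuning mentioned in the features (LoRA) underlies many of these recipes. A toy sketch of the idea, using plain Python lists rather than the PEFT library's actual API: the frozen base weight W is augmented with a scaled low-rank product B @ A, and only the small matrices A and B are trained.

```python
def lora_forward(x, W, A, B, alpha=1.0):
    """Compute y = (W + (alpha/r) * B @ A) @ x.

    W is the frozen d_out x d_in base weight; A (r x d_in) and
    B (d_out x r) form the trainable low-rank update, with rank
    r far smaller than d_in or d_out. Names are illustrative.
    """
    r = len(A)
    scale = alpha / r
    # Low-rank path: x -> A @ x (r-dim) -> B @ (A @ x) (d_out-dim)
    ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    bax = [sum(b * h for b, h in zip(row, ax)) for row in B]
    # Frozen base path: W @ x
    wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [w + scale * d for w, d in zip(wx, bax)]

# With B initialized to zero (the standard LoRA init), the adapted
# layer reproduces the base model exactly at the start of training.
identity_out = lora_forward([2.0, 3.0],
                            W=[[1.0, 0.0], [0.0, 1.0]],
                            A=[[1.0, 1.0]],
                            B=[[0.0], [0.0]])
print(identity_out)
```

QLoRA combines the same low-rank update with a 4-bit quantized base model, which is why the handbook's recipes can fine-tune large models on modest hardware.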