stas00/ml-engineering
An open collection of methodologies, tools, and step-by-step instructions for successfully training, fine-tuning, and running inference with large language and multi-modal models.
Detailed Introduction
This GitHub repository is an open book on Machine Learning Engineering. It compiles practical know-how gained from extensive hands-on experience training and fine-tuning large language and multi-modal models. It provides scripts, commands, and methodologies covering hardware selection, orchestration, model training, inference, debugging, and testing. Written for LLM/VLM training engineers and operators, it aims to offer quick, proven solutions and insights drawn from projects such as BLOOM-176B and IDEFICS-80B, making complex ML operations easier to carry out efficiently.