Tags: #distributed-training
Deep Learning Framework
pytorch
31.1k
Lightning-AI/pytorch-lightning
Streamlines complex deep learning engineering, enabling scalable AI model training and finetuning across diverse hardware with minimal code changes.
AI/MLOps Platform
kubernetes
5.0k
tencentmusic/cube-studio
An open-source, cloud-native, all-in-one MLOps platform designed for the full lifecycle management of machine learning, deep learning, and large language model development and deployment.
Replaces:
Details AI/Deep Learning Optimization Framework
NVIDIA GPUs
41.4k
hpcaitech/ColossalAI
An open-source framework designed to make large AI model training and inference cheaper, faster, and more accessible through advanced distributed computing and memory optimization techniques.
Replaces:
Details Reinforcement Learning Library for LLMs
Ray
3.1k
alibaba/ROLL
An efficient and user-friendly scaling library designed to optimize Reinforcement Learning with Large Language Models, enhancing performance in complex AI tasks.