Ecosystem & Stack

LMCache/LMCache - LLM Inference Optimization Engine (tags: sglang, GPU; 8.0k stars)
LMCache is an LLM serving engine extension designed to significantly reduce Time-To-First-Token (TTFT) and boost throughput, especially for long-context scenarios, by intelligently reusing KV caches.
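
To make the KV-cache reuse concrete, below is a minimal sketch of plugging LMCache into a vLLM engine, following the integration pattern described in the LMCache documentation. The connector string, config file variable, and model id are assumptions and may differ across LMCache and vLLM versions.

```python
# Sketch: serving with LMCache as the KV-cache layer behind vLLM. Connector
# name, config fields, and model id are assumptions; check the LMCache docs
# for the exact values matching your LMCache/vLLM versions.
import os
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# LMCache is typically configured through a YAML file or environment variables
# (e.g. whether CPU RAM, local disk, or a remote store holds offloaded KV).
os.environ["LMCACHE_CONFIG_FILE"] = "lmcache_config.yaml"  # assumed config path

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # assumed connector name
        kv_role="kv_both",                  # store and retrieve KV via LMCache
    ),
    gpu_memory_utilization=0.8,
)

long_prefix = open("shared_document.txt").read()  # context reused across requests
params = SamplingParams(temperature=0.0, max_tokens=64)

# The second request shares the long prefix, so its prefill can be served from
# the KV cache LMCache stored for the first request, cutting TTFT.
for question in ("Summarize the document.", "List the key dates."):
    out = llm.generate([long_prefix + "\n\nQ: " + question], params)
    print(out[0].outputs[0].text)
```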

ModelCloud/GPTQModel - LLM Optimization Toolkit (tags: huggingface; 1.1k stars)
A toolkit for quantizing (compressing) Large Language Models (LLMs) with hardware acceleration across various GPUs and CPUs, integrating with popular inference frameworks.
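
As a rough illustration, the sketch below follows the usage pattern from the project's README: load a model with a quantization config, run GPTQ calibration over sample text, then save the compressed checkpoint. The model id and calibration dataset are placeholders, and exact argument names may vary between releases.

```python
# Sketch of GPTQ quantization with GPTQModel (placeholder model and data).
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"        # placeholder model
quant_path = "Llama-3.2-1B-Instruct-gptqmodel-4bit"  # output directory

# A small slice of C4 text serves as the calibration set for GPTQ.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)  # 4-bit weights, group size 128

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)      # raise batch_size with more VRAM
model.save(quant_path)
```

The saved checkpoint can then be reloaded for inference with GPTQModel or with the integrated inference frameworks mentioned above.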

alibaba/ROLL - Reinforcement Learning Library for LLMs (tags: Ray; 3.1k stars)
An efficient and user-friendly library for scaling Reinforcement Learning with Large Language Models on large-scale GPU resources.
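
ROLL itself is driven by its own configs and launchers; purely as a conceptual sketch (not ROLL's API), the snippet below uses plain Ray actors to show the rollout-then-update loop that RL-for-LLMs libraries of this kind coordinate across workers.

```python
# Conceptual sketch of a distributed rollout/update loop (NOT ROLL's API):
# rollout actors produce (prompt, response, reward) samples in parallel and a
# trainer actor consumes them for a policy update. A real system would run LLM
# inference, a reward model, and GPU-aware scheduling in place of the stubs.
import ray

ray.init()

@ray.remote  # add num_gpus=1 when scheduling onto GPU workers
class RolloutWorker:
    def rollout(self, prompts):
        # Stub: a real worker would sample responses from the current policy
        # and score them with a reward model or verifier.
        return [(p, f"response to {p}", 1.0) for p in prompts]

@ray.remote
class Trainer:
    def __init__(self):
        self.step = 0

    def update(self, samples):
        # Stub: a real trainer would apply a PPO/GRPO-style gradient step here.
        self.step += 1
        return self.step

workers = [RolloutWorker.remote() for _ in range(4)]
trainer = Trainer.remote()
prompts = [f"prompt-{i}" for i in range(8)]

for _ in range(3):  # a few RL iterations
    shards = [prompts[i::4] for i in range(4)]
    batches = ray.get([w.rollout.remote(s) for w, s in zip(workers, shards)])
    samples = [s for batch in batches for s in batch]
    print("finished update step", ray.get(trainer.update.remote(samples)))
```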