
skyzh/tiny-llm

An educational course for systems engineers to learn LLM inference serving on Apple Silicon by building a simplified vLLM-like system with MLX from scratch.

Core Features

Build LLM serving infrastructure from scratch using MLX.
Implement core LLM components such as attention, rotary position embeddings (RoPE), and grouped-query attention (GQA).
Learn advanced inference techniques such as KV caching and continuous batching.
Focus on optimizations for Apple Silicon hardware.
Utilize the Qwen2 model as a practical example.
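To make the "core components" concrete, here is a minimal sketch of scaled dot-product attention, the building block the features above refer to. The course itself works with MLX's low-level array ops, whose API closely mirrors NumPy, so this NumPy version shows the same computation; the function names here are illustrative, not the course's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Single-head attention over one sequence.

    q, k, v: (seq_len, head_dim) arrays.
    Returns: (seq_len, head_dim) array of attention outputs.
    """
    d = q.shape[-1]
    # Similarity of each query against every key, scaled by sqrt(head_dim).
    scores = q @ k.T / np.sqrt(d)          # (seq_len, seq_len)
    # Each output row is a weight-averaged mix of the value rows.
    return softmax(scores) @ v             # (seq_len, head_dim)
```

In the course, the same computation is expressed with `mx.matmul` and friends on MLX arrays, and extended with masking and multiple heads.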

Detailed Introduction

The tiny-llm project is an educational course for systems engineers who want to master large language model (LLM) inference serving. It takes a hands-on approach: you build a simplified vLLM-like serving system from the ground up on Apple's MLX framework, working directly with MLX's low-level array/matrix APIs rather than its high-level neural-network abstractions, so the underlying mechanisms stay visible. The curriculum covers the essential LLM components (attention, RoPE, GQA) alongside serving optimizations such as the KV cache, continuous batching, and flash attention, all tuned for efficient deployment on Apple Silicon, with the Qwen2 model as the running case study.
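Of the serving optimizations mentioned above, the KV cache is the most fundamental: during autoregressive decoding, the keys and values for past tokens are stored once and reused, so each step only computes attention inputs for the newly generated token. A minimal sketch, again in NumPy as a stand-in for MLX arrays (the class name and method are illustrative, not tiny-llm's actual interface):

```python
import numpy as np

class KVCache:
    """Per-layer, per-head cache of past keys and values.

    Each decode step appends the new token's key/value row instead of
    recomputing the entire history, turning O(n^2) prefill work per step
    into O(n) attention against the cached context.
    """

    def __init__(self):
        self.k = None  # (cached_len, head_dim) or None before first update
        self.v = None

    def update(self, new_k, new_v):
        # new_k, new_v: (num_new_tokens, head_dim) for this step.
        if self.k is None:
            self.k, self.v = new_k, new_v
        else:
            self.k = np.concatenate([self.k, new_k], axis=0)
            self.v = np.concatenate([self.v, new_v], axis=0)
        # Attention for the new token(s) runs against the full cached history.
        return self.k, self.v
```

Continuous batching builds on this idea: because each sequence carries its own cache, finished requests can leave the batch and new ones can join at any decode step.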
