AI Inference Engine
17.8k 2026-04-15
mlc-ai/web-llm
A high-performance, in-browser LLM inference engine with OpenAI API compatibility, leveraging WebGPU for local, private AI.
Core Features
In-Browser Inference with WebGPU acceleration
Full OpenAI API Compatibility (streaming, JSON mode, function calling)
Structured JSON Generation
Extensive & Custom Model Support (Llama, Phi, Gemma, Mistral, Qwen)
Plug-and-Play Integration (NPM, CDN, Web Workers, Chrome Extensions)
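The structured JSON generation feature listed above can be illustrated as plain request data. This is a hedged sketch: the field names assume WebLLM mirrors OpenAI's chat-completions schema, where `response_format: { type: "json_object" }` enables JSON mode.

```typescript
// Sketch of an OpenAI-style JSON-mode request.
// Assumption: WebLLM accepts the same response_format field as the
// OpenAI chat.completions API.
const jsonRequest = {
  messages: [
    {
      role: "user",
      content: "List three primary colors as a JSON array under the key 'colors'.",
    },
  ],
  // Constrains decoding so the model emits valid JSON only.
  response_format: { type: "json_object" },
};
```

Because the request is ordinary data in the OpenAI shape, existing client code built for the OpenAI SDK can typically be pointed at WebLLM with minimal changes.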
Detailed Introduction
WebLLM brings large language model inference directly into the web browser, eliminating the need for server-side processing. By harnessing WebGPU for hardware acceleration, it enables high-performance, privacy-preserving AI applications: prompts and responses never leave the user's device. Its full compatibility with the OpenAI API lets developers integrate open-source LLMs into web apps using familiar tools, including streaming responses and structured JSON generation, for local, secure, and interactive AI experiences.
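The OpenAI-style workflow described above can be sketched as follows. This is a minimal sketch, assuming the published `@mlc-ai/web-llm` NPM package, its `CreateMLCEngine` entry point, and a model ID from WebLLM's prebuilt list (the one below is illustrative); it only runs in a WebGPU-capable browser.

```typescript
// Minimal WebLLM chat sketch (browser-only: requires WebGPU).
// The package entry point and model ID are assumptions based on the
// @mlc-ai/web-llm NPM package and its OpenAI-style API.

// OpenAI-compatible request body: the same shape the OpenAI SDK uses.
const request = {
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain WebGPU in one sentence." },
  ],
  temperature: 0.7,
  stream: true,
};

// The package is loaded lazily so this module also parses outside a browser.
async function runChat(): Promise<string> {
  const pkg = "@mlc-ai/web-llm";
  const webllm: any = await import(pkg);

  // Downloads, compiles, and caches the model weights on first use.
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  );

  let reply = "";
  // With stream: true, the engine yields OpenAI-style delta chunks.
  for await (const chunk of await engine.chat.completions.create(request)) {
    reply += chunk.choices[0]?.delta?.content ?? "";
  }
  return reply;
}
```

Since model weights are fetched and cached client-side, the first call is slow; subsequent loads reuse the local cache, which is what makes fully offline, private inference possible.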