Tags: #llm-evaluation

tatsu-lab/alpaca_eval
LLM Evaluation Framework · Python · 2.0k stars

An automatic, fast, and cost-effective evaluation framework for instruction-following language models, highly correlated with human judgments.

Agenta-AI/agenta
LLMOps Platform · 4.0k stars

An open-source LLMOps platform integrating prompt management, evaluation, and observability to accelerate reliable LLM application development.

langwatch/langwatch
AI/LLM Observability and Evaluation Platform · Docker · 3.2k stars

A comprehensive platform for end-to-end testing, simulation, evaluation, and monitoring of LLM-powered agents.

EvolvingLMMs-Lab/lmms-eval
AI/ML Evaluation Framework · Python · 4.0k stars

A unified, reproducible, and efficient evaluation toolkit for large multimodal models across text, image, video, and audio tasks.
