Tags: #llm-evaluation
- tatsu-lab/alpaca_eval (LLM Evaluation Framework, Python, 2.0k stars): An automatic, fast, and cost-effective evaluation framework for instruction-following language models, highly correlated with human judgments. See the usage sketch after this list.
- Agenta-AI/agenta (LLMOps Platform, 4.0k stars): An open-source LLMOps platform integrating prompt management, evaluation, and observability to accelerate reliable LLM application development.
- langwatch/langwatch (AI/LLM Observability and Evaluation Platform, Docker, 3.2k stars): A comprehensive platform for end-to-end testing, simulation, evaluation, and monitoring of LLM-powered agents.
- EvolvingLMMs-Lab/lmms-eval (AI/ML Evaluation Framework, Python, 4.0k stars): A unified, reproducible, and efficient evaluation toolkit for large multimodal models across text, image, video, and audio tasks. See the launch sketch after this list.