Tags: #llm-evaluation
LLM Evaluation Framework
Python
2.0k
tatsu-lab/alpaca_eval
An automatic, fast, and cost-effective evaluation framework for instruction-following language models whose results correlate highly with human judgments.
LLMOps Platform
4.1k
Agenta-AI/agenta
An open-source LLMOps platform integrating prompt management, evaluation, and observability to accelerate reliable LLM application development.
LLM Operations Platform
Node.js
3.2k
langwatch/langwatch
A unified platform for end-to-end LLM evaluation, AI agent testing, monitoring, and optimization, designed to streamline the development and deployment of reliable AI systems.
AI Model Evaluation Framework
Python
4.1k
EvolvingLMMs-Lab/lmms-eval
A unified, reproducible, and efficient evaluation toolkit for large multimodal models across text, image, video, and audio tasks.