Tags: #llm-evaluation

LLM Evaluation Framework

2.0k

tatsu-lab/alpaca_eval

An automatic, fast, and cost-effective evaluation framework for instruction-following language models, highly correlated with human judgments.

llm evaluation language models automatic evaluation

Details

LLMOps Platform

4.1k

Agenta-AI/agenta

An open-source LLMOps platform integrating prompt management, evaluation, and observability to accelerate reliable LLM application development.

llmops prompt engineering llm evaluation

Details

LLM Operations Platform

Node.js

3.2k

langwatch/langwatch

A unified platform for end-to-end LLM evaluation, AI agent testing, monitoring, and optimization, designed to streamline the development and deployment of reliable AI systems.

llm-evaluation ai-agent-testing observability

Details

AI Model Evaluation Framework

python

4.1k

EvolvingLMMs-Lab/lmms-eval

A unified, reproducible, and efficient multimodal evaluation toolkit for large language models across text, image, video, and audio tasks.

multimodal-ai llm-evaluation benchmarking

Details