Tags: #benchmark
AI Research Agent
8.1k
MiroMindAI/MiroThinker
An open-source deep research agent designed for complex research and prediction tasks, demonstrating state-of-the-art performance on various AI benchmarks.
LLM Evaluation Framework
Python
2.0k
tatsu-lab/alpaca_eval
An automatic, fast, and cost-effective evaluation framework for instruction-following language models, with results that correlate highly with human judgments.
Benchmarking and Evaluation Framework
Python
3.2k
embeddings-benchmark/mteb
MTEB is a comprehensive benchmark and evaluation framework for assessing text embedding models and retrieval systems across a wide range of tasks.