Tags: #benchmark
AI Research Agent
8.2k
MiroMindAI/MiroThinker
MiroThinker is an advanced AI research agent built for complex research and prediction tasks, reporting state-of-the-art performance across multiple benchmarks.
LLM Evaluation Framework
Python
2.0k
tatsu-lab/alpaca_eval
An automatic, fast, and cost-effective evaluation framework for instruction-following language models, whose rankings correlate highly with human judgments.
Benchmarking and Evaluation Framework
Python
3.2k
embeddings-benchmark/mteb
MTEB (Massive Text Embedding Benchmark) is a comprehensive benchmark and evaluation framework for assessing text embedding models and retrieval systems across a wide range of tasks.