Tags: #benchmark

AI Research Agent

8.1k

MiroMindAI/MiroThinker

An open-source deep research agent for complex research and prediction tasks, reported to achieve state-of-the-art performance on several AI benchmarks.

ai-agent deep-research llm

Replaces:

ChatGPT


LLM Evaluation Framework

Python

2.0k

tatsu-lab/alpaca_eval

An automatic, fast, and cost-effective evaluation framework for instruction-following language models, with results that correlate highly with human judgments.

llm evaluation language models automatic evaluation


Benchmarking and Evaluation Framework

Python

3.2k

embeddings-benchmark/mteb

MTEB is a comprehensive benchmark and evaluation framework designed to assess the performance of text embedding models and retrieval systems across a wide range of tasks.

embeddings benchmark nlp
