Tags: #benchmarking
CLI Tool for LLM Hardware Optimization
rust
24.9k
AlexsJones/llmfit
A terminal tool that detects your hardware, recommends LLMs suited to it, and provides performance benchmarks for local execution.
AI Agent Benchmarking Platform
python
2.8k
xlang-ai/OSWorld
OSWorld is a benchmark and environment for evaluating multimodal AI agents on open-ended tasks within real computer operating systems.
AI Model Evaluation Framework
python
4.1k
EvolvingLMMs-Lab/lmms-eval
A unified, reproducible, and efficient evaluation toolkit for large multimodal models across text, image, video, and audio tasks.