Tags: #benchmarking

CLI Tool for LLM Hardware Optimization

24.9k

A terminal tool that detects your hardware, recommends LLMs suited to it, and provides performance benchmarks for running them locally.

AI Agent Benchmarking Platform

2.8k

OSWorld is a benchmark and environment for evaluating multimodal AI agents on open-ended tasks within real computer operating systems.

AI Model Evaluation Framework

4.1k

A unified, reproducible, and efficient multimodal evaluation toolkit for large language models across text, image, video, and audio tasks.