Tags: #ai-evaluation
AI Agent Benchmarking Platform
VMware Workstation Pro
2.8k
xlang-ai/OSWorld
OSWorld is a benchmarking platform for evaluating multimodal AI agents on open-ended tasks in real computer environments.
AI Agent Development Framework
2.7k
EvoAgentX/EvoAgentX
An open-source framework for building, evaluating, and evolving AI agents and workflows based on Large Language Models (LLMs).