Tags: #ai-evaluation

AI Agent Benchmarking Platform

2.8k

OSWorld is a comprehensive benchmarking platform designed to evaluate multimodal AI agents on open-ended tasks within real computer environments.

AI Agent Development Framework

2.7k

An open-source framework for building, evaluating, and evolving AI agents and workflows based on Large Language Models (LLMs).