xlang-ai/OSWorld
OSWorld is a comprehensive benchmarking platform designed to evaluate multimodal AI agents on open-ended tasks within real computer environments.
Quick Start
pip install desktop-env
Detailed Introduction
OSWorld is a benchmarking platform, introduced at NeurIPS 2024, for assessing the capabilities of multimodal AI agents. Agents perform open-ended tasks directly inside real computer operating systems, which OSWorld runs on virtualization and cloud providers such as VMware, VirtualBox, Docker, AWS, and Azure. The project addresses the need for robust methods to evaluate AI agents that operate in complex, dynamic digital environments, and offers a standardized framework for measuring their performance and advancing research on general-purpose AI agents.
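Measuring performance on open-ended tasks usually means checking the resulting machine state rather than comparing an agent's actions to a fixed script. The sketch below illustrates that idea under stated assumptions: a task pairs an instruction with a command to run after the agent finishes, plus substrings that must or must not appear in its output. `TaskSpec`, `score`, and the Spotify task are hypothetical names chosen for illustration; they are not the `desktop-env` API or OSWorld's task schema.

```python
"""Simplified illustration of execution-based task checking.

Hypothetical: TaskSpec, score, and the example task are illustrative
only and are not taken from the desktop-env package.
"""
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """A task: a natural-language instruction plus a post-hoc check."""
    instruction: str
    check_command: str  # command whose output is inspected after the agent finishes
    include: list[str] = field(default_factory=list)  # substrings that must appear
    exclude: list[str] = field(default_factory=list)  # substrings that must not appear


def score(task: TaskSpec, command_output: str) -> float:
    """Return 1.0 if the observed output satisfies every rule, else 0.0."""
    if any(s not in command_output for s in task.include):
        return 0.0
    if any(s in command_output for s in task.exclude):
        return 0.0
    return 1.0


task = TaskSpec(
    instruction="Install Spotify on the current system.",
    check_command="which spotify",
    include=["spotify"],
    exclude=["not found"],
)

# Outputs one might observe from running task.check_command inside the VM.
print(score(task, "/snap/bin/spotify"))           # 1.0: binary found on PATH
print(score(task, "spotify: command not found"))  # 0.0: exclude rule matches
print(score(task, ""))                            # 0.0: include rule fails
```

Scoring the final state this way accepts any sequence of actions that reaches the goal, which suits open-ended tasks where many valid solution paths exist.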