xlang-ai/OSWorld
OSWorld is a benchmark and environment for evaluating multimodal AI agents on open-ended tasks within real computer operating systems.
Core Features
Quick Start
pip install desktop-env
Detailed Introduction
OSWorld is a benchmark and environment for evaluating multimodal AI agents on complex, open-ended tasks inside real computer operating systems rather than simulated ones. Environments can run on several virtualization and cloud platforms, including VMware, VirtualBox, Docker, AWS, and Azure. The project gives researchers and developers a standardized framework for measuring and comparing agent capabilities under realistic, dynamic conditions.
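The evaluation pattern described above (an agent observes the system, acts, and is scored on the resulting state) can be sketched roughly as follows. This is only an illustration: `StubDesktopEnv`, `run_episode`, `scripted_agent`, and the `touch` action are hypothetical names invented for this sketch, not the `desktop_env` API, and the stub keeps a set of file names in memory instead of driving a real virtual machine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubDesktopEnv:
    """Hypothetical stand-in for a real OS environment (NOT the desktop_env API)."""

    goal_file: str              # the task succeeds once this file exists
    max_steps: int = 5          # step budget per episode
    files: set = field(default_factory=set)
    steps: int = 0

    def reset(self) -> dict:
        # Start every episode from a clean state.
        self.files = set()
        self.steps = 0
        return self._observe()

    def step(self, action: str) -> tuple:
        # Apply one action; only the made-up "touch <name>" action changes state.
        self.steps += 1
        verb, _, arg = action.partition(" ")
        if verb == "touch" and arg:
            self.files.add(arg)
        done = self.steps >= self.max_steps or self.goal_file in self.files
        return self._observe(), done

    def evaluate(self) -> float:
        # Score the final state of the system, not the actions taken to reach it.
        return 1.0 if self.goal_file in self.files else 0.0

    def _observe(self) -> dict:
        return {"files": sorted(self.files), "step": self.steps}


def run_episode(env: StubDesktopEnv, agent: Callable[[dict], str]) -> float:
    """Run the observe-act loop until the episode ends, then score it."""
    obs = env.reset()
    done = False
    while not done:
        action = agent(obs)
        obs, done = env.step(action)
    return env.evaluate()


def scripted_agent(obs: dict) -> str:
    # A trivial agent that always issues the same action.
    return "touch report.txt"


score = run_episode(StubDesktopEnv(goal_file="report.txt"), scripted_agent)
print(score)
```

Scoring the final state rather than the action sequence is a design choice in this sketch: it lets agents that reach the goal by different routes be compared on equal terms.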