Multimodal AI Agent Desktop Application
29.6k 2026-05-01
bytedance/UI-TARS-desktop
An open-source desktop application providing a native GUI Agent for human-like task completion through multimodal AI, enabling local and remote computer/browser automation.
Core Features
Native GUI Agent based on the UI-TARS model.
Local and remote computer and browser operation capabilities.
Multimodal capabilities, including GUI agent control and vision.
Integration with real-world tools via MCP (Model Context Protocol).
Advanced debugging with Event Stream Viewer and runtime settings.
Detailed Introduction
UI-TARS Desktop is a component of the TARS multimodal AI agent stack: a desktop application for automation. It uses multimodal LLMs and the UI-TARS model to let AI agents interact with computers and browsers in a human-like manner. The project aims to simplify complex tasks through native GUI control, vision capabilities, and integration with real-world tools. It supports both local and remote operation and includes debugging features for agent development.