Tags: #multi-modal
MemTensor/MemOS
MemOS is an AI memory operating system for LLMs and AI agents, providing persistent, context-aware, and multi-modal memory that supports skill reuse and evolution across tasks.
xszyou/Fay
Fay is an AI agent framework that connects digital humans (2.5D, 3D, mobile, PC, web) and large language models (OpenAI-compatible, DeepSeek) with various business systems.
bghira/SimpleTuner
A user-friendly, versatile fine-tuning kit for image, video, and audio diffusion models, emphasizing simplicity and cutting-edge features.
PKU-Alignment/align-anything
A modular framework for aligning any-modality large models with human intentions and values using various fine-tuning and reinforcement learning methods.
yzhao062/pyod
A comprehensive Python library for multi-modal anomaly detection, featuring 60+ algorithms and agentic AI capabilities for scalable, expert-level investigations.
X-PLUG/mPLUG-Owl
A family of powerful multi-modal large language models (MLLMs) designed to advance AI's understanding and generation capabilities across various data types.
heshengtao/super-agent-party
An all-in-one self-hosted AI companion platform enabling desktop automation, multi-role chat, and live streaming with customizable virtual models.
lfnovo/open-notebook
An open-source, privacy-focused alternative to Google's NotebookLM, offering flexible AI model choices, multi-modal content organization, and advanced features such as multi-speaker podcast generation.