Tags: #evaluation - OSS Alternative - Discover Top Open Source Alternatives to Popular Software

AI Observability and MLOps Platform
Python
19.1k

comet-ml/opik

An open-source platform for comprehensive tracing, evaluation, and optimization of LLM applications, RAG systems, and agentic workflows.

LLM Evaluation & Red Teaming Tool
Node.js
20.5k

promptfoo/promptfoo

A CLI and library for testing, evaluating, and red-teaming LLM applications to ensure security, reliability, and performance across various models.

LLMOps Platform
Rust
11.3k

tensorzero/tensorzero

An open-source LLMOps platform unifying LLM gateway, observability, evaluation, optimization, and experimentation for robust AI application development.

AI Agent Development and Operations Platform
Docker
5.4k

coze-dev/coze-loop

Coze Loop is an open-source platform providing full-lifecycle management for AI agents, covering development, debugging, evaluation, and monitoring to streamline their creation and operation.

AI Engineering Platform
Python
25.8k

mlflow/mlflow

An open-source AI engineering platform for debugging, evaluating, monitoring, and optimizing production-quality AI applications, including agents, LLMs, and ML models.

AI Agent Development Framework & CLI Tool
Python
3.8k

evalstate/fast-agent

A flexible CLI-first framework for building, evaluating, and interacting with sophisticated multimodal LLM agents and workflows, offering comprehensive model and skill support.

AI Agent Benchmarking Platform
Python
2.8k

xlang-ai/OSWorld

OSWorld is a benchmark and environment for evaluating multimodal AI agents on open-ended tasks within real computer operating systems.

AI/ML Observability Platform
Python
9.4k

Arize-ai/phoenix

An open-source platform for debugging, evaluating, and monitoring AI/ML models and pipelines.

AI Development Platform
Ollama
4.8k

Kiln-AI/Kiln

A free, all-in-one platform for building, evaluating, and optimizing AI systems, offering tools for RAG, agents, fine-tuning, and synthetic data generation.

AI/ML Testing and Evaluation Framework
Python
5.3k

Giskard-AI/giskard-oss

An open-source Python library for comprehensive testing, evaluation, and red teaming of LLM agents and AI systems, designed for dynamic, multi-turn interactions.

MLOps Platform
Python
9.2k

oumi-ai/oumi

An end-to-end platform for fine-tuning, evaluating, and deploying open-source Large Language Models (LLMs) and Vision Language Models (VLMs).

LLM Orchestration Library / NLP Framework
Python 3.9+
4.6k

promptslab/Promptify

A Python library for structured NLP tasks using LLMs, offering Pydantic outputs, multi-provider support, and built-in evaluation.

Benchmarking and Evaluation Framework
Python
3.2k

embeddings-benchmark/mteb

MTEB is a comprehensive benchmark and evaluation framework designed to assess the performance of text embedding models and retrieval systems across a wide range of tasks.

RAG Optimization Framework
Python
4.7k

Marker-Inc-Korea/AutoRAG

An open-source framework that automates the evaluation and optimization of Retrieval-Augmented Generation (RAG) pipelines, using AutoML-style search to find the best-performing pipeline for a specific dataset.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.