LLM Evaluation and Testing Framework
5.3k 2026-04-13

Giskard-AI/giskard-oss

An open-source Python library for comprehensive evaluation, testing, and red teaming of LLM agents and agentic systems.

Core Features

Modular architecture for testing diverse AI systems (LLMs, black-box agents, pipelines).
Dynamic, multi-turn evaluation for non-deterministic LLM outputs.
Built-in checks for RAG quality, safety rules, and LLM-as-judge assessments.
Agent vulnerability scanning (red teaming, prompt injection, data leakage).
RAG evaluation and synthetic data generation capabilities.

Quick Start

pip install giskard

Detailed Introduction

Giskard v3 is a re-engineered open-source Python library designed for robust evaluation and testing of LLM agents and complex AI systems. It features a modular, lightweight, and async-first architecture, enabling developers to create dynamic, multi-turn tests for non-deterministic outputs. Giskard helps catch regressions, validate RAG quality, enforce safety rules, and identify vulnerabilities through red teaming, making it an essential tool for ensuring the reliability, safety, and performance of AI applications in production.

OSS Alternative

Explore the best open source alternatives to commercial software.

© 2026 OSS Alternative. hotgithub.com - All rights reserved.