LLM Evaluation and Testing Framework
5.3k 2026-04-13
Giskard-AI/giskard-oss
An open-source Python library for comprehensive evaluation, testing, and red teaming of LLM agents and agentic systems.
Core Features
Modular architecture for testing diverse AI systems (LLMs, black-box agents, pipelines).
Dynamic, multi-turn evaluation for non-deterministic LLM outputs.
Built-in checks for RAG quality, safety rules, and LLM-as-judge assessments.
Agent vulnerability scanning (red teaming, prompt injection, data leakage).
RAG evaluation and synthetic data generation capabilities.
Quick Start
pip install giskardDetailed Introduction
Giskard v3 is a re-engineered open-source Python library designed for robust evaluation and testing of LLM agents and complex AI systems. It features a modular, lightweight, and async-first architecture, enabling developers to create dynamic, multi-turn tests for non-deterministic outputs. Giskard helps catch regressions, validate RAG quality, enforce safety rules, and identify vulnerabilities through red teaming, making it an essential tool for ensuring the reliability, safety, and performance of AI applications in production.