AI/ML Testing and Evaluation Framework
5.3k 2026-04-26
Giskard-AI/giskard-oss
An open-source Python library for comprehensive testing, evaluation, and red teaming of LLM agents and AI systems, designed for dynamic, multi-turn interactions.
Core Features
Modular architecture for testing LLMs, black-box agents, and multi-step pipelines.
Evaluation capabilities including a scenario API, built-in checks, and LLM-as-judge scoring.
Agent vulnerability scanning and red teaming for robust AI safety.
Support for RAG evaluation and synthetic data generation (in progress).
Designed for non-deterministic outputs and multi-turn conversational agent testing.
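To make the last point concrete: when an agent's output varies between runs, a single pass/fail assertion is unreliable, so a common approach is to replay the same multi-turn scenario several times and assert on the pass rate. The sketch below is plain Python and does not use Giskard's API; `run_scenario`, `pass_rate`, `make_toy_agent`, and the check function are hypothetical names chosen for illustration, and the toy agent stands in for a real black-box agent.

```python
import random
from typing import Callable, List

# Hypothetical types (not Giskard's API): an agent maps the conversation so far
# to its next reply; a check inspects all agent replies from one run.
Agent = Callable[[List[str]], str]
Check = Callable[[List[str]], bool]


def run_scenario(agent: Agent, user_turns: List[str]) -> List[str]:
    """Play the user turns in order, feeding the growing history to the agent."""
    history: List[str] = []
    replies: List[str] = []
    for turn in user_turns:
        history.append(turn)
        reply = agent(history)
        history.append(reply)
        replies.append(reply)
    return replies


def pass_rate(agent: Agent, user_turns: List[str], check: Check, runs: int = 20) -> float:
    """Repeat the scenario and return the fraction of runs that satisfy the check."""
    passed = sum(check(run_scenario(agent, user_turns)) for _ in range(runs))
    return passed / runs


def make_toy_agent(seed: int) -> Agent:
    """Toy non-deterministic agent: random opener, then echoes the last user turn."""
    rng = random.Random(seed)

    def agent(history: List[str]) -> str:
        opener = rng.choice(["Sure.", "Of course.", "Happy to help."])
        return f"{opener} You said: {history[-1]}"

    return agent


def final_reply_mentions_refund(replies: List[str]) -> bool:
    """Property check that should hold regardless of the random opener."""
    return "refund" in replies[-1].lower()


rate = pass_rate(make_toy_agent(0), ["Hi", "I want a refund"], final_reply_mentions_refund)
```

The check targets a property of the reply rather than its exact wording, which is what lets it tolerate the varying opener. In practice the threshold on `rate` would be chosen per scenario.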
Quick Start
pip install giskard
Detailed Introduction
Giskard is an open-source Python library for testing and evaluating agentic AI systems. Its v3 architecture is a lightweight, modular rewrite that enables dynamic, multi-turn testing of LLMs, black-box agents, and complex pipelines. It addresses the challenges of non-deterministic AI outputs with tools for catching regressions, validating RAG quality, enforcing safety rules, and red teaming.
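As an illustration of what catching regressions can look like with non-deterministic outputs, the sketch below compares per-scenario pass rates from a baseline run against a candidate run and flags drops larger than a tolerance. It is plain Python, not Giskard's API; `find_regressions`, the scenario names, and the pass-rate values are hypothetical and made up for the example.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return scenario names whose pass rate dropped by more than `tolerance`."""
    regressions = []
    for name, base_rate in baseline.items():
        # A scenario missing from the candidate run is treated as fully failing.
        new_rate = candidate.get(name, 0.0)
        if base_rate - new_rate > tolerance:
            regressions.append(name)
    return sorted(regressions)


# Illustrative numbers only, not real measurements.
baseline = {"refund_flow": 0.95, "refuses_pii_request": 1.0, "rag_cites_source": 0.90}
candidate = {"refund_flow": 0.96, "refuses_pii_request": 0.80, "rag_cites_source": 0.88}

regressed = find_regressions(baseline, candidate)
print(regressed)  # ['refuses_pii_request']
```

The tolerance absorbs run-to-run noise, so a small dip such as 0.90 to 0.88 is not reported, while the drop on the safety scenario is.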