[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71166":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":44,"readmeContent":45,"aiSummary":46,"trendingCount":16,"starSnapshotCount":16,"syncStatus":47,"lastSyncTime":48,"discoverSource":49},71166,"giskard-oss","Giskard-AI\u002Fgiskard-oss","Giskard-AI","🐢 Open-Source Evaluation & Testing library for LLM Agents","https:\u002F\u002Fdocs.giskard.ai",null,"Python",5426,468,39,33,0,7,16,83,21,94.31,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43],"agent-evaluation","ai-red-team","ai-security","ai-testing","fairness-ai","llm","llm-eval","llm-evaluation","llm-security","llmops","ml-testing","ml-validation","mlops","rag-evaluation","red-team-tools","responsible-ai","trustworthy-ai","2026-06-12 04:00:59","\u003Cp align=\"center\">\n  \u003Cimg alt=\"giskardlogo\" src=\"readme\u002Flogo_light.png#gh-light-mode-only\">\n  \u003Cimg alt=\"giskardlogo\" src=\"readme\u002Flogo_dark.png#gh-dark-mode-only\">\n\u003C\u002Fp>\n\u003Ch1 align=\"center\" weight='300' >Evals, Red Teaming and Test Generation for Agentic Systems\u003C\u002Fh1>\n\u003Ch3 align=\"center\" weight='300' >Modular, Lightweight, Dynamic and Async-first \u003C\u002Fh3>\n\u003Cdiv align=\"center\">\n\n[![GitHub 
release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FGiskard-AI\u002Fgiskard)](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard\u002Freleases)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard\u002Fblob\u002Fmain\u002FLICENSE)\n[![Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fgiskard\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fgiskard)\n[![CI](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main)](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main)\n[![Giskard on Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F939190303397666868?label=Discord)](https:\u002F\u002Fgisk.ar\u002Fdiscord)\n\n\u003Ca rel=\"me\" href=\"https:\u002F\u002Ffosstodon.org\u002F@Giskard\">\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\u003Ch3 align=\"center\">\n   \u003Ca href=\"https:\u002F\u002Fdocs.giskard.ai\u002Foss\">\u003Cb>Docs\u003C\u002Fb>\u003C\u002Fa> &bull;\n  \u003Ca href=\"https:\u002F\u002Fwww.giskard.ai\u002F?utm_source=github&utm_medium=github&utm_campaign=github_readme&utm_id=readmeblog\">\u003Cb>Website\u003C\u002Fb>\u003C\u002Fa> &bull;\n  \u003Ca href=\"https:\u002F\u002Fgisk.ar\u002Fdiscord\">\u003Cb>Community\u003C\u002Fb>\u003C\u002Fa>\n \u003C\u002Fh3>\n\u003Cbr \u002F>\n\n> [!IMPORTANT]\n> **Giskard v3** is a fresh rewrite designed for dynamic, multi-turn testing of AI agents. This release drops heavy dependencies for better efficiency while introducing a more powerful AI vulnerability scanner and enhanced RAG evaluation capabilities. 
For now, the vulnerability scanner and RAG evaluation still rely on Giskard v2.\n> **Giskard v2 remains available but is no longer actively maintained.**\n> Follow progress → [Read the v3 Announcement](https:\u002F\u002Fgithub.com\u002Forgs\u002FGiskard-AI\u002Fdiscussions\u002F2250) · [Roadmap](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\u002Fissues\u002F2252)\n\n## Install\n\n```sh\npip install giskard\n```\n\nRequires Python 3.12+.\n\n**Telemetry:** Libraries built on `giskard-core` (including `giskard-checks`) may send **optional, aggregated usage analytics** to help improve the product. No prompts, model outputs, or scenario text are included. See [what is collected and how to opt out](libs\u002Fgiskard-core\u002FREADME.md#telemetry).\n\n---\n\nGiskard is an open-source Python library for **testing and evaluating agentic systems**. The v3 architecture is a modular set of focused packages — each carrying only the dependencies it needs — built from scratch to wrap anything: an LLM, a black-box agent, or a multi-step pipeline.\n\n| Status         | Package          | Description                                                                                                                                                              |\n| -------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |\n| ✅ Beta        | `giskard-checks` | Testing & evaluation — scenario API, built-in checks, LLM-as-judge                                                                                                       |\n| 🚧 In progress | `giskard-scan`   | Agent vulnerability scanner — red teaming, prompt injection, data leakage (successor of [v2 Scan](https:\u002F\u002Flegacy-docs.giskard.ai\u002Fen\u002Fstable\u002Fopen_source\u002Fscan\u002Findex.html)) |\n| 📋 Planned     | `giskard-rag`    | RAG evaluation & 
synthetic data generation (successor of [v2 RAGET](https:\u002F\u002Flegacy-docs.giskard.ai\u002Fen\u002Fstable\u002Fopen_source\u002Ftestset_generation\u002Findex.html))                 |\n\n## Giskard Checks — create and apply evals for testing agents\n\n```sh\npip install giskard-checks\n```\n\n**[Giskard Checks](https:\u002F\u002Fdocs.giskard.ai\u002Foss\u002Fchecks)** is a lightweight library for creating evaluations (evals) that test LLM-based systems — from simple assertions to LLM-as-judge assessments. Unlike traditional unit tests, evals are designed for **non-deterministic outputs** where the same input can produce different valid responses.\n\nUse Giskard Checks to:\n\n- **Catch regressions** — verify your system still behaves correctly after changes\n- **Validate RAG quality** — check if answers are grounded in retrieved context\n- **Enforce safety rules** — ensure outputs conform to your content policies\n- **Evaluate multi-turn agents** — test full conversations, not just single exchanges\n\nBuilt-in evals include string matching, comparisons, regex, semantic similarity, and LLM-as-judge checks (`Groundedness`, `Conformity`, `LLMJudge`).\n\n### Quickstart\n\n```python\nfrom openai import OpenAI\nfrom giskard.checks import Scenario, Groundedness\n\nclient = OpenAI()\n\ndef get_answer(inputs: str) -> str:\n    response = client.chat.completions.create(\n        model=\"gpt-5-mini\",\n        messages=[{\"role\": \"user\", \"content\": inputs}],\n    )\n    return response.choices[0].message.content\n\nscenario = (\n    Scenario(\"test_dynamic_output\")\n    .interact(\n        inputs=\"What is the capital of France?\",\n        outputs=get_answer,\n    )\n    .check(\n        Groundedness(\n            name=\"answer is grounded\",\n            context=\"France is a country in Western Europe. Its capital is Paris.\",\n        )\n    )\n)\n\nresult = await scenario.run()\nresult.print_report()\n```\n\n> The `run()` method is async. 
In a script, wrap it with `asyncio.run()`. See the [full docs](https:\u002F\u002Fdocs.giskard.ai\u002Foss\u002Fchecks) for `Suites`, `LLMJudge`, multi-turn scenarios, and more.\n\n## Looking for Giskard v2?\n\nGiskard v2 included **Scan** (automatic vulnerability detection) and **RAGET** (RAG evaluation test set generation) for both ML models and LLM applications. These features are not available in v3.\n\n```sh\npip install \"giskard[llm]>2,\u003C3\"\n```\n\n### [Scan](https:\u002F\u002Flegacy-docs.giskard.ai\u002Fen\u002Fstable\u002Fopen_source\u002Fscan\u002Findex.html) — automatically detect performance, bias & security issues\n\nWrap your model and run the scan:\n\n```python\nimport giskard\nimport pandas as pd\n\n# Replace my_llm_chain with your actual LLM chain or model inference logic\ndef model_predict(df: pd.DataFrame):\n    \"\"\"The function takes a DataFrame and must return a list of outputs (one per row).\"\"\"\n    return [my_llm_chain.run({\"query\": question}) for question in df[\"question\"]]\n\ngiskard_model = giskard.Model(\n    model=model_predict,\n    model_type=\"text_generation\",\n    name=\"My LLM Application\",\n    description=\"A question answering assistant\",\n    feature_names=[\"question\"],\n)\n\nscan_results = giskard.scan(giskard_model)\ndisplay(scan_results)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"readme\u002Fscan_updated.gif\" alt=\"Scan Example\" width=\"800\">\n\u003C\u002Fp>\n\n### [RAGET](https:\u002F\u002Flegacy-docs.giskard.ai\u002Fen\u002Fstable\u002Fopen_source\u002Ftestset_generation\u002Findex.html) — generate evaluation datasets for RAG applications\n\nAutomatically generate questions, reference answers, and context from your knowledge base:\n\n```python\nimport pandas as pd\nfrom giskard.rag import generate_testset, KnowledgeBase\n\n# Load your knowledge base documents\ndf = pd.read_csv(\"path\u002Fto\u002Fyour\u002Fknowledge_base.csv\")\nknowledge_base = KnowledgeBase.from_pandas(df, 
columns=[\"column_1\", \"column_2\"])\n\ntestset = generate_testset(\n    knowledge_base,\n    num_questions=60,\n    language='en',\n    agent_description=\"A customer support chatbot for company X\",\n)\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"readme\u002FRAGET_updated.gif\" alt=\"RAGET Example\" width=\"800\">\n\u003C\u002Fp>\n\n[Full v2 docs](https:\u002F\u002Flegacy-docs.giskard.ai)\n\n\u003Ch1 id=\"community\">👋 Community\u003C\u002Fh1>\n\nWe welcome contributions from the AI community! Read this [guide](.\u002FCONTRIBUTING.md) to get started, and join our thriving community on [Discord](https:\u002F\u002Fgisk.ar\u002Fdiscord).\n\nFollow the progress and share feedback:\n[v3 Announcement](https:\u002F\u002Fgithub.com\u002Forgs\u002FGiskard-AI\u002Fdiscussions\u002F2250) · [Roadmap](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\u002Fissues\u002F2252)\n\n🌟 [Leave us a star](https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard): it helps others discover the project and keeps us motivated to build awesome open-source tools! 🌟\n\n❤️ If you find our work useful, please consider [sponsoring us](https:\u002F\u002Fgithub.com\u002Fsponsors\u002FGiskard-AI) on GitHub. With a monthly sponsorship, you can get a sponsor badge, display your company in this README, and get your bug reports prioritized. We also offer one-time sponsorships if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.\n","Giskard is an open-source Python library for testing and evaluating agentic systems. Its core features include a modular, lightweight multi-turn testing framework built on a dynamic, async-first design, which makes it particularly well suited to security evaluation and red-team attack simulation for large language models (LLMs). The tool integrates a powerful AI vulnerability scanner and strengthens evaluation of retrieval-augmented generation (RAG) models, aiming to give developers a comprehensive and efficient testing solution. It fits scenarios where the safety and reliability of AI systems must be ensured, such as building responsible and trustworthy AI applications.",2,"2026-06-11 03:36:22","high_star"]