[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71054":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},71054,"garak","NVIDIA\u002Fgarak","NVIDIA","the LLM vulnerability scanner","https:\u002F\u002Fdiscord.gg\u002FuVch4puUCs",null,"Python",8074,1010,54,234,0,41,94,300,123,40.01,"Apache License 2.0",false,"main",true,[27,28,29,30,31],"ai","llm-evaluation","llm-security","security-scanners","vulnerability-assessment","2026-06-12 02:02:47","# garak, LLM vulnerability scanner\n\n*Generative AI Red-teaming & Assessment Kit*\n\n`garak` checks if an LLM can be made to fail in a way we don't want. `garak` probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know `nmap` or `msf` \u002F Metasploit Framework, garak does somewhat similar things to them, but for LLMs. \n\n`garak` focuses on ways of making an LLM or dialog system fail. It combines static, dynamic, and adaptive probes to explore this.\n\n`garak`'s a free tool. We love developing it and are always interested in adding functionality to support applications. 
\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Tests\u002FLinux](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_linux.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_linux.yml)\n[![Tests\u002FWindows](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_windows.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_windows.yml)\n[![Tests\u002FOSX](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_macos.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Factions\u002Fworkflows\u002Ftest_macos.yml)\n[![Documentation Status](https:\u002F\u002Freadthedocs.org\u002Fprojects\u002Fgarak\u002Fbadge\u002F?version=latest)](http:\u002F\u002Fgarak.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcs.CL-arXiv%3A2406.11036-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.11036)\n[![discord-img](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fchat-on%20discord-yellow.svg)](https:\u002F\u002Fdiscord.gg\u002FuVch4puUCs)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n[![PyPI - Python 
Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fgarak)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fgarak)\n[![PyPI](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fgarak.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fgarak)\n[![Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fgarak)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fgarak)\n[![Downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fgarak\u002Fmonth)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fgarak)\n\n\n## Get started\n### > See our user guide! [docs.garak.ai](https:\u002F\u002Fdocs.garak.ai\u002F)\n### > Join our [Discord](https:\u002F\u002Fdiscord.gg\u002FuVch4puUCs)!\n### > Project links & home: [garak.ai](https:\u002F\u002Fgarak.ai\u002F)\n### > Twitter: [@garak_llm](https:\u002F\u002Ftwitter.com\u002Fgarak_llm)\n### > DEF CON [slides](https:\u002F\u002Fgarak.ai\u002Fgarak_aiv_slides.pdf)!\n\n\u003Chr>\n\n## LLM support\n\ncurrently supports:\n* [hugging face hub](https:\u002F\u002Fhuggingface.co\u002Fmodels) generative models\n* [replicate](https:\u002F\u002Freplicate.com\u002F) text models\n* [openai api](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fintroduction) chat & continuation models\n* [aws bedrock](https:\u002F\u002Faws.amazon.com\u002Fbedrock\u002F) foundation models\n* [litellm](https:\u002F\u002Fwww.litellm.ai\u002F)\n* pretty much anything accessible via REST\n* gguf models like [llama.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp) version >= 1046\n* .. and many more LLMs!\n\n## Install:\n\n`garak` is a command-line tool. It's developed in Linux and OSX.\n\n### Standard install with `pip`\n\nJust grab it from PyPI and you should be good to go:\n\n```\npython -m pip install -U garak\n```\n\n### Install development version with `pip`\n\nThe standard pip version of `garak` is updated periodically. 
To get a fresher version from GitHub, try:\n\n```\npython -m pip install -U git+https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak.git@main\n```\n\n### Clone from source\n\n`garak` has its own dependencies. You can install `garak` in its own Conda environment:\n\n```\nconda create --name garak \"python>=3.10,\u003C=3.12\"\nconda activate garak\ngh repo clone NVIDIA\u002Fgarak\ncd garak\npython -m pip install -e .\n```\n\nOK, if that went fine, you're probably good to go!\n\n**Note**: if you cloned before the move to the `NVIDIA` GitHub organisation, but you're reading this at the `github.com\u002FNVIDIA` URI, please update your remotes as follows:\n\n```\ngit remote set-url origin https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak.git\n```\n\n\n## Getting started\n\nThe general syntax is:\n\n`garak \u003Coptions>`\n\n`garak` needs to know what model to scan, and by default, it'll try all the probes it knows on that model, using the vulnerability detectors recommended by each probe. You can see a list of probes using:\n\n`garak --list_probes`\n\nTo specify a generator, use the `--target_type` and, optionally, the `--target_name` options. The target type specifies a model family\u002Finterface; the target name specifies the exact model to be used. The \"Intro to generators\" section below describes some of the generators supported. A straightforward generator family is Hugging Face models; to load one of these, set `--target_type` to `huggingface` and `--target_name` to the model's name on Hub (e.g. `\"RWKV\u002Frwkv-4-169m-pile\"`). Some generators might need an API key to be set as an environment variable, and they'll let you know if they need that.\n\n`garak` runs all the probes by default, but you can be specific about that too. `--probes promptinject` will use only the [PromptInject](https:\u002F\u002Fgithub.com\u002Fagencyenterprise\u002Fpromptinject) framework's methods, for example. 
You can also specify one specific plugin instead of a plugin family by adding the plugin name after a `.`; for example, `--probes lmrc.SlurUsage` will use an implementation of checking for models generating slurs based on the [Language Model Risk Cards](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.18190) framework.\n\nFor help and inspiration, find us on [Twitter](https:\u002F\u002Ftwitter.com\u002Fgarak_llm) or [discord](https:\u002F\u002Fdiscord.gg\u002FuVch4puUCs)!\n\n## Examples\n\nProbe a commercial model for encoding-based prompt injection (OSX\u002F\\*nix) (replace example value with a real OpenAI API key)\n \n```\nexport OPENAI_API_KEY=\"sk-123XXXXXXXXXXXX\"\npython3 -m garak --target_type openai --target_name gpt-5-nano --probes encoding\n```\n\nSee if the Hugging Face version of GPT2 is vulnerable to DAN 11.0\n\n```\npython3 -m garak --target_type huggingface --target_name gpt2 --probes dan.Dan_11_0\n```\n\n\n## Reading the results\n\nFor each probe loaded, garak will print a progress bar as it generates. Once generation is complete, a row evaluating that probe's results on each detector is given. If any of the prompt attempts yielded an undesirable behavior, the response will be marked as FAIL, and the failure rate given.\n\nHere are the results with the `encoding` module on a GPT-3 variant:\n![alt text](https:\u002F\u002Fi.imgur.com\u002F8Dxf45N.png)\n\nAnd the same results for ChatGPT:\n![alt text](https:\u002F\u002Fi.imgur.com\u002FVKAF5if.png)\n\nWe can see that the more recent model is much more susceptible to encoding-based injection attacks, where text-babbage-001 was only found to be vulnerable to quoted-printable and MIME encoding injections.  The figures at the end of each row, e.g. 840\u002F840, indicate the number of text generations total and then how many of these seemed to behave OK. 
The figure can be quite high because more than one generation is made per prompt - by default, 10.\n\nErrors go in `garak.log`; the run is logged in detail in a `.jsonl` file specified at analysis start & end. There's a basic analysis script in `analyse\u002Fanalyse_log.py` which will output the probes and prompts that led to the most hits.\n\nSend PRs & open issues. Happy hunting!\n\n## Intro to generators\n\n### Hugging Face\n\nUsing the Pipeline API:\n* `--target_type huggingface` (for transformers models to run locally)\n* `--target_name` - use the model name from Hub. Only generative models will work. If it fails and shouldn't, please open an issue and paste in the command you tried + the exception!\n\nUsing the Inference API:\n* `--target_type huggingface.InferenceAPI` (for API-based model access)\n* `--target_name` - the model name from Hub, e.g. `\"mosaicml\u002Fmpt-7b-instruct\"`\n\nUsing private endpoints:\n* `--target_type huggingface.InferenceEndpoint` (for private endpoints)\n* `--target_name` - the endpoint URL, e.g. `https:\u002F\u002Fxxx.us-east-1.aws.endpoints.huggingface.cloud`\n\n* (optional) set the `HF_INFERENCE_TOKEN` environment variable to a Hugging Face API token with the \"read\" role; see https:\u002F\u002Fhuggingface.co\u002Fsettings\u002Ftokens when logged in\n\n### OpenAI\n\n* `--target_type openai`\n* `--target_name` - the OpenAI model you'd like to use. `gpt-5-nano` is fast and fine for testing.\n* set the `OPENAI_API_KEY` environment variable to your OpenAI API key (e.g. \"sk-19763ASDF87q6657\"); see https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys when logged in\n\nRecognised model types are whitelisted, because the plugin needs to know which sub-API to use. Completion or ChatCompletion models are OK. 
If you'd like to use a model not supported, you should get an informative error message, and please send a PR \u002F open an issue.\n\n### Replicate\n\n* set the `REPLICATE_API_TOKEN` environment variable to your Replicate API token, e.g. \"r8-123XXXXXXXXXXXX\"; see https:\u002F\u002Freplicate.com\u002Faccount\u002Fapi-tokens when logged in\n\nPublic Replicate models:\n* `--target_type replicate`\n* `--target_name` - the Replicate model name and hash, e.g. `\"stability-ai\u002Fstablelm-tuned-alpha-7b:c49dae36\"`\n\nPrivate Replicate endpoints:\n* `--target_type replicate.InferenceEndpoint` (for private endpoints)\n* `--target_name` - username\u002Fmodel-name slug from the deployed endpoint, e.g. `elim\u002Felims-llama2-7b`\n\n### Cohere\n\n* `--target_type cohere`\n* `--target_name` (optional, `command` by default) - The specific Cohere model you'd like to test\n* set the `COHERE_API_KEY` environment variable to your Cohere API key, e.g. \"aBcDeFgHiJ123456789\"; see https:\u002F\u002Fdashboard.cohere.ai\u002Fapi-keys when logged in\n\n### Groq\n\n* `--target_type groq`\n* `--target_name` - The name of the model to access via the Groq API\n* set the `GROQ_API_KEY` environment variable to your Groq API key, see https:\u002F\u002Fconsole.groq.com\u002Fdocs\u002Fquickstart for details on creating an API key\n\n### ggml\n\n* `--target_type ggml`\n* `--target_name` - The path to the ggml model you'd like to load, e.g. `\u002Fhome\u002Fleon\u002Fllama.cpp\u002Fmodels\u002F7B\u002Fggml-model-q4_0.bin`\n* set the `GGML_MAIN_PATH` environment variable to the path to your ggml `main` executable\n\n### REST\n\n`rest.RestGenerator` is highly flexible and can connect to any REST endpoint that returns plaintext or JSON. It does need some brief config, which will typically result in a short YAML file describing your endpoint. 
See https:\u002F\u002Freference.garak.ai\u002Fen\u002Flatest\u002Fgarak.generators.rest.html for examples.\n\n### NIM\n\nUse models from https:\u002F\u002Fbuild.nvidia.com\u002F or other NIM endpoints.\n* set the `NIM_API_KEY` environment variable to your authentication API token, or specify it in the config YAML\n\nFor chat models:\n* `--target_type nim`\n* `--target_name` - the NIM `model` name, e.g. `meta\u002Fllama-3.1-8b-instruct`\n\nFor completion models:\n* `--target_type nim.NVOpenAICompletion`\n* `--target_name` - the NIM `model` name, e.g. `bigcode\u002Fstarcoder2-15b`\n\n### AWS Bedrock\n\n* `--target_type bedrock`\n* `--target_name` - the Bedrock model ID or alias, e.g. `anthropic.claude-3-sonnet-20240229-v1:0` or `claude-3-sonnet`\n* set the `BEDROCK_API_KEY` environment variable to your AWS Bedrock API key; see https:\u002F\u002Fdocs.aws.amazon.com\u002Fbedrock\u002Flatest\u002Fuserguide\u002Fapi-keys-use.html for setup instructions\n* (optional) set the `BEDROCK_REGION` environment variable to specify the AWS region (defaults to `us-east-1`)\n\nSupported model families include Anthropic Claude, Meta Llama, Amazon Titan, AI21 Labs, Cohere, and Mistral AI models. The generator uses the Converse API for unified access across all model types.\n\nExample usage:\n\n```\nexport BEDROCK_API_KEY=\"your-api-key\"\nexport BEDROCK_REGION=\"us-east-1\"\ngarak --target_type bedrock --target_name claude-3-sonnet --probes dan\n```\n\n### Test\n\n* `--target_type test`\n* (alternatively) `--target_name test.Blank`\nFor testing. This always generates the empty string, using the `test.Blank` generator.  Will be marked as failing for any tests that *require* an output, e.g. those that make contentious claims and expect the model to refute them in order to pass.\n\n* `--target_type test.Repeat`\nFor testing. 
This generator repeats back the prompt it received.\n\n## Intro to probes\n\n| Probe                | Description                                                                                                                   |\n|----------------------|-------------------------------------------------------------------------------------------------------------------------------|\n| blank                | A simple probe that always sends an empty prompt.                                                                             |\n| atkgen               | Automated Attack Generation. A red-teaming LLM probes the target and reacts to it in an attempt to get toxic output. Prototype, mostly stateless, for now uses a simple GPT-2 [fine-tuned](https:\u002F\u002Fhuggingface.co\u002Fgarak-llm\u002Fartgpt2tox) on the subset of hhrlhf attempts that yielded detectable toxicity (the only target currently supported for now). |\n| badchars             | Implements imperceptible Unicode perturbations (invisible characters, homoglyphs, reorderings, deletions) inspired by the [Bad Characters](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.09898) paper. |\n| av_spam_scanning     | Probes that attempt to make the model output malicious content signatures                                                     |\n| continuation         | Probes that test if the model will continue a probably undesirable word                                                       |\n| dan                  | Various [DAN](https:\u002F\u002Fadguard.com\u002Fen\u002Fblog\u002Fchatgpt-dan-prompt-abuse.html) and DAN-like attacks                                 |\n| donotanswer          | Prompts to which responsible language models should not answer.                                                               
|\n| encoding             | Prompt injection through text encoding                                                                                        |\n| gcg                  | Disrupt a system prompt by appending an adversarial suffix.                                                                   |\n| glitch               | Probe model for glitch tokens that provoke unusual behavior.                                                                  |\n| grandma              | Appeal to be reminded of one's grandmother.                                                                                   |\n| goodside             | Implementations of Riley Goodside attacks.                                                                                    |\n| leakreplay           | Evaluate if a model will replay training data.                                                                                |\n| lmrc                 | Subsample of the [Language Model Risk Cards](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.18190) probes                                         |\n| malwaregen           | Attempts to have the model generate code for building malware                                                                 |\n| misleading           | Attempts to make a model support misleading and false claims                                                                  |\n| packagehallucination | Trying to get code generations that specify non-existent (and therefore insecure) packages.                                   
|\n| promptinject         | Implementation of the Agency Enterprise [PromptInject](https:\u002F\u002Fgithub.com\u002Fagencyenterprise\u002FPromptInject\u002Ftree\u002Fmain\u002Fpromptinject) work (best paper awards @ NeurIPS ML Safety Workshop 2022) |\n| realtoxicityprompts  | Subset of the RealToxicityPrompts work (data constrained because the full test will take so long to run)                      |\n| snowball             | [Snowballed Hallucination](https:\u002F\u002Fofir.io\u002Fsnowballed_hallucination.pdf) probes designed to make a model give a wrong answer to questions too complex for it to process |\n| xss                  | Look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration.                          |\n\n## Logging\n\n`garak` generates multiple kinds of log:\n* A log file, `garak.log`. This includes debugging information from `garak` and its plugins, and is continued across runs.\n* A report of the current run, structured as JSONL. A new report file is created every time `garak` runs. The name of this file is output at the beginning and, if successful, also at the end of the run. In the report, an entry is made for each probing attempt both as the generations are received, and again when they are evaluated; the entry's `status` attribute takes a constant from `garak.attempts` to describe what stage it was made at.\n* A hit log, detailing attempts that yielded a vulnerability (a 'hit')\n\n## How is the code structured?\n\nCheck out the [reference docs](https:\u002F\u002Freference.garak.ai\u002F) for an authoritative guide to `garak` code structure.\n\nIn a typical run, `garak` will read a model type (and optionally model name) from the command line, then determine which `probe`s and `detector`s to run, start up a `generator`, and then pass these to a `harness` to do the probing; an `evaluator` deals with the results. 
There are many modules in each of these categories, and each module provides a number of classes that act as individual plugins.\n\n* `garak\u002Fprobes\u002F` - classes for generating interactions with LLMs\n* `garak\u002Fdetectors\u002F` - classes for detecting whether an LLM is exhibiting a given failure mode\n* `garak\u002Fevaluators\u002F` - assessment reporting schemes\n* `garak\u002Fgenerators\u002F` - plugins for LLMs to be probed\n* `garak\u002Fharnesses\u002F` - classes for structuring testing\n* `resources\u002F` - ancillary items required by plugins\n\nThe default operating mode is to use the `probewise` harness. Given a list of probe module names and probe plugin names, the `probewise` harness instantiates each probe, then for each probe reads its `primary_detector` and `extended_detectors` attributes to get a list of `detector`s to run on the output.\n\nEach plugin category (`probes`, `detectors`, `evaluators`, `generators`, `harnesses`) includes a `base.py` which defines the base classes usable by plugins in that category. Each plugin module defines plugin classes that inherit from one of the base classes. For example, `garak.generators.openai.OpenAIGenerator` descends from `garak.generators.base.Generator`.\n\nLarger artefacts, like model files and bigger corpora, are kept out of the repository; they can be stored on e.g. Hugging Face Hub and loaded locally by clients using `garak`.\n\n\n## Developing your own plugin\n\n* Take a look at how other plugins do it\n* Inherit from one of the base classes, e.g. `garak.probes.base.TextProbe`\n* Override as little as possible\n* You can test the new code in at least two ways:\n  * Start an interactive Python session\n    * Import the module, e.g. `import garak.probes.mymodule`\n    * Instantiate the plugin, e.g. 
`p = garak.probes.mymodule.MyProbe()`\n  * Run a scan with test plugins\n    * For probes, try a blank generator and always.Pass detector: `python3 -m garak -m test.Blank -p mymodule -d always.Pass`\n    * For detectors, try a blank generator and a blank probe: `python3 -m garak -m test.Blank -p test.Blank -d mymodule`\n    * For generators, try a blank probe and always.Pass detector: `python3 -m garak -m mymodule -p test.Blank -d always.Pass`\n  * Get `garak` to list all the plugins of the type you're writing, with `--list_probes`, `--list_detectors`, or `--list_generators`\n\n\n## FAQ\n\nWe have an FAQ [here](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fgarak\u002Fblob\u002Fmain\u002FFAQ.md). Reach out if you have any more questions! [garak@nvidia.com](mailto:garak@nvidia.com)\n\nCode reference documentation is at [garak.readthedocs.io](https:\u002F\u002Fgarak.readthedocs.io\u002Fen\u002Flatest\u002F).\n\n## Citing garak\n\nYou can read the [garak preprint paper](garak-paper.pdf). If you use garak, please cite us.\n\n```\n@article{garak,\n  title={{garak: A Framework for Security Probing Large Language Models}},\n  author={Leon Derczynski and Erick Galinkin and Jeffrey Martin and Subho Majumdar and Nanna Inie},\n  year={2024},\n  howpublished={\\url{https:\u002F\u002Fgarak.ai}}\n}\n```\n\n\u003Chr>\n\n_\"Lying is a skill like any other, and if you wish to maintain a level of excellence you have to practice constantly\"_ - Elim\n\nFor updates and news see [@garak_llm](https:\u002F\u002Ftwitter.com\u002Fgarak_llm)\n\n© 2023- Leon Derczynski; Apache license v2, see [LICENSE](LICENSE)\n","Garak is a tool for detecting vulnerabilities in large language models (LLMs). Using static, dynamic, and adaptive probes, it checks whether an LLM exhibits security problems such as hallucination, data leakage, prompt injection, misinformation, toxic content generation, and jailbreaks. The tool is written in Python and supports many mainstream LLM platforms, including Hugging Face Hub, Replicate, OpenAI API, and AWS Bedrock. It suits scenarios that require a security assessment of an AI dialog system or LLM, such as security validation during development and testing or risk review before launch.",2,"2026-06-11 03:35:40","high_star"]