[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72213":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},72213,"ai.robots.txt","ai-robots-txt\u002Fai.robots.txt","ai-robots-txt","A list of AI agents and robots to block.","https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom",null,"Python",3927,162,57,11,0,5,17,59,15,28.64,"MIT License",false,"main",[26,27,28,29],"ai","crawlers","crawling","privacy","2026-06-12 02:03:00","# ai.robots.txt\n\n\u003Cimg src=\"\u002Fassets\u002Fimages\u002Fnoai-logo.png\" width=\"100\" \u002F>\n\nThis list contains AI-related crawlers of all types, regardless of purpose. We encourage you to contribute to and implement this list on your own site. 
See [information about the listed crawlers](.\u002Ftable-of-bot-metrics.md) and the [FAQ](https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Fblob\u002Fmain\u002FFAQ.md).\n\nA number of these crawlers have been sourced from [Dark Visitors](https:\u002F\u002Fdarkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers.\n\nIf you'd like to add information about a crawler to the list, please make a pull request with the bot name added to `robots.txt`, `ai.txt`, and any relevant details in `table-of-bot-metrics.md` to help people understand what's crawling.\n\n## Usage\n\nThis repository provides the following files:\n- `robots.txt`\n- `.htaccess`\n- `nginx-block-ai-bots.conf`\n- `Caddyfile`\n- `haproxy-block-ai-bots.txt`\n- `lighttpd-block-ai-bots.conf`\n\n`robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https:\u002F\u002Fwww.rfc-editor.org\u002Frfc\u002Frfc9309.html)).\n\n`.htaccess` may be used to configure web servers such as [Apache httpd](https:\u002F\u002Fhttpd.apache.org\u002F) to return an error page when one of the listed AI crawlers sends a request to the web server.\nNote that, as stated in the [httpd documentation](https:\u002F\u002Fhttpd.apache.org\u002Fdocs\u002Fcurrent\u002Fhowto\u002Fhtaccess.html), more performant methods than an `.htaccess` file exist.\n\n`nginx-block-ai-bots.conf` implements an Nginx configuration snippet that can be included in any virtual host `server {}` block via the `include` directive.\n\n`Caddyfile` includes a header regex matcher group that you can copy or import into your Caddyfile; matching requests can then be rejected with `abort @aibots`.\n\n`haproxy-block-ai-bots.txt` may be used to configure HAProxy to block AI bots. To implement it:\n1. Add the file to the config directory of HAProxy.\n2. 
Add the following lines in the `frontend` section:\n   ```\n   acl ai_robot hdr_sub(user-agent) -i -f \u002Fetc\u002Fhaproxy\u002Fhaproxy-block-ai-bots.txt\n   http-request deny if ai_robot\n   ```\n   (Note that the path to `haproxy-block-ai-bots.txt` may differ in your environment.)\n\n`lighttpd-block-ai-bots.conf` can be included with `include \"fragments\u002Flighttpd-block-ai-bots.conf\"` in your lighttpd configuration, either globally or in any conditional section.\n\n[Bing uses the data it crawls for AI and training; you may opt out by adding a `meta` tag to the `head` of your site.](.\u002Fdocs\u002Fadditional-steps\u002Fbing.md)\n\n### Related\n\n- [Robots.txt Traefik plugin](https:\u002F\u002Fplugins.traefik.io\u002Fplugins\u002F681b2f3fba3486128fc34fae\u002Frobots-txt-plugin):\nmiddleware plugin for [Traefik](https:\u002F\u002Ftraefik.io\u002Ftraefik\u002F) that automatically adds the rules from the [robots.txt](.\u002Frobots.txt)\nfile on the fly.\n\n- Alternatively, you can [manually configure Traefik](.\u002Fdocs\u002Ftraefik-manual-setup.md) to centrally serve a static `robots.txt`.\n\n## Contributing\n\nA note about contributing: updates should be made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, `.htaccess`, and `nginx-block-ai-bots.conf`.\n\nYou can run the tests by [installing](https:\u002F\u002Fwww.python.org\u002Fabout\u002Fgettingstarted\u002F) Python 3 and the dependencies:\n```console\npip install -r requirements.txt\n```\nand then running:\n```console\ncode\u002Ftests.py\n```\n\nThe `.editorconfig` file provides standard editor options for this project. 
See [EditorConfig](https:\u002F\u002Feditorconfig.org\u002F) for more information.\n\n## Releasing\n\nAdmins may ship a new release `v1.n` (where `n` increments the minor version of the current release) as follows:\n\n* Navigate to the [new release page](https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases\u002Fnew) on GitHub.\n* Click `Select tag`, choose `Create new tag`, enter `v1.n` in the pop-up, and click `Create`.\n* Enter a suitable release title (e.g. `v1.n: adds user-agent1, user-agent2`).\n* Click `Generate release notes`.\n* Click `Publish release`.\n\nA GitHub action will then add the asset `robots.txt` to the release. That's it.\n\n## Subscribe to updates\n\nYou can subscribe to list updates via RSS\u002FAtom with the releases feed:\n\n```\nhttps:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom\n```\n\nYou can subscribe with [Feedly](https:\u002F\u002Ffeedly.com\u002Fi\u002Fsubscription\u002Ffeed\u002Fhttps:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom), [Inoreader](https:\u002F\u002Fwww.inoreader.com\u002F?add_feed=https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom), [The Old Reader](https:\u002F\u002Ftheoldreader.com\u002Ffeeds\u002Fsubscribe?url=https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom), [Feedbin](https:\u002F\u002Ffeedbin.me\u002F?subscribe=https:\u002F\u002Fgithub.com\u002Fai-robots-txt\u002Fai.robots.txt\u002Freleases.atom), or any other reader app.\n\nAlternatively, you can subscribe to new releases with your GitHub account by clicking the ⬇️ next to the \"Watch\" button at the top of this page, clicking \"Custom\", and selecting \"Releases\".\n\n## License content with RSL\n\nIt is also possible to license your content to AI companies in `robots.txt` using\nthe [Really Simple Licensing](https:\u002F\u002Frslstandard.org) standard, with an option for\ncollective bargaining. 
A [plugin](https:\u002F\u002Fgithub.com\u002FJameswlepage\u002Frsl-wp) currently\nimplements RSL as well as payment processing for WordPress sites.\n\n## Report abusive crawlers\n\nIf you use [Cloudflare's hard block](https:\u002F\u002Fblog.cloudflare.com\u002Fdeclaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [using this form](https:\u002F\u002Fdocs.google.com\u002Fforms\u002Fd\u002Fe\u002F1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg\u002Fviewform).\nEven if you don't use Cloudflare's hard block, their list of [verified bots](https:\u002F\u002Fradar.cloudflare.com\u002Ftraffic\u002Fverified-bots) may come in handy.\n\n## Additional resources\n\n- [Blocking Bots with Nginx](https:\u002F\u002Frknight.me\u002Fblog\u002Fblocking-bots-with-nginx\u002F) by Robb Knight\n- [Blockin' bots.](https:\u002F\u002Fethanmarcotte.com\u002Fwrote\u002Fblockin-bots\u002F) by Ethan Marcotte\n- [Blocking Bots With 11ty And Apache](https:\u002F\u002Fflamedfury.com\u002Fposts\u002Fblocking-bots-with-11ty-and-apache\u002F) by fLaMEd fury\n- [Blockin' bots on Netlify](https:\u002F\u002Fwww.jeremiak.com\u002Fblog\u002Fblock-bots-netlify-edge-functions\u002F) by Jeremia Kimelman\n- [Blocking AI web crawlers](https:\u002F\u002Funderlap.org\u002Fblocking-ai-web-crawlers) by Glyn Normington\n- [Block AI Bots from Crawling Websites Using Robots.txt](https:\u002F\u002Foriginality.ai\u002Fai-bot-blocking) by Jonathan Gillham, Originality.AI\n","The ai.robots.txt project provides a list for blocking AI-related crawlers of all kinds. Its core function is to help website administrators control the behavior of these crawlers by maintaining a `robots.txt` file that lists many AI crawlers, and it supports configurations for several web servers, including Apache, Nginx, Caddy, HAProxy, and lighttpd. It also provides detailed documentation on how to implement the corresponding protections for each type of web server. It is suited to scenarios where you want to reduce unnecessary or malicious AI crawler traffic, protect user privacy, and optimize site performance.",2,"2026-06-11 03:40:51","high_star"]