[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10783":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":40,"readmeContent":41,"aiSummary":42,"trendingCount":16,"starSnapshotCount":16,"syncStatus":18,"lastSyncTime":43,"discoverSource":44},10783,"chatgpt-comparison-detection","Hello-SimpleAI\u002Fchatgpt-comparison-detection","Hello-SimpleAI","Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥","https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.07597",null,"Python",1357,124,27,25,0,1,2,9,4,55.19,false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38,39],"ai","chatbot","chatgpt","dataset","deep-learning","gpt-3","gpt2","gpt3","machine-learning","ml","nlp","openai","python","text-classification","2026-06-12 04:00:52","# ChatGPT-Comparison-Detection Project 🔬\n\n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLanguages-%20English%2C%20Chinese-brightgreen) \n![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FChatGPT-Corpus%2C%20Detector-blue)\n\nOfficial repository of paper [\"How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.07597). Please star, watch, and fork our repo for the active updates!\n\nSee also→([📢 Feedback Space for Detectors](https:\u002F\u002Fgithub.com\u002FHello-SimpleAI\u002Fchatgpt-comparison-detection\u002Fdiscussions\u002F2) please feel free to leave your feedback here! 
请留下您宝贵的意见！)\n\n\n\n\u003Cimg width=\"600\" alt=\"image\" src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F37113676\u002F212355768-5ef7a26a-7cc5-4c38-91dc-2ee249ec49d5.png\">\n\n---\n### Human ChatGPT Comparison Corpus (HC3) \u002F 人类-ChatGPT 问答对比语料集\nWe propose the first **Human vs. ChatGPT** comparison corpus, named **HC3**.\n\n我们提出了第一个 **Human vs. ChatGPT** 对比语料, 叫做 **HC3**.\n\n\u003Cimg width=\"520\" alt=\"image\" src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F37113676\u002F213218672-e92b7036-a602-48c8-b70d-50ee1673bac8.png\">\n\nThe first version of the HC3 dataset is now available on 🤗 Hugging Face Datasets:\n- [HC3-English](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHello-SimpleAI\u002FHC3)\n- [HC3-Chinese](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHello-SimpleAI\u002FHC3-Chinese)\n\n\nThe HC3 datasets are also available on ModelScope for the Chinese community \u002F 在中文社区，HC3 数据集也已在 ModelScope 上可用:\n- [HC3-English](https:\u002F\u002Fwww.modelscope.cn\u002Fdatasets\u002Fsimpleai\u002FHC3)\n- [HC3-Chinese](https:\u002F\u002Fwww.modelscope.cn\u002Fdatasets\u002Fsimpleai\u002FHC3-Chinese)\n\n\n> For the train\u002Ftest splits and filtered versions used in the paper, see the Google Drive links in [HC3\u002FREADME.md](HC3\u002FREADME.md).\n\n### Dataset Copyright\n\nIf a source dataset used in this corpus has a specific license that is stricter than CC-BY-SA, our products follow the same license.\nOtherwise, they follow the CC-BY-SA license.\n\n| English Split       | Source | Source License | Note |\n|----------|-------------|--------|-------------|\n| reddit_eli5 | [ELI5](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FELI5)   | BSD License    |     |\n| open_qa  | [WikiQA](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fdownload\u002Fdetails.aspx?id=52419)  | [PWC Custom](https:\u002F\u002Fpaperswithcode.com\u002Fdatasets\u002Flicense)   |      |\n| wiki_csai   | Wikipedia | CC-BY-SA | [Wiki FAQ](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FWikipedia:FAQ\u002FCopyright) |\n| medicine    | 
[Medical Dialog](https:\u002F\u002Fgithub.com\u002FUCSD-AI4H\u002FMedical-Dialogue-System) | Unknown | [Asking](https:\u002F\u002Fgithub.com\u002FUCSD-AI4H\u002FMedical-Dialogue-System\u002Fissues\u002F10) |\n| finance     | [FiQA](https:\u002F\u002Fpaperswithcode.com\u002Fdataset\u002Ffiqa-1) | Unknown |  Asking by 📧  |\n\n| Chinese Split       | Source | Source License  | Note |\n|----------|-------------|-----------|-------------|\n| open_qa  | [WebTextQA & BaikeQA](https:\u002F\u002Fgithub.com\u002Fbrightmart\u002Fnlp_chinese_corpus) | MIT license |  |\n| baike     | Baidu Baike  | None   |    |\n| nlpcc_dbqa  | [NLPCC-DBQA](https:\u002F\u002Fgithub.com\u002Fmsra-nlc\u002FChineseDBQA) | Unknown |   [Asking](https:\u002F\u002Fgithub.com\u002FUCSD-AI4H\u002FMedical-Dialogue-System\u002Fissues\u002F10) |\n| medicine    | [Chinese Medical Dialogue](https:\u002F\u002Ftianchi.aliyun.com\u002Fdataset\u002F90163) |  CC-BY-NC 4.0 |  |\n| finance     | [FinanceZhidao](https:\u002F\u002Fwww.heywhale.com\u002Fmw\u002Fdataset\u002F5e9588f8e7ec38002d0331b1\u002Fcontent) | CC-BY 4.0 |  |\n| psychology  | [On Baidu AI Studio](https:\u002F\u002Faistudio.baidu.com\u002Faistudio\u002Fdatasetdetail\u002F38489) | CC0  | |\n| law         | [LegalQA](https:\u002F\u002Fgithub.com\u002Fsiatnlp\u002FLegalQA) | Unknown | [Asking](https:\u002F\u002Fgithub.com\u002Fsiatnlp\u002FLegalQA\u002Fissues\u002F2) |\n\n\n---\n\n### ChatGPT detectors \u002F 内容检测器\n![image](https:\u002F\u002Fuser-images.githubusercontent.com\u002F37113676\u002F211677236-d7c028f5-b9a5-4d88-baee-8b86dc942ff7.png)\n(Hosted on 🤗 Hugging Face Spaces)\n\n\nWe provide three kinds of detectors, all bilingual \u002F 我们提供了三个版本的检测器，且都支持中英文:\n- [QA version \u002F 问答版](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHello-SimpleAI\u002Fchatgpt-detector-qa): detect whether an **answer** to a given **question** was generated by ChatGPT, using PLM-based classifiers \u002F 判断某个**问题的回答**是否由ChatGPT生成，使用基于PTM的分类器来开发;\n- 
[Single-text version \u002F 独立文本版](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHello-SimpleAI\u002Fchatgpt-detector-single): detect whether a piece of text was generated by ChatGPT, using PLM-based classifiers \u002F 判断**单条文本**是否由ChatGPT生成，使用基于PTM的分类器来开发;\n- [Linguistic version \u002F 语言学版](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHello-SimpleAI\u002Fchatgpt-detector-ling): detect whether a piece of text was generated by ChatGPT, using linguistic features \u002F 判断**单条文本**是否由ChatGPT生成，使用基于语言学特征的模型来开发;\n\n\nAll three detectors are also available on ModelScope for the Chinese community \u002F 在 modelscope 中文社区平台，三个版本的检测器也都可用:\n- [QA version \u002F 问答版](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fsimpleai\u002Fchatgpt-detector-qa)\n- [Single-text version \u002F 独立文本版](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fsimpleai\u002Fchatgpt-detector-single)\n- [Linguistic version \u002F 语言学版](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fsimpleai\u002Fchatgpt-detector-ling)\n\n\nThe model weights are all available at 🤗 Hugging Face Models:\n\n| Model Checkpoints              | Comment      |\n|-----------------------|------------|\n|[chatgpt-detector-roberta](https:\u002F\u002Fhuggingface.co\u002FHello-SimpleAI\u002Fchatgpt-detector-roberta)|To detect a single piece of text|\n|[chatgpt-qa-detector-roberta](https:\u002F\u002Fhuggingface.co\u002FHello-SimpleAI\u002Fchatgpt-qa-detector-roberta)|To detect a question-answer pair|\n|[chatgpt-detector-roberta-chinese](https:\u002F\u002Fhuggingface.co\u002FHello-SimpleAI\u002Fchatgpt-detector-roberta-chinese)|To detect a single piece of text, Chinese version \u002F 检测单条文本，中文版|\n|[chatgpt-qa-detector-roberta-chinese](https:\u002F\u002Fhuggingface.co\u002FHello-SimpleAI\u002Fchatgpt-qa-detector-roberta-chinese)|To detect a question-answer pair, Chinese version \u002F 检测一对QA文本，中文版|\n\nThe English models are based on [roberta-base](https:\u002F\u002Fhuggingface.co\u002Froberta-base).\nThe Chinese models are based on [hfl\u002Fchinese-roberta-wwm-ext](https:\u002F\u002Fhuggingface.co\u002Fhfl\u002Fchinese-roberta-wwm-ext).\n\n\n---\n\n### Important Dates \u002F 重要节点:\n\n| Events                | 
Dates      |\n|-----------------------|------------|\n| Project Launch \u002F 项目启动        | 2022-12-09 ✅ |\n| Comparison Data Collection \u002F 对比数据收集        | 2022-12-11 to Now 🏎️|\n| Release ChatGPT Detector (Demo) \u002F 检测器 Demo 发布 | 2023-01-11 ✅|\n| Models Release \u002F 模型开源 | 2023-01-18 ✅|\n| Comparison Corpus Release \u002F 语料集开源 | 2023-01-18 ✅|\n| Research Paper \u002F 研究论文发布 | 2023-01-19 ✅|\n|...|...|\n\n\n\n---\n\n### Citation\n\nCheck out the paper: [arXiv: 2301.07597](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.07597)\n\n```\n@article{guo-etal-2023-hc3,\n    title = \"How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection\",\n    author = \"Guo, Biyang  and\n      Zhang, Xin  and\n      Wang, Ziyuan  and\n      Jiang, Minqi  and\n      Nie, Jinran  and\n      Ding, Yuxuan  and\n      Yue, Jianwei  and\n      Wu, Yupeng\",\n    journal = \"arXiv preprint arXiv:2301.07597\",\n    year = \"2023\",\n}\n```\n\n\n\n---\n### Our Story... \u002F 背景故事\n\nOn December 9, 2022, 10 days after the launch of [ChatGPT](https:\u002F\u002Fopenai.com\u002Fblog\u002Fchatgpt\u002F), we started this project with two purposes: \n1. To create **open-source models** for efficiently detecting ChatGPT-generated content; \n2. To collect a valuable **human-ChatGPT comparison Q&A corpus** to facilitate related research.\n\n2022 年 12 月 9 日，也就是 [ChatGPT](https:\u002F\u002Fopenai.com\u002Fblog\u002Fchatgpt\u002F) 推出的第 10 天，我们开始了这个项目，为了两个目的：\n1. 做出一些**开源**模型工具来高效检测 ChatGPT 生成的内容；\n2. 收集一批有价值的**人类和 ChatGPT 对比**的中英双语问答语料，来助力相关学术研究。\n\nYou are welcome to follow our project! We have released a preview of our ChatGPT detectors, and the **models and dataset will be open-sourced** in about a week. 
We look forward to receiving feedback from the community to help improve the models and to contribute to **open** academic research together :)\u003Cbr>\n欢迎关注我们项目，我们目前已经发布ChatGPT检测器预览版，并将于约**一周内发布开源模型、数据集**。期待得到广大群众的反馈，来帮助我们改进模型，为**开放**的学术研究一起做贡献！\n\n### About Us \u002F 关于我们\n\nWe are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities\u002Fcompanies.\u003Cbr>\n我们是一群（在 ChatGPT 的阴影下）渺小的研究人员，但希望为社区做一些有意义的事。这个项目的团队由来自6所大学\u002F公司的博士生和工程师组成。\n\n|   |   |   |   |\n|:-:|:-:|:-:|:-:|\n| [Biyang Guo](https:\u002F\u002Fgithub.com\u002Fbeyondguo) | [Minqi Jiang](https:\u002F\u002Fgithub.com\u002FMinqi824) | [Ziyuan Wang](https:\u002F\u002Fgithub.com\u002FSUFEHeisenberg) | [Xin Zhang](https:\u002F\u002Fgithub.com\u002Fizhx) |\n|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F37113676?s=64&v=4\" alt=\"\" width=\"40\"\u002F>|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F39890732?s=64&v=4\" alt=\"\" width=\"40\"\u002F>|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F44188955?s=64&v=4\" alt=\"\" width=\"40\"\u002F>|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F26690193?s=64&v=4\" alt=\"\" width=\"40\"\u002F>|\n| [Jinran Nie](https:\u002F\u002Fgithub.com\u002FNJRBarry) | [Yuxuan Ding](https:\u002F\u002Fgithub.com\u002Fyxding95) | [Jianwei Yue](https:\u002F\u002Fgithub.com\u002FTurquoiseA) | [Yupeng Wu](https:\u002F\u002Fgithub.com\u002FrealRoc) |\n|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F27188419?s=64&v=4\" alt=\"\" width=\"40\"\u002F>|\u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F16249556?s=70&v=4\" alt=\"\" width=\"40\"\u002F>|  \u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F23006855?s=64&v=4\" alt=\"\" 
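width=\"40\"\u002F> | \u003Cimg src=\"https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F44936809?s=64&v=4\" alt=\"\" width=\"40\"\u002F>  |\n","This project provides a human vs. ChatGPT comparison corpus (HC3) and detection tools for evaluating and distinguishing content produced by each. Its core contributions are the first human-ChatGPT comparison dataset covering both English and Chinese, and deep-learning text classifiers that identify whether a text was written by a human or generated by AI. The project is implemented in Python and draws on recent research in natural language processing and machine learning. It suits applications that need to verify the source of a text, such as education and media review, helping users understand the characteristics of AI-generated content and check the authenticity of information.","2026-06-11 03:30:08","top_topic"]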