[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10766":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},10766,"Rasa_NLU_Chi","crownpku\u002FRasa_NLU_Chi","crownpku","Turn Chinese natural language into structured data (Chinese natural language understanding)","",null,"Python",1533,420,69,77,0,1,20.87,"Apache License 2.0",false,"master",true,[24,25,26],"chatbot","chinese","natural-language","2026-06-12 02:02:26","\n\n# Rasa NLU for Chinese, a fork of RasaHQ\u002Frasa_nlu.\n\n## Please refer to the latest instructions in the [official Rasa NLU documentation](https:\u002F\u002Fnlu.rasa.com\u002F)\n\n## [Chinese Blog](http:\u002F\u002Fwww.crownpku.com\u002F2017\u002F07\u002F27\u002F%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html)\n\n![](http:\u002F\u002Fwww.crownpku.com\u002Fimages\u002F201707\u002F5.jpg)\n![](http:\u002F\u002Fwww.crownpku.com\u002Fimages\u002F201707\u002F4.jpg)\n\n\n\n### Files you should have:\n\n* data\u002Ftotal_word_feature_extractor_zh.dat\n\nTrained on a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).\n\nTo train your own, build the [MITIE Wordrep Tool](https:\u002F\u002Fgithub.com\u002Fmit-nlp\u002FMITIE\u002Ftree\u002Fmaster\u002Ftools\u002Fwordrep). Note that the Chinese corpus must be tokenized before it is fed into the tool for training. 
A closed-domain corpus that closely matches your use case works best.\n\nA model trained on a Chinese Wikipedia dump and Baidu Baike can be downloaded from the [Chinese Blog](http:\u002F\u002Fwww.crownpku.com\u002F2017\u002F07\u002F27\u002F%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html).\n\n\n* data\u002Fexamples\u002Frasa\u002Fdemo-rasa_zh.json\n\nAdd as many training examples as possible.\n\n### Usage:\n\n1. Clone this project and run:\n```\npython setup.py install\n```\n\n2. Modify the configuration.\n\n   Two pipelines are currently available for Chinese:\n\n   Use MITIE+Jieba (sample_configs\u002Fconfig_jieba_mitie.yml):\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data\u002Ftotal_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_classifier_mitie\"\n```\n\n   RECOMMENDED: Use MITIE+Jieba+sklearn (sample_configs\u002Fconfig_jieba_mitie_sklearn.yml):\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data\u002Ftotal_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_featurizer_mitie\"\n- name: \"intent_classifier_sklearn\"\n```\n\n3. (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:\n\n   The \"user_dicts\" value can be either a **file path** or a **directory path**. 
(sample_configs\u002Fconfig_jieba_mitie_sklearn_plus_dict_path.yml)\n\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data\u002Ftotal_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n  default_dict: \".\u002Fdefault_dict.big\"\n  user_dicts: \".\u002Fjieba_userdict\"\n  # Alternative: point user_dicts at a single dictionary file instead of a directory\n#  user_dicts: \".\u002Fjieba_userdict\u002Fjieba_userdict.txt\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_featurizer_mitie\"\n- name: \"intent_classifier_sklearn\"\n```\n\n4. Train the model by running:\n\n   If you specify a project name in the configuration file, the model is saved at models\u002Fyour_project_name.\n\n   Otherwise, it is saved at models\u002Fdefault.\n\n```\npython -m rasa_nlu.train -c sample_configs\u002Fconfig_jieba_mitie_sklearn.yml --data data\u002Fexamples\u002Frasa\u002Fdemo-rasa_zh.json --path models\n```\n\n\n5. Run the rasa_nlu server:\n\n```\npython -m rasa_nlu.server -c sample_configs\u002Fconfig_jieba_mitie_sklearn.yml --path models\n```\n\n\n6. 
Open a new terminal and query the server with curl. The \"project\" and \"model\" values should match the project and model directory created by training, for example:\n\n```\n$ curl -XPOST localhost:5000\u002Fparse -d '{\"q\":\"我发烧了该吃什么药？\", \"project\": \"rasa_nlu_test\", \"model\": \"model_20170921-170911\"}' | python -mjson.tool\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   652    0   552  100   100    157     28  0:00:03  0:00:03 --:--:--   157\n{\n    \"entities\": [\n        {\n            \"end\": 3,\n            \"entity\": \"disease\",\n            \"extractor\": \"ner_mitie\",\n            \"start\": 1,\n            \"value\": \"发烧\"\n        }\n    ],\n    \"intent\": {\n        \"confidence\": 0.5397186422631861,\n        \"name\": \"medical\"\n    },\n    \"intent_ranking\": [\n        {\n            \"confidence\": 0.5397186422631861,\n            \"name\": \"medical\"\n        },\n        {\n            \"confidence\": 0.16206323981749196,\n            \"name\": \"restaurant_search\"\n        },\n        {\n            \"confidence\": 0.1212448457737397,\n            \"name\": \"affirm\"\n        },\n        {\n            \"confidence\": 0.10333600028547868,\n            \"name\": \"goodbye\"\n        },\n        {\n            \"confidence\": 0.07363727186010374,\n            \"name\": \"greet\"\n        }\n    ],\n    \"text\": \"我发烧了该吃什么药？\"\n}\n```\n","Rasa_NLU_Chi is a project focused on Chinese natural language understanding that aims to turn Chinese natural language into structured data. It extends and adapts Rasa NLU with specific support for Chinese, combining MITIE and Jieba for entity recognition and intent classification, and provides a recommended configuration that uses sklearn as the intent classifier to improve model accuracy and generalization. The project also lets users supply custom dictionaries or switch the default dictionary to better fit domain-specific corpora. It suits companies and individual developers building Chinese chatbots, customer-service assistants, and similar natural language processing applications.",2,"2026-06-11 03:30:03","top_topic"]