[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70768":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},70768,"Chinese-Word-Vectors","Embedding\u002FChinese-Word-Vectors","Embedding","100+ Chinese Word Vectors 上百种预训练中文词向量 ","",null,"Python",12223,2326,279,60,0,6,70.6,"Apache License 2.0",false,"master",true,[24,25,26,27,28,29],"chinese","chinese-word-segmentation","embedding","embeddings","vectors-trained","word-embeddings","2026-06-12 04:00:57","# Chinese Word Vectors 中文词向量\n[中文](https:\u002F\u002Fgithub.com\u002FEmbedding\u002FChinese-Word-Vectors\u002Fblob\u002Fmaster\u002FREADME_zh.md)\n\nThis project provides 100+ Chinese Word Vectors (embeddings) trained with different **representations** (dense and sparse), **context features** (word, ngram, character, and more), and **corpora**. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. 
\n\nMoreover, we provide a Chinese analogical reasoning dataset **CA8** and an evaluation toolkit for users to evaluate the quality of their word vectors.\n\n## Reference\nPlease cite the following paper if you use these embeddings or the CA8 dataset.\n\nShen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du, \u003Ca href=\"http:\u002F\u002Faclweb.org\u002Fanthology\u002FP18-2023\">\u003Cem>Analogical Reasoning on Chinese Morphological and Semantic Relations\u003C\u002Fem>\u003C\u002Fa>, ACL 2018.\n\n```\n@InProceedings{P18-2023,\n  author =  \"Li, Shen\n    and Zhao, Zhe\n    and Hu, Renfen\n    and Li, Wensi\n    and Liu, Tao\n    and Du, Xiaoyong\",\n  title =   \"Analogical Reasoning on Chinese Morphological and Semantic Relations\",\n  booktitle =   \"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)\",\n  year =  \"2018\",\n  publisher =   \"Association for Computational Linguistics\",\n  pages =   \"138--143\",\n  location =  \"Melbourne, Australia\",\n  url =   \"http:\u002F\u002Faclweb.org\u002Fanthology\u002FP18-2023\"\n}\n```\n\n&nbsp;\n\nA detailed analysis of the relation between intrinsic and extrinsic evaluations of Chinese word embeddings is presented in the following paper:\n\nYuanyuan Qiu, Hongzheng Li, Shen Li, Yingdi Jiang, Renfen Hu, Lijiao Yang. \u003Ca href=\"http:\u002F\u002Fwww.cips-cl.org\u002Fstatic\u002Fanthology\u002FCCL-2018\u002FCCL-18-086.pdf\">\u003Cem>Revisiting Correlations between Intrinsic and Extrinsic Evaluations of Word Embeddings\u003C\u002Fem>\u003C\u002Fa>. Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer, Cham, 2018. 209-221. 
(CCL & NLP-NABD 2018 Best Paper)\n\n```\n@incollection{qiu2018revisiting,\n  title={Revisiting Correlations between Intrinsic and Extrinsic Evaluations of Word Embeddings},\n  author={Qiu, Yuanyuan and Li, Hongzheng and Li, Shen and Jiang, Yingdi and Hu, Renfen and Yang, Lijiao},\n  booktitle={Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data},\n  pages={209--221},\n  year={2018},\n  publisher={Springer}\n}\n```\n\n## Format\nThe pre-trained vector files are in text format. Each line contains a word followed by its vector, with values separated by spaces. The first line records meta information: the first number is the number of words in the file and the second is the vector dimension.\n\nBesides dense word vectors (trained with SGNS), we also provide sparse vectors (trained with PPMI). They use the same format as liblinear: each feature is written as an index:value pair, where the number before the colon is the dimension index and the number after it is the value. 
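
A minimal sketch of reading both file types in Python. It assumes exactly the layout described above: dense files start with a header line giving the vocabulary size and dimension, and sparse files store liblinear-style index:value pairs. The helpers `load_dense`, `load_sparse`, and `cosine` are hypothetical names. They are not part of this repository or its evaluation toolkit. The sparse loader also assumes that a header line, if present, contains no colon.

```python
import numpy as np


def load_dense(path, encoding='utf-8'):
    # Hypothetical helper for the dense (SGNS) text format: the first line holds
    # the vocabulary size and the dimension; each later line is a word followed
    # by its space-separated values.
    vectors = {}
    with open(path, encoding=encoding) as f:
        vocab_size, dim = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.rstrip().split(' ')
            if len(parts) != dim + 1:
                continue  # skip lines that do not carry exactly dim values
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors, vocab_size, dim


def load_sparse(path, encoding='utf-8'):
    # Hypothetical helper for the sparse (PPMI) format: each line is a word
    # followed by liblinear-style index:value pairs, e.g. word 3:0.52 17:1.9
    # Returns {word: {dimension_index: value}}.
    vectors = {}
    with open(path, encoding=encoding) as f:
        for line in f:
            parts = line.rstrip().split(' ')
            # Skip empty lines and a header line (assumed to contain no colon).
            if len(parts) < 2 or ':' not in parts[1]:
                continue
            features = {}
            for item in filter(None, parts[1:]):  # ignore empty tokens
                index, value = item.rsplit(':', 1)
                features[int(index)] = float(value)
            vectors[parts[0]] = features
    return vectors


def cosine(u, v):
    # Cosine similarity between two dense vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

For example, `vecs, n, dim = load_dense('vectors.txt')` (a hypothetical path) followed by `cosine(vecs[w1], vecs[w2])` compares two words. For the released SGNS files, `dim` should be 300. The dense files share the word2vec text layout, so gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` should generally be able to read them as well.
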
\n\n## Pre-trained Chinese Word Vectors\n\n### Basic Settings\n\n\u003Ctable align=\"center\">\n  \u003Ctr align=\"center\">\n    \u003Ctd>\u003Cb>Window Size\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Dynamic Window\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Sub-sampling\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Low-Frequency Word\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Iteration\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Negative Sampling\u003Csup>*\u003C\u002Fsup>\u003C\u002Fb>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>5\u003C\u002Ftd>\n    \u003Ctd>Yes\u003C\u002Ftd>\n    \u003Ctd>1e-5\u003C\u002Ftd>\n    \u003Ctd>10\u003C\u002Ftd>\n    \u003Ctd>5\u003C\u002Ftd>\n    \u003Ctd>5\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Csup>\\*\u003C\u002Fsup>Only for SGNS.\n\n### Various Domains\n\nChinese Word Vectors trained with different representations, context features, and corpora.\n\n\u003Ctable align=\"center\">\n    \u003Ctr align=\"center\">\n        \u003Ctd colspan=\"5\">\u003Cb>Word2vec \u002F Skip-Gram with Negative Sampling (SGNS)\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr align=\"center\">\n        \u003Ctd rowspan=\"2\">Corpus\u003C\u002Ftd>\n        \u003Ctd colspan=\"4\">Context Features\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Word\u003C\u002Ftd>\n      \u003Ctd>Word + Ngram\u003C\u002Ftd>\n      \u003Ctd>Word + Character\u003C\u002Ftd>\n      \u003Ctd>Word + Character + Ngram\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Baidu Encyclopedia 百度百科\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Rn7LtTH0n7SHyHPfjRHbkg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1XEmP_0FkQwOjipCjI2OPEw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1eeCS7uD3e_qVN8rPwmXhAw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1IiIbQGJ_AooTj5s8aZYcvA\">300d\u003C\u002Fa> \u002F PWD: 5555\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Wikipedia_zh 中文维基百科\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F11hSZJN-NWBEvryIED6Donw?pwd=qfgv\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1RWcPWQEiCrwna7xmhI8ARg?pwd=jp7e\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1DKvgg0RgtqwyDPs1IbS0TQ?pwd=s22w\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OTfYo_sQamCYwJLdp3KHnw?pwd=k6p9\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>People's Daily News 人民日报\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F19sqMz-JAhhxh3o6ecvQxQw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1upPkA8KJnxTZBfjuNDtaeQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1BvKk2QjbtQMch7EISppW2A\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F19Vso_k79FZb5OZCWQPAnFQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Sogou News 搜狗新闻\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1tUghuTno5yOvOx4LXA9-wg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F13yVrXeGYkxdGW3P6juiQmA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1pUqyn7mnPcUmzxT64gGpSw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1svFOwFBKnnlsqrF1t99Lnw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Financial News 金融新闻\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1c8wmsqdrfUbQQ6j2Dx5NwQ?pwd=nakr\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1EXVpN8-vMr1-f2l4kZICLg?pwd=ki7t\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1EXVpN8-vMr1-f2l4kZICLg?pwd=ki7t\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F19JWtZL6U8P-XfE5LsTlftg?pwd=gbnb\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Zhihu_QA 知乎问答 \u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1VGOs0RH7DXE5vRrtw6boQA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OQ6fQLCgqT43WTwh5fh_lg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1_xogqF9kJT6tmQHSAYrYeg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Fo27Lv_0nz8FXg-xbOz14Q\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Weibo 微博\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1zbuUJEEEpZRNHxZ7Gezzmw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F11PWBcvruXEDvKf2TiIXntg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F10bhJpaXMCUK02nHvRAttqA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1FHl_bQkYucvVk-j2KG4dxA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Literature 文学作品\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1ciq8iXtcrHpu3ir_VhK0zg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Oa4CkPd8o2xd6LEAaa4gmg\">300d\u003C\u002Fa> \u002F PWD: z5b4\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1IG8IxNp2s7vVklz-vyZR9A\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1SEOKrJYS14HpqIaQT462kA\">300d\u003C\u002Fa> \u002F PWD: yenb\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Complete Library in Four Sections\u003Cbr \u002F>四库全书\u003Csup>*\u003C\u002Fsup>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1vPSeUsSiWYXEWAuokLR0qQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1sS9E7sclvS_UZcBgHN7xLQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>NAN\u003C\u002Ftd>\n      \u003Ctd>NAN\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Mixed-large 综合\u003Cbr>Baidu Netdisk \u002F Google Drive\u003C\u002Ftd>\n      \u003Ctd>\n        \u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1luy-GlTdqqvJ3j-A4FcIOw\">300d\u003C\u002Fa>\u003Cbr>\n        \u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Fopen?id=1Zh9ZCEu8_eSQ-qkYVQufQDNKPC4mtEKR\">300d\u003C\u002Fa>\n      \u003C\u002Ftd>\n      \u003Ctd>\n        \u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oJol-GaRMk4-8Ejpzxo6Gw\">300d\u003C\u002Fa>\u003Cbr>\n  
      \u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Fopen?id=1WUU9LnoAjs--1E_WqcghLJ-Pp8bb38oS\">300d\u003C\u002Fa>\n      \u003C\u002Ftd>\n      \u003Ctd>\n        \u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1DjIGENlhRbsVyHW-caRePg\">300d\u003C\u002Fa>\u003Cbr>\n        \u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Fopen?id=1aVAK0Z2E5DkdIH6-JHbiWSL5dbAcz6c3\">300d\u003C\u002Fa>\n      \u003C\u002Ftd>\n      \u003Ctd>\n        \u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F14JP1gD7hcmsWdSpTvA3vKA\">300d\u003C\u002Fa>\u003Cbr>\n        \u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Fopen?id=1kSAl4_AOg3_6ayU7KRM0Nk66uGdSZdnk\">300d\u003C\u002Fa>\n      \u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Ctable align=\"center\">\n    \u003Ctr align=\"center\">\n        \u003Ctd colspan=\"5\">\u003Cb>Positive Pointwise Mutual Information (PPMI)\u003C\u002Fb>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr align=\"center\">\n        \u003Ctd rowspan=\"2\">Corpus\u003C\u002Ftd>\n        \u003Ctd colspan=\"4\">Context Features\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Word\u003C\u002Ftd>\n      \u003Ctd>Word + Ngram\u003C\u002Ftd>\n      \u003Ctd>Word + Character\u003C\u002Ftd>\n      \u003Ctd>Word + Character + Ngram\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Baidu Encyclopedia 百度百科\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1_itcjrQawCwcURa7WZLPOA\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1cEZzN1S2senwWSyHOnL7YQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1KcfFdyO0-kE9S9CwzIisfw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1FXYM3CY161_4QMgiH8vasQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Wikipedia_zh 中文维基百科\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F172vD1NljxnbeubgXkuja4Q?pwd=k2hr\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1taIMttirPOw9Df51epIWBg?pwd=rmfh\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1-l9pdeUOwVzRVT4utvszfQ?pwd=ameb\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1VYI5GrKWR16gHvah38I3SQ?pwd=gzj8\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>People's Daily News 人民日报\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1NLr1K7aapU2sYBvzbVny5g\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1LJl3Br0ccGDHP0XX2k3pVw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1GQQXGMn1AHh-BlifT0JD2g\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Xm9Ec3O3rJ6ayrwVwonC7g\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Sogou News 搜狗新闻\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1ECA51CZLp9_JB_me7YZ9-Q\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1FO39ZYy1mStERf_b53Y_yQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1lLBFBk8nn3spFAvKY9IJ6A\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1f-dLQZlZo_-B5ZKcPIc6rw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Financial News 金融新闻\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1yyJ7NZl-GabDJLbP-eYdCQ?pwd=9efk\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F17ZLOJpLXSQFxN0SZTITdIw?pwd=sjzy\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1rRGLUkA01kGceFDBOG9wlA?pwd=yve5\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1X-150CjeUPdQBq--Gr7w3A?pwd=qqc7\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Zhihu_QA 知乎问答 \u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1VaUP3YJC0IZKTbJ-1_8HZg\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1g39PKwT0kSmpneKOgXR5YQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1d8Bsuak0fyXxQOVUiNr-2w\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1D5fteBX0Vy4czEqpxXjlrQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Weibo 微博\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F15O2EbToOzjNSkzJwAOk_Ug\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F11Dqywn0hfMhysto7bZS1Dw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1wY-7mfV6nwDj_tru6W9h4Q\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1DMW-MgLApbQnWwDd-pT_qw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Literature 文学作品\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1HTHhlr8zvzhTwed7dO0sDg\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1jAuGJBxKqgapt__urGsBOQ\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F173AJfCoAV0ZA8Z31tKBdTA\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1dFCxke_Su3lLsuwZr7co3A\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Complete Library in Four Sections\u003Cbr \u002F>四库全书\u003Csup>*\u003C\u002Fsup>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1NJ1Gc99oE0-GV0QxBqy-qw\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1YGEgyXIbw0O4NtoM1ohjdA\">Sparse\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>NAN\u003C\u002Ftd>\n      \u003Ctd>NAN\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr  align=\"center\">\n      \u003Ctd>Mixed-large 综合\u003C\u002Ftd>\n      \u003Ctd>Sparse\u003C\u002Ftd>\n      \u003Ctd>Sparse\u003C\u002Ftd>\n      \u003Ctd>Sparse\u003C\u002Ftd>\n      \u003Ctd>Sparse\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Csup>\\*\u003C\u002Fsup>Character embeddings are provided, since most Hanzi are words in archaic Chinese.\n\n### Various Co-occurrence Information\n\nWe release word vectors built on different co-occurrence statistics. In some related papers, target and context vectors are called input and output vectors, respectively.\n\nIn this part, one can obtain vectors of arbitrary linguistic units beyond words. 
For example, character vectors are included in the context vectors of the Word → Character co-occurrence type.\n\nAll vectors are trained by SGNS on Baidu Encyclopedia.\n\n\u003Ctable align=\"center\">\n  \u003Ctr align=\"center\">\n    \u003Ctd>\u003Cb>Feature\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Co-occurrence Type\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Target Word Vectors\u003C\u002Fb>\u003C\u002Ftd>\n    \u003Ctd>\u003Cb>Context Word Vectors\u003C\u002Fb>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \n  \u003Ctr align=\"center\">\n  \t\u003Ctd rowspan=\"1\">Word\u003C\u002Ftd>\n    \u003Ctd>Word → Word\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Rn7LtTH0n7SHyHPfjRHbkg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F18T6DRVmS_cZu5u64EbbESQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\n  \u003Ctr align=\"center\">\n    \u003Ctd rowspan=\"3\">Ngram\u003C\u002Ftd>\n    \u003Ctd>Word → Ngram (1-2)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1XEmP_0FkQwOjipCjI2OPEw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F12asujjAaaqxNFYRNP-MThw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Word → Ngram (1-3)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oUmbxsnSuXf2jU8Jxu7U8A\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1ylg6FfFHa0kXbiVz8bIL8g\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Ngram (1-2) → Ngram (1-2)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1Za7DIGVhE6dMsTmxHb-izg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oKI4Cs9eo7bg5mqfY1hdmg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \n  \u003Ctr align=\"center\">\n    \u003Ctd rowspan=\"3\">Character\u003C\u002Ftd>\n    \u003Ctd>Word → Character (1)\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1c9yiosHKNIZwRlLzD_F1ig\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1KGZ_x8r-lq-AuElLCSVzvQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Word → Character (1-2)\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1eeCS7uD3e_qVN8rPwmXhAw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1q0ItLzbn5Tfb3LhepRCeEA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Word → Character (1-4)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1WNWAnba56Rqjmx-FAN_7_g\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1hJKTAz6PwS7wmz9wQgmYeg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \n  \u003Ctr align=\"center\">\n  \t\u003Ctd rowspan=\"1\">Radical\u003C\u002Ftd>\n    \u003Ctd>Radical\u003C\u002Ftd>\n    \u003Ctd>300d\u003C\u002Ftd>\n \t  \u003Ctd>300d\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \n  \u003Ctr align=\"center\">\n    \u003Ctd rowspan=\"2\">Position\u003C\u002Ftd>\n    \u003Ctd>Word → Word (left\u002Fright)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1JvjcrXFZPknT5H5Xw6KRVg\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1m6K9CnIIS8FrQZdDuF6hPQ\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Word → Word 
(distance)\u003C\u002Ftd>\n    \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1c29BDu4R1hyUX-sgvlHJnA\">300d\u003C\u002Fa>\u003C\u002Ftd>\n \t  \u003Ctd>\u003Ca href=\"https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1sMZHIc-7eU6gRalHwtBHZw\">300d\u003C\u002Fa>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \n  \u003Ctr align=\"center\">\n    \u003Ctd>Global\u003C\u002Ftd>\n    \u003Ctd>Word → Text\u003C\u002Ftd>\n    \u003Ctd>300d\u003C\u002Ftd>\n \t  \u003Ctd>300d\u003C\u002Ftd>\n  \u003C\u002Ftr>\n    \n  \u003Ctr align=\"center\">\n    \u003Ctd rowspan=\"2\">Syntactic Feature\u003C\u002Ftd>\n    \u003Ctd>Word → POS\u003C\u002Ftd>\n    \u003Ctd>300d\u003C\u002Ftd>\n \t  \u003Ctd>300d\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Word → Dependency\u003C\u002Ftd>\n    \u003Ctd>300d\u003C\u002Ftd>\n \t  \u003Ctd>300d\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## Representations\nExisting word representation methods fall into two classes: **dense** and **sparse** representations. The SGNS model (part of the word2vec toolkit) and the PPMI model are typical methods of these two classes, respectively. The SGNS model trains low-dimensional real-valued (dense) vectors through a shallow neural network; it is also called a neural embedding method. The PPMI model is a sparse bag-of-features representation weighted by the positive pointwise mutual information (PPMI) weighting scheme.\n\n## Context Features\nThree context features, **word**, **ngram**, and **character**, are commonly used in the word embedding literature. Most word representation methods essentially exploit word-word co-occurrence statistics, i.e., they use words as context features **(word feature)**. Inspired by language modeling, we introduce ngram features into the context: both word-word and word-ngram co-occurrence statistics are used for training **(ngram feature)**. For Chinese, characters (Hanzi) often convey strong semantics. 
To this end, we use word-word and word-character co-occurrence statistics for learning word vectors. The length of character-level ngrams ranges from 1 to 4 **(character feature)**.\n\nBesides word, ngram, and character, other features also have a substantial influence on the properties of word vectors. For example, using the entire text as a context feature could introduce more topic information into word vectors, while using the dependency parse as a context feature could add syntactic constraints to them. 17 co-occurrence types are considered in this project.\n\n## Corpus\nWe made great efforts to collect corpora across various domains. All text data are preprocessed by removing HTML and XML tags. Only the plain text is kept, and [HanLP(v_1.5.3)](https:\u002F\u002Fgithub.com\u002Fhankcs\u002FHanLP) is used for word segmentation. In addition, traditional Chinese characters are converted into simplified characters with [Open Chinese Convert (OpenCC)](https:\u002F\u002Fgithub.com\u002FBYVoid\u002FOpenCC). 
Detailed information on the corpora is listed below:\n\n\u003Ctable align=\"center\">\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>\u003Cb>Corpus\u003C\u002Fb>\u003C\u002Ftd>\n\t\t\u003Ctd>\u003Cb>Size\u003C\u002Fb>\u003C\u002Ftd>\n\t\t\u003Ctd>\u003Cb>Tokens\u003C\u002Fb>\u003C\u002Ftd>\n\t\t\u003Ctd>\u003Cb>Vocabulary Size\u003C\u002Fb>\u003C\u002Ftd>\n\t\t\u003Ctd>\u003Cb>Description\u003C\u002Fb>\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Baidu Encyclopedia\u003Cbr \u002F>百度百科\u003C\u002Ftd>\n\t\t\u003Ctd>4.1G\u003C\u002Ftd>\n\t\t\u003Ctd>745M\u003C\u002Ftd>\n\t\t\u003Ctd>5422K\u003C\u002Ftd>\n\t\t\u003Ctd>Chinese Encyclopedia data from\u003Cbr \u002F>https:\u002F\u002Fbaike.baidu.com\u002F\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Wikipedia_zh\u003Cbr \u002F>中文维基百科\u003C\u002Ftd>\n\t\t\u003Ctd>1.3G\u003C\u002Ftd>\n\t\t\u003Ctd>223M\u003C\u002Ftd>\n\t\t\u003Ctd>2129K\u003C\u002Ftd>\n\t\t\u003Ctd>Chinese Wikipedia data from\u003Cbr \u002F>https:\u002F\u002Fdumps.wikimedia.org\u002F\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>People's Daily News\u003Cbr \u002F>人民日报\u003C\u002Ftd>\n\t\t\u003Ctd>3.9G\u003C\u002Ftd>\n\t\t\u003Ctd>668M\u003C\u002Ftd>\n\t\t\u003Ctd>1664K\u003C\u002Ftd>\n\t\t\u003Ctd>News data from People's Daily (1946-2017)\u003Cbr \u002F>http:\u002F\u002Fdata.people.com.cn\u002F\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Sogou News\u003Cbr \u002F>搜狗新闻\u003C\u002Ftd>\n\t\t\u003Ctd>3.7G\u003C\u002Ftd>\n\t\t\u003Ctd>649M\u003C\u002Ftd>\n\t\t\u003Ctd>1226K\u003C\u002Ftd>\n\t\t\u003Ctd>News data provided by Sogou Labs\u003Cbr \u002F>http:\u002F\u002Fwww.sogou.com\u002Flabs\u002F\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Financial News\u003Cbr \u002F>金融新闻\u003C\u002Ftd>\n    \u003Ctd>6.2G\u003C\u002Ftd>\n    \u003Ctd>1055M\u003C\u002Ftd>\n    \u003Ctd>2785K\u003C\u002Ftd>\n    
\u003Ctd>Financial news collected from multiple news websites\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Zhihu_QA\u003Cbr \u002F>知乎问答\u003C\u002Ftd>\n\t\t\u003Ctd>2.1G\u003C\u002Ftd>\n\t\t\u003Ctd>384M\u003C\u002Ftd>\n\t\t\u003Ctd>1117K\u003C\u002Ftd>\n\t\t\u003Ctd>Chinese QA data from\u003Cbr \u002F>https:\u002F\u002Fwww.zhihu.com\u002F\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Weibo\u003Cbr \u002F>微博\u003C\u002Ftd>\n\t\t\u003Ctd>0.73G\u003C\u002Ftd>\n\t\t\u003Ctd>136M\u003C\u002Ftd>\n\t\t\u003Ctd>850K\u003C\u002Ftd>\n\t\t\u003Ctd>Chinese microblog data provided by NLPIR Lab\u003Cbr \u002F>http:\u002F\u002Fwww.nlpir.org\u002Fwordpress\u002Fdownload\u002Fweibo.7z\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Literature\u003Cbr \u002F>文学作品\u003C\u002Ftd>\n\t\t\u003Ctd>0.93G\u003C\u002Ftd>\n\t\t\u003Ctd>177M\u003C\u002Ftd>\n\t\t\u003Ctd>702K\u003C\u002Ftd>\n\t\t\u003Ctd>8,599 works of modern Chinese literature\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n\t\u003Ctr align=\"center\">\n\t\t\u003Ctd>Mixed-large\u003Cbr \u002F>综合\u003C\u002Ftd>\n\t\t\u003Ctd>22.6G\u003C\u002Ftd>\n    \u003Ctd>4037M\u003C\u002Ftd>\n    \u003Ctd>10653K\u003C\u002Ftd>\n\t\t\u003Ctd>We built the large corpus by merging the above corpora.\u003C\u002Ftd>\n\t\u003C\u002Ftr>\n  \u003Ctr align=\"center\">\n    \u003Ctd>Complete Library in Four Sections\u003Cbr \u002F>四库全书\u003C\u002Ftd>\n    \u003Ctd>1.5G\u003C\u002Ftd>\n    \u003Ctd>714M\u003C\u002Ftd>\n    \u003Ctd>21.8K\u003C\u002Ftd>\n    \u003Ctd>The largest collection of texts in pre-modern China.\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nAll words, including low-frequency words, are taken into account.\n\n## Toolkits\nAll word vectors are trained with the [ngram2vec](https:\u002F\u002Fgithub.com\u002Fzhezhaoa\u002Fngram2vec\u002F) toolkit. 
The ngram2vec toolkit is a superset of the [word2vec](https:\u002F\u002Fgithub.com\u002Fsvn2github\u002Fword2vec) and [fastText](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText) toolkits and supports arbitrary context features and models.\n\n## Chinese Word Analogy Benchmarks\nThe quality of word vectors is often evaluated on analogy question tasks. In this project, two benchmarks are used for evaluation. The first is CA-translated, in which most analogy questions are directly translated from an English benchmark. Although CA-translated has been widely used in Chinese word embedding papers, it contains only three types of semantic questions and covers just 134 Chinese words. In contrast, CA8 is specifically designed for the Chinese language: it contains 17,813 analogy questions and covers comprehensive morphological and semantic relations. CA-translated, CA8, and their detailed descriptions are provided in the [**testsets**](https:\u002F\u002Fgithub.com\u002FEmbedding\u002FChinese-Word-Vectors\u002Ftree\u002Fmaster\u002Ftestsets) folder.\n\n## Evaluation Toolkit\nWe provide an evaluation toolkit in the [**evaluation**](https:\u002F\u002Fgithub.com\u002FEmbedding\u002FChinese-Word-Vectors\u002Ftree\u002Fmaster\u002Fevaluation) folder.\n\nRun the following commands to evaluate dense vectors:\n```\n$ python ana_eval_dense.py -v \u003Cvector.txt> -a CA8\u002Fmorphological.txt\n$ python ana_eval_dense.py -v \u003Cvector.txt> -a CA8\u002Fsemantic.txt\n```\nRun the following commands to evaluate sparse vectors:\n```\n$ python ana_eval_sparse.py -v \u003Cvector.txt> -a CA8\u002Fmorphological.txt\n$ python ana_eval_sparse.py -v \u003Cvector.txt> -a CA8\u002Fsemantic.txt\n```\n","This project provides over a hundred pre-trained Chinese word vectors covering different representations (dense and sparse), context features (such as word, n-gram, and character), and corpora. Its core features are easy access to pre-trained vectors with different properties and support for evaluating word-vector quality with the provided Chinese analogical reasoning dataset CA8 and evaluation toolkit. Technically, the vectors are trained with several methods, such as SGNS for dense vectors and PPMI for sparse vectors. They are suitable for downstream natural language processing tasks such as text classification, sentiment analysis, and machine translation, and can effectively improve model performance.",2,"2026-06-11 03:34:04","high_star"]