[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-6359":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},6359,"GloVe","stanfordnlp\u002FGloVe","stanfordnlp","Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings","",null,"C",7221,1545,226,81,0,1,7,3,68.77,"Apache License 2.0",false,"master",true,[],"2026-06-12 04:00:28","## GloVe: Global Vectors for Word Representation\n\n\n| nearest neighbors of \u003Cbr\u002F> \u003Cem>frog\u003C\u002Fem> | Litoria             |  Leptodactylidae | Rana | Eleutherodactylus |\n| --- | ------------------------------- | ------------------- | ---------------- | ------------------- |\n| Pictures | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Flitoria.jpg\">\u003C\u002Fimg> | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Fleptodactylidae.jpg\">\u003C\u002Fimg> | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Frana.jpg\">\u003C\u002Fimg> | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Feleutherodactylus.jpg\">\u003C\u002Fimg> |\n\n| Comparisons | man -> woman             |  city -> zip | comparative -> superlative |\n| --- | 
------------------------|-------------------------|-------------------------|\n| GloVe Geometry | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Fman_woman_small.jpg\">\u003C\u002Fimg>  | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Fcity_zip_small.jpg\">\u003C\u002Fimg> | \u003Cimg src=\"https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002Fimages\u002Fcomparative_superlative_small.jpg\">\u003C\u002Fimg> |\n\nWe provide an implementation of the GloVe model for learning word representations, and describe how to download web-dataset vectors or train your own. See the [project page](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F) or the [paper](https:\u002F\u002Fnlp.stanford.edu\u002Fpubs\u002Fglove.pdf) for more information on GloVe vectors. For documentation and analysis of the 2024 vectors, see the [report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.18103).\n\n## Download pre-trained word vectors \\*\\***NEW 2024 VECTORS**\\*\\*\nThe links below contain word vectors obtained from the respective corpora. If you want word vectors trained on massive web datasets, you need only download one of these text files! 
Pre-trained word vectors are made available under the \u003Ca href=\"https:\u002F\u002Fopendatacommons.org\u002Flicenses\u002Fpddl\u002F\">Public Domain Dedication and License\u003C\u002Fa>.\n\u003Cdiv class=\"entry\">\n\u003Cul style=\"padding-left:0px; margin-top:0px; margin-bottom:0px\">\n  \u003Cli> **NEW!!** 2024 Dolma (220B tokens, 1.2M vocab, uncased, 300d vectors, 1.6 GB download): \u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.2024.dolma.300d.zip\">glove.2024.dolma.300d.zip\u003C\u002Fa> \u003C\u002Fli>\n  \u003Cli> **NEW!!** 2024 Wikipedia + Gigaword 5 (11.9B tokens, 1.2M vocab, uncased, 300d vectors, 1.6 GB download): \u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.2024.wikigiga.300d.zip\">glove.2024.wikigiga.300d.zip\u003C\u002Fa> \u003C\u002Fli>\n  \u003Cli> **NEW!!** 2024 Wikipedia + Gigaword 5 (11.9B tokens, 1.2M vocab, uncased, 200d vectors, 1.1 GB download): \u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.2024.wikigiga.200d.zip\">glove.2024.wikigiga.200d.zip\u003C\u002Fa> \u003C\u002Fli>\n  \u003Cli> **NEW!!** 2024 Wikipedia + Gigaword 5 (11.9B tokens, 1.2M vocab, uncased, 100d vectors, 560 MB download): \u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.2024.wikigiga.100d.zip\">glove.2024.wikigiga.100d.zip\u003C\u002Fa> \u003C\u002Fli>\n   \u003Cli> **NEW!!** 2024 Wikipedia + Gigaword 5 (11.9B tokens, 1.2M vocab, uncased, 50d vectors, 290 MB download): \u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.2024.wikigiga.50d.zip\">glove.2024.wikigiga.50d.zip\u003C\u002Fa> \u003C\u002Fli>\n  \u003Cli> Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fstanfordnlp\u002Fglove\u002Fresolve\u002Fmain\u002Fglove.42B.300d.zip\">glove.42B.300d.zip\u003C\u002Fa> [\u003Ca 
href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.42B.300d.zip\">mirror\u003C\u002Fa>] \u003C\u002Fli>\n  \u003Cli> Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fstanfordnlp\u002Fglove\u002Fresolve\u002Fmain\u002Fglove.840B.300d.zip\">glove.840B.300d.zip\u003C\u002Fa> [\u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.840B.300d.zip\">mirror\u003C\u002Fa>] \u003C\u002Fli>\n  \u003Cli> Wikipedia 2014 + Gigaword 5 (6B tokens, 400K vocab, uncased, 300d vectors, 822 MB download): \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fstanfordnlp\u002Fglove\u002Fresolve\u002Fmain\u002Fglove.6B.zip\">glove.6B.zip\u003C\u002Fa> [\u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.6B.zip\">mirror\u003C\u002Fa>] \u003C\u002Fli>\n  \u003Cli> Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 200d vectors, 1.42 GB download): \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fstanfordnlp\u002Fglove\u002Fresolve\u002Fmain\u002Fglove.twitter.27B.zip\">glove.twitter.27B.zip\u003C\u002Fa> [\u003Ca href=\"https:\u002F\u002Fnlp.stanford.edu\u002Fdata\u002Fwordvecs\u002Fglove.twitter.27B.zip\">mirror\u003C\u002Fa>] \u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fdiv>\n\n## Train word vectors on a new corpus\n\n\u003Cimg src=\"https:\u002F\u002Ftravis-ci.org\u002Fstanfordnlp\u002FGloVe.svg?branch=master\">\u003C\u002Fimg>\n\nIf the web datasets above don't match the semantics of your end use case, you can train word vectors on your own corpus.\n\n    $ git clone https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fglove\n    $ cd glove && make\n    $ .\u002Fdemo.sh\n\nMake sure you have the following prerequisites installed when running the steps above:\n\n* GNU Make\n* GCC (Clang pretending to be GCC is fine)\n* Python and NumPy\n\nThe demo.sh script downloads a small corpus, consisting of the first 100M 
characters of Wikipedia. It collects unigram counts, constructs and shuffles co-occurrence data, and trains a simple version of the GloVe model. It also runs a word analogy evaluation script in Python to verify word vector quality. More details about training on your own corpus can be found by reading [demo.sh](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002FGloVe\u002Fblob\u002Fmaster\u002Fdemo.sh) or the [src\u002FREADME.md](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002FGloVe\u002Ftree\u002Fmaster\u002Fsrc).\n\n## 2024 Vector Documentation\nThe training scripts and data preprocessing pipeline used for training the 2024 vectors can be found in Training_README.md.\n\nAnalysis and more documentation for the new vectors can be found in this [report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.18103).\n\n### License\nAll work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.\n","GloVe is a software project for generating distributed word representations (word vectors, or embeddings). Written in C, it learns word vectors from co-occurrence matrix statistics, capturing semantic relationships between words. Its core features are providing pre-trained word vectors and letting users train vectors on their own datasets. The project supports word vectors of several dimensionalities, from 50 to 300, and offers new 2024 vectors based on large-scale web datasets and on the Wikipedia + Gigaword 5 dataset. These vectors perform well in natural language processing tasks and are well suited to scenarios such as sentiment analysis, text classification, and machine translation.",2,"2026-06-11 03:06:39","top_language"]