[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72600":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":36,"discoverSource":37},72600,"model2vec","MinishLab\u002Fmodel2vec","MinishLab","Fast State-of-the-Art Static Embeddings","https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction",null,"Python",2120,121,16,2,0,6,14,63,18,83.56,"MIT License",false,"main",[26,27,28,5,29,30,31,32],"ai","embeddings","machine-learning","nlp","python","sentence-transformers","word-embeddings","2026-06-12 04:01:06","\n\u003Ch2 align=\"center\">\n  \u003Cimg width=\"35%\" alt=\"Model2Vec logo\" src=\"assets\u002Fimages\u002Fmodel2vec_logo.png\">\u003Cbr\u002F>\n  Fast State-of-the-Art Static Embeddings\n\u003C\u002Fh2>\n\n\n\n\u003Cdiv align=\"center\">\n  \u003Ch2>\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fminishlab\">\u003Cstrong>🤗 Models\u003C\u002Fstrong>\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction\">\u003Cstrong>📖 Docs\u003C\u002Fstrong>\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Fblob\u002Fmain\u002Fresults\u002FREADME.md\">\u003Cstrong>🏆 Results\u003C\u002Fstrong>\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Ftree\u002Fmain\u002Ftutorials\">\u003Cstrong>📚 
Tutorials\u003C\u002Fstrong>\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Fminish.ai\u002Fblog\">\u003Cstrong>🌐 Blog\u003C\u002Fstrong>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ch2>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fmodel2vec\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmodel2vec?color=%23007ec6&label=pypi%20package\" alt=\"Package version\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-minish.ai-blue.svg\" alt=\"Docs\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fmodel2vec\">\n      \u003Cimg src=\"https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fmodel2vec\" alt=\"Downloads\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fapp.codecov.io\u002Fgh\u002FMinishLab\u002Fmodel2vec\">\n      \u003Cimg src=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002FMinishLab\u002Fmodel2vec\u002Fgraph\u002Fbadge.svg?token=21TWJ6B5ET\" alt=\"Codecov\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002F4BDPR5nmtK\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin-Discord-5865F2?logo=discord&logoColor=white\" alt=\"Join Discord\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Fblob\u002Fmain\u002FLICENSE\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green\" alt=\"License - MIT\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Fstargazers\">\n      \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fminishlab\u002Fmodel2vec.svg\" alt=\"Stars\">\n    \u003C\u002Fa>\n  \u003C\u002Fh2>\n\u003C\u002Fdiv>\n\n\n\n\n\nModel2Vec is a technique to turn any sentence transformer into a small, 
fast static embedding model. Model2Vec reduces model size by a factor of up to 50 and makes models up to 500 times faster, with a small drop in performance. Our [best model](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-32M) is the most performant static embedding model in the world. See our [results](results\u002FREADME.md), read our [docs](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction), or dive in to see how it works.\n\n\u003Cdiv align=\"center\">\n\u003Ch3>\n\n[Quickstart](#quickstart) • [Updates & Announcements](#updates--announcements) • [Main Features](#main-features) • [Model List](#model-list)\n\u003C\u002Fh3>\n\u003C\u002Fdiv>\n\n## Quickstart\n\nInstall the lightweight base package with:\n\n```bash\npip install model2vec\n```\n\nYou can start using Model2Vec by loading one of our [flagship models from the HuggingFace hub](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fminishlab\u002Fpotion-6721e0abd4ea41881417f062). These models are pre-trained and ready to use. The following code snippet shows how to load a model and make embeddings, which you can use for any task, such as text classification, retrieval, clustering, or building a RAG system:\n```python\nfrom model2vec import StaticModel\n\n# Load a model from the HuggingFace hub (in this case the potion-base-32M model)\nmodel = StaticModel.from_pretrained(\"minishlab\u002Fpotion-base-32M\")\n\n# Make embeddings\nembeddings = model.encode([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n\n# Make sequences of token embeddings\ntoken_embeddings = model.encode_as_sequence([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n```\nFor advanced usage, see our [inference docs](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Finference). Instead of using one of our models, you can also distill your own Model2Vec model from a Sentence Transformer model. 
First, install the `distill` extras with:\n\n```bash\npip install model2vec[distill]\n```\n\n\nThen, you can distill a model in ~30 seconds on a CPU with the following code snippet:\n\n```python\nfrom model2vec.distill import distill\n\n# Distill a Sentence Transformer model, in this case the BAAI\u002Fbge-base-en-v1.5 model\nm2v_model = distill(model_name=\"BAAI\u002Fbge-base-en-v1.5\")\n\n# Save the model\nm2v_model.save_pretrained(\"m2v_model\")\n```\n\nFor advanced usage, see our [distillation docs](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fdistillation), which include some [distillation best practices](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fdistillation#distillation-best-practices). After distillation, you can also fine-tune your own classification models on top of the distilled model, or on a pre-trained model. First, make sure you install the `train` extras with:\n\n```bash\npip install model2vec[train]\n```\n\nThen, you can fine-tune a model as follows:\n\n```python\nimport numpy as np\nfrom datasets import load_dataset\nfrom model2vec.train import StaticModelForClassification\n\n# Initialize a classifier from a pre-trained model\nclassifier = StaticModelForClassification.from_pretrained(model_name=\"minishlab\u002Fpotion-base-32M\")\n\n# Load a dataset. 
Note: both single and multi-label classification datasets are supported\nds = load_dataset(\"setfit\u002Fsubj\")\n\n# Train the classifier on text (X) and labels (y)\nclassifier.fit(ds[\"train\"][\"text\"], ds[\"train\"][\"label\"])\n\n# Evaluate the classifier\nclassification_report = classifier.evaluate(ds[\"test\"][\"text\"], ds[\"test\"][\"label\"])\n```\n\nFor advanced usage, see our [training docs](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Ftraining).\n\n## Updates & Announcements\n\n- **23\u002F05\u002F2025**: We released [potion-multilingual-128M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-multilingual-128M), a multilingual model trained on 101 languages. It is the best performing static embedding model for multilingual tasks, and is capable of generating embeddings for any text in any language. The results can be found in our [results](results\u002FREADME.md#mmteb-results-multilingual) section.\n\n- **01\u002F05\u002F2025**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the original model's size, and can be quantized to int8 to be 25% of the size, without loss of performance.\n\n- **12\u002F02\u002F2025**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Fblob\u002Fmain\u002Fmodel2vec\u002Ftrain\u002FREADME.md) and [results](results\u002FREADME.md#training-results).\n\n- **30\u002F01\u002F2025**: We released two new models: [potion-base-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-32M) and [potion-retrieval-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-retrieval-32M). 
[potion-base-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-retrieval-32M) is a finetune of [potion-base-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.\n\n- **30\u002F10\u002F2024**: We released three new models: [potion-base-8M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-8M), [potion-base-4M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-4M), and [potion-base-2M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-2M). These models are trained using [Tokenlearn](https:\u002F\u002Fgithub.com\u002FMinishLab\u002Ftokenlearn). Find out more in our [blog post](https:\u002F\u002Fminishlab.github.io\u002Ftokenlearn_blogpost\u002F). NOTE: for users of any of our old English M2V models, we recommend switching to these new models as they [perform better on all tasks](https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec\u002Ftree\u002Fmain\u002Fresults).\n\n## Main Features\n\n- **State-of-the-Art Performance**: Model2Vec models outperform any other static embeddings (such as GloVe and BPEmb) by a large margin, as can be seen in our [results](results\u002FREADME.md).\n- **Small**: Model2Vec reduces the size of a Sentence Transformer model by a factor of up to 50. 
Our [best model](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-8M) is just ~30 MB on disk, and our smallest model just ~8 MB (making it the smallest model on [MTEB](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmteb\u002Fleaderboard)!).\n- **Lightweight Dependencies**: the base package's only major dependency is `numpy`.\n- **Lightning-fast Inference**: up to 500 times faster on CPU than the original model.\n- **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset.\n- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models.\n- **Integrated in many popular libraries**: Model2Vec is integrated directly into popular libraries such as [Sentence Transformers](https:\u002F\u002Fgithub.com\u002FUKPLab\u002Fsentence-transformers) and [LangChain](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain). For more information, see our [integrations documentation](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintegrations).\n- **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https:\u002F\u002Fhuggingface.co\u002Fminishlab).\n\n## What is Model2Vec?\n\nModel2Vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Like BPEmb, it can create subword embeddings, but with much better performance. Distillation doesn't need _any_ data, just a vocabulary and a model.\n\nThe core idea is to forward pass a vocabulary through a sentence transformer model, creating static embeddings for the individual tokens. 
After this, we apply a number of post-processing steps that result in our best models, as well as an optional pre-training step to further boost performance. For a more extensive deep dive, please refer to our [official documentation on how Model2Vec works](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction#how-mode2vec-works).\n\n## Documentation\n\nOur official documentation can be found [here](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintroduction). This includes in-depth documentation on [inference](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Finference), [distillation](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fdistillation), [training](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Ftraining), and [integrations](https:\u002F\u002Fminish.ai\u002Fpackages\u002Fmodel2vec\u002Fintegrations).\n\n\n## Model List\n\nWe provide a number of models that can be used out of the box. These models are available on the [HuggingFace hub](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fminishlab\u002Fmodel2vec-base-models-66fd9dd9b7c3b3c0f25ca90e) and can be loaded using the `from_pretrained` method. 
The models are listed below.\n\n\n\n| Model                                                                 | Language    | Sentence Transformer                                            | Params  | Task      |\n|-----------------------------------------------------------------------|------------|-----------------------------------------------------------------|---------|-----------|\n| [potion-base-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-32M)   | English    | [bge-base-en-v1.5](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-base-en-v1.5) | 32.3M   | General   |\n| [potion-multilingual-128M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-multilingual-128M) | Multilingual | [bge-m3](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-m3)      | 128M    | General   |\n| [potion-retrieval-32M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-retrieval-32M) | English    | [bge-base-en-v1.5](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-base-en-v1.5) | 32.3M   | Retrieval |\n| [potion-base-8M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-8M)     | English    | [bge-base-en-v1.5](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-base-en-v1.5) | 7.5M    | General   |\n| [potion-base-4M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-4M)     | English    | [bge-base-en-v1.5](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-base-en-v1.5) | 3.7M    | General   |\n| [potion-base-2M](https:\u002F\u002Fhuggingface.co\u002Fminishlab\u002Fpotion-base-2M)     | English    | [bge-base-en-v1.5](https:\u002F\u002Fhuggingface.co\u002FBAAI\u002Fbge-base-en-v1.5) | 1.8M    | General   |\n\n\n\n\n## Results\n\nWe have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results\u002FREADME.md) folder. 
The results are presented in the following sections:\n- [MTEB Results](results\u002FREADME.md#mteb-results)\n- [MMTEB Results](results\u002FREADME.md#mmteb-results)\n- [Retrieval Results](results\u002FREADME.md#retrieval-results)\n- [Training Results](results\u002FREADME.md#training-results)\n- [Ablations](results\u002FREADME.md#ablations)\n\n## License\n\nMIT\n\n## Citing\n\nIf you use Model2Vec in your research, please cite the following:\n\n```bibtex\n@software{minishlab2024model2vec,\n  author       = {Stephan Tulkens and {van Dongen}, Thomas},\n  title        = {Model2Vec: Fast State-of-the-Art Static Embeddings},\n  year         = {2024},\n  publisher    = {Zenodo},\n  doi          = {10.5281\u002Fzenodo.17270888},\n  url          = {https:\u002F\u002Fgithub.com\u002FMinishLab\u002Fmodel2vec},\n  license      = {MIT}\n}\n```\n","Model2Vec is a technique for turning any sentence transformer into a small, fast static embedding model. Its core capability is reducing model size to as little as 1\u002F50 of the original and making models up to 500 times faster, with a slight drop in performance that still leaves quality at a high level. The project is written in Python and supports a variety of NLP tasks, such as text classification, retrieval, and clustering. It is particularly suited to scenarios that require efficient processing of large volumes of text and have strict real-time requirements. The project also provides detailed documentation and tutorials, making it easy to get started.","2026-06-11 03:42:44","high_star"]