[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71076":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},71076,"mergekit","arcee-ai\u002Fmergekit","arcee-ai","Tools for merging pretrained large language models.","",null,"Python",7130,728,63,240,0,10,24,62,30,39.59,"GNU Lesser General Public License v3.0",false,"main",true,[27,28,29],"llama","llm","model-merging","2026-06-12 02:02:47","# mergekit\n\n[![License: LGPL v3](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-LGPL_v3-blue.svg)](https:\u002F\u002Fwww.gnu.org\u002Flicenses\u002Flgpl-3.0)\n[![GitHub Actions Workflow Status](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Farcee-ai\u002Fmergekit\u002Fpre-commit.yml?label=Tests)](https:\u002F\u002Fgithub.com\u002Farcee-ai\u002Fmergekit\u002Factions\u002Fworkflows\u002Fpre-commit.yml)\n[![Arcee Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArcee%20Discord-Arcee%20Discord?logo=discord&logoColor=white&color=5865F2)](https:\u002F\u002Fdiscord.gg\u002Farceeai)\n\n`mergekit` is a toolkit for merging pre-trained language models. `mergekit` uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. 
Many merging algorithms are supported, with more coming as they catch my attention.\n\n## Contents\n\n- [Why Merge Models?](#why-merge-models)\n- [Features](#features)\n- [Installation](#installation)\n- [Community & Support](#community--support)\n  - [Contributing](#contributing)\n  - [Community Tools](#community-tools)\n- [Usage](#usage)\n- [Merge Configuration](#merge-configuration)\n  - [Parameter Specification](#parameter-specification)\n  - [Tokenizer Configuration](#tokenizer-configuration)\n  - [Chat Template Configuration](#chat-template-configuration)\n  - [Examples](#examples)\n- [Merge Methods](#merge-methods)\n- [LoRA Extraction](#lora-extraction)\n- [Mixture of Experts Merging](#mixture-of-experts-merging)\n- [Evolutionary Merge Methods](#evolutionary-merge-methods)\n- [Multi-Stage Merging (`mergekit-multi`)](#multi-stage-merging-mergekit-multi)\n- [Raw PyTorch Model Merging (`mergekit-pytorch`)](#raw-pytorch-model-merging-mergekit-pytorch)\n- [Tokenizer Transplantation (`mergekit-tokensurgeon`)](#tokenizer-transplantation-mergekit-tokensurgeon)\n- [Citation](#citation)\n\n## Why Merge Models?\n\nModel merging is a powerful technique that allows combining the strengths of different models without the computational overhead of ensembling or the need for additional training. 
By operating directly in the weight space of models, merging can:\n\n- Combine multiple specialized models into a single versatile model\n- Transfer capabilities between models without access to training data\n- Find optimal trade-offs between different model behaviors\n- Improve performance without increasing inference cost\n- Create new capabilities through creative model combinations\n\nUnlike traditional ensembling, which requires running multiple models, merged models keep the inference cost of a single model while often achieving comparable or superior performance.\n\n## Features\n\nKey features of `mergekit` include:\n\n- Supports Llama, Mistral, GPT-NeoX, StableLM, and more\n- Many [merge methods](#merge-methods)\n- GPU or CPU execution\n- Lazy loading of tensors for low memory use\n- Interpolated gradients for parameter values (inspired by Gryphe's [BlockMerge_Gradient](https:\u002F\u002Fgithub.com\u002FGryphe\u002FBlockMerge_Gradient) script)\n- Piecewise assembly of language models from layers (\"Frankenmerging\")\n- [Mixture of Experts merging](#mixture-of-experts-merging)\n- [LoRA extraction](#lora-extraction)\n- [Evolutionary merge methods](#evolutionary-merge-methods)\n- [Multi-stage merging](#multi-stage-merging-mergekit-multi) for complex workflows\n- [Merging of raw PyTorch models (`mergekit-pytorch`)](#raw-pytorch-model-merging-mergekit-pytorch)\n\n## Installation\n\n```sh\ngit clone https:\u002F\u002Fgithub.com\u002Farcee-ai\u002Fmergekit.git\ncd mergekit\n\npip install -e .  # install the package and make scripts available\n```\n\nIf the above fails with the following error:\n\n```\nERROR: File \"setup.py\" or \"setup.cfg\" not found. 
Directory cannot be installed in editable mode:\n(A \"pyproject.toml\" file was found, but editable mode currently requires a setuptools-based build.)\n```\n\nyou may need to upgrade pip to version 21.3 or later with the command `python3 -m pip install --upgrade pip`.\n\n## Community & Support\n\n- **Issues**: [GitHub Issues](https:\u002F\u002Fgithub.com\u002Farcee-ai\u002Fmergekit\u002Fissues)\n- **Discussions**: [Arcee Discord](https:\u002F\u002Fdiscord.gg\u002Farceeai)\n\n### Contributing\n\nWe welcome contributions to `mergekit`! If you have ideas for new merge methods, features, or other improvements, please check out our [contributing guide](CONTRIBUTING.md) for details on how to get started.\n\n### Community Tools\n\n- **[FrankensteinAI](https:\u002F\u002Ffrankenstein-ai.com\u002F)**: For those who prefer a browser-based experience without local setup or hardware wrangling, the team at FrankensteinAI has built a hosted platform powered by `mergekit`. It also features a community gallery and leaderboard for sharing and comparing merged models.\n\n## Usage\n\nThe script `mergekit-yaml` is the main entry point for `mergekit`. It takes a YAML configuration file and an output path, like so:\n\n```sh\nmergekit-yaml path\u002Fto\u002Fyour\u002Fconfig.yml .\u002Foutput-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]\n```\n\nThis will run the merge and write your merged model to `.\u002Foutput-model-directory`.\n\nFor more information on the arguments accepted by `mergekit-yaml`, run the command `mergekit-yaml --help`.\n\n### Uploading to Hugging Face\n\nWhen you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. `mergekit` generates a `README.md` for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated `README.md` as-is. 
It is also possible to edit your `README.md` online once it has been uploaded to the Hub.\n\nOnce you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the [huggingface_hub](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Findex) Python library.\n\n```sh\n# log in to huggingface with an access token (must have write permission)\nhuggingface-cli login\n# upload your model\nhuggingface-cli upload your_hf_username\u002Fmy-cool-model .\u002Foutput-model-directory .\n```\n\nThe [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fcli#huggingface-cli-upload) for `huggingface_hub` goes into more detail about other options for uploading.\n\n## Merge Configuration\n\nMerge configurations are YAML documents specifying the operations to perform in order to produce your merged model.\nBelow are the primary elements of a configuration file:\n\n- `merge_method`: Specifies the method to use for merging models. See [Merge Methods](#merge-methods) for a list.\n- `slices`: Defines slices of layers from different models to be used. This field is mutually exclusive with `models`.\n- `models`: Defines entire models to be used for merging. This field is mutually exclusive with `slices`.\n- `base_model`: Specifies the base model used in some merging methods.\n- `parameters`: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.\n- `dtype`: Specifies the data type used for the merging operation.\n- `tokenizer` or `tokenizer_source`: Determines how to construct a tokenizer for the merged model.\n- `chat_template`: Specifies a chat template for the merged model.\n\n### Parameter Specification\n\nParameters are flexible and can be set with varying precedence. 
They can be specified conditionally using tensor name filters, which allows finer control such as differentiating between attention heads and fully connected layers.\n\nParameters can be specified as:\n\n- **Scalars**: Single floating-point values.\n- **Gradients**: List of floating-point values, specifying an interpolated gradient.\n\nThe parameters can be set at different levels, with decreasing precedence as follows:\n\n1. `slices.*.sources.parameters` - applying to a specific input slice\n2. `slices.*.parameters` - applying to a specific output slice\n3. `models.*.parameters` or `input_model_parameters` - applying to any tensors coming from specific input models\n4. `parameters` - catchall\n\n### Tokenizer Configuration\n\nThe tokenizer behavior can be configured in two ways: using the new `tokenizer` field (recommended) or the legacy `tokenizer_source` field (maintained for backward compatibility). These fields are mutually exclusive - you should use one or the other, not both.\n\n#### Modern Configuration (tokenizer)\n\nThe `tokenizer` field provides fine-grained control over vocabulary and embeddings:\n\n```yaml\ntokenizer:\n  source: \"union\"  # or \"base\" or a specific model path\n  tokens:          # Optional: configure specific tokens\n    \u003Ctoken_name>:\n      source: ...  # Specify embedding source\n      force: false # Optional: force this embedding for all models\n  pad_to_multiple_of: null  # Optional: pad vocabulary size\n```\n\n##### Tokenizer Source\n\nThe `source` field determines the vocabulary of the output model:\n\n- `union`: Combine vocabularies from all input models (default)\n- `base`: Use vocabulary from the base model\n- `\"path\u002Fto\u002Fmodel\"`: Use vocabulary from a specific model\n\n##### Token Embedding Handling\n\nWhen a tokenizer is configured, each input model's embedding matrix is adjusted to match the output vocabulary before being passed to the merge method. For tokens a model already has, its own embedding is used. 
For tokens a model is *missing*, a fallback embedding is assigned using these rules:\n\n- If the base model has the token, use the base model's embedding\n- If only one model has the token, use that model's embedding\n- Otherwise, use an average of all available embeddings\n\nThe merge method then combines these per-model embeddings (original and filled-in) to produce the final output. This means the final embedding for a token present in multiple models is determined by your merge method (SLERP, linear, TIES, etc.), not simply taken from one model.\n\nYou can override these defaults for specific tokens. Any tokens listed here that don't already exist in the output vocabulary will be added automatically, making this useful for introducing new special tokens.\n\n```yaml\ntokenizer:\n  source: union\n  tokens:\n    # Use embedding from a specific model\n    \u003C|im_start|>:\n      source: \"path\u002Fto\u002Fchatml\u002Fmodel\"\n\n    # Force a specific embedding for all models\n    \u003C|special|>:\n      source: \"path\u002Fto\u002Fmodel\"\n      force: true\n\n    # Map a token to another model's token embedding\n    \u003C|renamed_token|>:\n      source:\n        kind: \"model_token\"\n        model: \"path\u002Fto\u002Fmodel\"\n        token: \"\u003C|original_token|>\"  # or use token_id: 1234\n\n    # Use a zero embedding\n    \u003C|unused|>:\n      source:\n        kind: \"zero\"\n```\n\n##### Practical Example\n\nHere's how you might preserve both Llama 3 Instruct and ChatML prompt formats when merging models:\n\n```yaml\ntokenizer:\n  source: union\n  tokens:\n    # ChatML tokens\n    \u003C|im_start|>:\n      source: \"chatml_model\"\n    \u003C|im_end|>:\n      source: \"chatml_model\"\n\n    # Llama 3 tokens - force original embeddings\n    \u003C|start_header_id|>:\n      source: \"llama3_model\"\n      force: true\n    \u003C|end_header_id|>:\n      source: \"llama3_model\"\n      force: true\n    \u003C|eot_id|>:\n      source: \"llama3_model\"\n  
    force: true\n```\n\n#### Legacy Configuration (tokenizer_source)\n\nFor backward compatibility, the `tokenizer_source` field is still supported:\n\n```yaml\ntokenizer_source: \"union\"  # or \"base\" or a model path\n```\n\nThis provides basic tokenizer selection but lacks the fine-grained control of the modern `tokenizer` field.\n\n### Chat Template Configuration\n\nThe optional `chat_template` field allows overriding the chat template used for the merged model.\n\n```yaml\nchat_template: \"auto\"  # or a template name or Jinja2 template\n```\n\nOptions include:\n\n- `\"auto\"`: Automatically select the most common template among input models\n- Built-in templates: `\"alpaca\"`, `\"chatml\"`, `\"llama3\"`, `\"mistral\"`, `\"exaone\"`\n- A Jinja2 template string for custom formatting\n\n### Examples\n\nSeveral examples of merge configurations are available in [`examples\u002F`](examples\u002F).\n\n## Merge Methods\n\n`mergekit` offers many methods for merging models, each with its own strengths and weaknesses. 
Choosing the right method depends on your specific goals, the relationship between the models you're merging, and the desired characteristics of the final model.\n\nFor detailed explanations, parameter descriptions, and use cases for each method, please see our [**Merge Method Guide**](docs\u002Fmerge_methods.md).\n\n### Method Overview\n\n| Method (`value`)                                                                                                      | Core Idea                                                            | # Models | Base Model | Key Strengths \u002F Use Cases                                       |\n|:----------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------|:--------:|:----:|:---------------------------------------------------------------|\n| [**Linear** (`linear`)](docs\u002Fmerge_methods.md#linear-linear)                                                          | Simple weighted average of model parameters.                         |    ≥2    |  -   | Averaging similar checkpoints, model soups.                     |\n| [**SLERP** (`slerp`)](docs\u002Fmerge_methods.md#slerp-slerp)                                                              | Spherical linear interpolation between two models.                   |     2    |  ✓   | Smoothly transitioning between two models.                      |\n| [**NuSLERP** (`nuslerp`)](docs\u002Fmerge_methods.md#nuslerp-nuslerp)                                                        | Enhanced SLERP with flexible weighting.                              |     2    |  *   | More intuitive SLERP; task vector SLERP.                        |\n| [**Multi-SLERP** (`multislerp`)](docs\u002Fmerge_methods.md#multi-slerp-multislerp)                                          | Barycentric SLERP for multiple models.                               
|    ≥2    |  *   | Spherical interpolation for >2 models.                          |\n| [**Karcher Mean** (`karcher`)](docs\u002Fmerge_methods.md#karcher-mean-karcher)                                              | Riemannian barycenter of model parameters.                           |    ≥2    |  -   | Geometrically sound averaging on manifolds.                     |\n| [**Task Arithmetic** (`task_arithmetic`)](docs\u002Fmerge_methods.md#task-arithmetic-task_arithmetic)                      | Linearly combine \"task vectors\" (differences from a base).           |    ≥2    |  ✓   | Transferring\u002Fcombining fine-tuned skills.                       |\n| [**TIES** (`ties`)](docs\u002Fmerge_methods.md#ties-merging-ties)                                                          | Task arithmetic + sparsification & sign consensus.                   |    ≥2    |  ✓   | Merging many models, reducing interference.                     |\n| [**DARE** (`dare_linear`, `dare_ties`)](docs\u002Fmerge_methods.md#dare-dare_linear-dare_ties)                               | Task arithmetic + random pruning & rescaling.                        |    ≥2    |  ✓   | Robust skill retention, similar to TIES.                        |\n| [**DELLA** (`della`, `della_linear`)](docs\u002Fmerge_methods.md#della-della-della_linear)                                   | Task arithmetic + adaptive magnitude-based pruning.                  |    ≥2    |  ✓   | Prioritizing important changes, reducing interference.          |\n| [**Model Breadcrumbs** (`breadcrumbs`, `breadcrumbs_ties`)](docs\u002Fmerge_methods.md#model-breadcrumbs-breadcrumbs_ties)   | Task arithmetic + outlier removal (small & large diffs).             |    ≥2    |  ✓   | Refining task vectors by removing extreme changes.              |\n| [**SCE** (`sce`)](docs\u002Fmerge_methods.md#sce-sce)                                                                      | Task arithmetic + adaptive matrix-level weighting based on variance. 
|    ≥2    |  ✓   | Dynamically weighting models based on parameter variance.       |\n| [**Model Stock** (`model_stock`)](docs\u002Fmerge_methods.md#model-stock-model_stock)                                        | Geometric weight calculation for linear interpolation.               |    ≥3    |  ✓   | Finding good linear interpolation weights for many checkpoints. |\n| [**Nearswap** (`nearswap`)](docs\u002Fmerge_methods.md#nearswap-nearswap)                                                    | Interpolate where parameters are similar.                            |     2    |  ✓   | Selective merging based on parameter similarity.                |\n| [**Arcee Fusion** (`arcee_fusion`)](docs\u002Fmerge_methods.md#arcee-fusion-arcee_fusion)                                    | Dynamic thresholding for fusing important changes.                   |     2    |  ✓   | Identifying and merging salient features.                       |\n| [**Passthrough** (`passthrough`)](docs\u002Fmerge_methods.md#passthrough-passthrough)                                        | Directly copies tensors from a single input model.                      |     1    |  -   | Frankenmerging, layer stacking, model surgery.                  
|\n\n**Key for `Base Model` Column:**\n\n- ✓: **Required** - One of the input models *must* be designated as the `base_model`.\n- *: **Optional** - One of the input models *can* be designated as the `base_model`.\n- -: **Not Applicable** - `base_model` has no effect on this method.\n\n## LoRA Extraction\n\nMergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.\n\n### Usage\n\n```sh\nmergekit-extract-lora --model finetuned_model_id_or_path --base-model base_model_id_or_path --out-path output_path [--no-lazy-unpickle] [--cuda] [--max-rank=desired_rank] [--sv-epsilon=tol]\n```\n\n## Mixture of Experts Merging\n\nThe `mergekit-moe` script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the [`mergekit-moe` documentation](docs\u002Fmoe.md).\n\n## Evolutionary Merge Methods\n\nSee [`docs\u002Fevolve.md`](docs\u002Fevolve.md) for details.\n\n## Multi-Stage Merging (`mergekit-multi`)\n\n`mergekit-multi` enables the execution of complex, multi-stage model merging workflows. You can define multiple merge configurations in a single YAML file, where later merges can use the outputs of earlier ones as inputs. This is useful for building up sophisticated models through a series of targeted merges.\n\nSee the [`mergekit-multi` documentation](docs\u002Fmultimerge.md) for usage details and examples.\n\n## Raw PyTorch Model Merging (`mergekit-pytorch`)\n\nFor merging arbitrary PyTorch models (not necessarily Hugging Face Transformers), `mergekit-pytorch` provides a way to apply mergekit's algorithms directly to `.pt` or `.safetensors` checkpoints. 
The configuration is similar to the YAML format used in `mergekit-yaml`, but does not support layer slicing or tokenizer configuration.\n\n### Usage\n\n```sh\nmergekit-pytorch path\u002Fto\u002Fyour\u002Fraw_config.yml .\u002Foutput_pytorch_model_directory [options]\n```\n\nUse `mergekit-pytorch --help` for detailed options.\n\n## Tokenizer Transplantation (`mergekit-tokensurgeon`)\n\n`mergekit-tokensurgeon` is a specialized tool for transplanting tokenizers between models, allowing you to align the vocabulary of one model with another. This is particularly useful for cheaply producing draft models for speculative decoding or for cross-tokenizer knowledge distillation. See the [documentation](docs\u002Ftokensurgeon.md) for more details and how to use it.\n\n## Citation\n\nIf you find `mergekit` useful in your research, please consider citing the [paper](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-industry.36\u002F):\n\n```bibtex\n@inproceedings{goddard-etal-2024-arcees,\n    title = \"Arcee{'}s {M}erge{K}it: A Toolkit for Merging Large Language Models\",\n    author = \"Goddard, Charles  and\n      Siriwardhana, Shamane  and\n      Ehghaghi, Malikeh  and\n      Meyers, Luke  and\n      Karpukhin, Vladimir  and\n      Benedict, Brian  and\n      McQuade, Mark  and\n      Solawetz, Jacob\",\n    editor = \"Dernoncourt, Franck  and\n      Preo{\\c{t}}iuc-Pietro, Daniel  and\n      Shimorina, Anastasia\",\n    booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, US\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https:\u002F\u002Faclanthology.org\u002F2024.emnlp-industry.36\",\n    doi = \"10.18653\u002Fv1\u002F2024.emnlp-industry.36\",\n    pages = \"477--485\",\n    abstract = \"The rapid growth of open-source language models provides the opportunity to merge model checkpoints, combining 
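their parameters to improve performance and versatility. Advances in transfer learning have led to numerous task-specific models, which model merging can integrate into powerful multitask models without additional training. MergeKit is an open-source library designed to support this process with an efficient and extensible framework suitable for any hardware. It has facilitated the merging of thousands of models, contributing to some of the world{'}s most powerful open-source model checkpoints. The library is accessible at: https:\u002F\u002Fgithub.com\u002Farcee-ai\u002Fmergekit.\",\n}\n```\n","mergekit is a toolkit for merging pretrained large language models. It supports many architectures, such as Llama, Mistral, GPT-NeoX, and StableLM, and provides a variety of merge algorithms as well as interpolated gradients for parameter values. The toolkit uses an out-of-core approach, so it runs efficiently even when resources are constrained: merges can be accelerated with as little as 8 GB of VRAM, and CPU-only execution is fully supported. It suits scenarios that call for combining the strengths of different models without the cost of additional training or extra compute, for example improving model performance or creating new capabilities without increasing inference cost.",2,"2026-06-11 03:35:46","high_star"]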