[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9713":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":47,"readmeContent":48,"aiSummary":49,"trendingCount":16,"starSnapshotCount":16,"syncStatus":50,"lastSyncTime":51,"discoverSource":52},9713,"petals","bigscience-workshop\u002Fpetals","bigscience-workshop","🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading","https:\u002F\u002Fpetals.dev",null,"Python",10187,612,105,92,0,3,16,63,12,88.66,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46],"bloom","chatbot","deep-learning","distributed-systems","falcon","gpt","guanaco","language-models","large-language-models","llama","machine-learning","mixtral","neural-networks","nlp","pipeline-parallelism","pretrained-models","pytorch","tensor-parallelism","transformer","volunteer-computing","2026-06-12 04:00:46","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002F7eR7Pan.png\" width=\"400\">\u003Cbr>\n    Run large language models at home, BitTorrent-style.\u003Cbr>\n    Fine-tuning and inference \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals#benchmarks\">up to 10x faster\u003C\u002Fa> than offloading\n    \u003Cbr>\u003Cbr>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fpetals\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fpetals.svg?color=green\">\u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fdiscord.gg\u002FtfHfe8B34k\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F865254854262652969?label=discord&logo=discord&logoColor=white\">\u003C\u002Fa>\n    \u003Cbr>\n\u003C\u002Fp>\n\nGenerate text with distributed **Llama 3.1** (up to 405B), **Mixtral** (8x22B), **Falcon** (40B+) or **BLOOM** (176B) and fine‑tune them for your own tasks &mdash; right from your desktop computer or Google Colab:\n\n```python\nfrom transformers import AutoTokenizer\nfrom petals import AutoDistributedModelForCausalLM\n\n# Choose any model available at https:\u002F\u002Fhealth.petals.dev\nmodel_name = \"meta-llama\u002FMeta-Llama-3.1-405B-Instruct\"\n\n# Connect to a distributed network hosting model layers\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoDistributedModelForCausalLM.from_pretrained(model_name)\n\n# Run the model as if it were on your computer\ninputs = tokenizer(\"A cat sat\", return_tensors=\"pt\")[\"input_ids\"]\noutputs = model.generate(inputs, max_new_tokens=5)\nprint(tokenizer.decode(outputs[0]))  # A cat sat on a mat...\n```\n\n\u003Cp align=\"center\">\n    🚀 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing\">Try now in Colab\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n🦙 **Want to run Llama?** [Request access](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our [chatbot app](https:\u002F\u002Fchat.petals.dev).\n\n🔏 **Privacy.** Your data will be processed with the help of other people in the public swarm. Learn more about privacy [here](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety). 
For sensitive data, you can set up a [private swarm](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm) among people you trust.\n\n💬 **Any questions?** Ping us in [our Discord](https:\u002F\u002Fdiscord.gg\u002FKdThf2bWVU)!\n\n## Connect your GPU and increase Petals capacity\n\nPetals is a community-run system &mdash; we rely on people sharing their GPUs. You can help by serving one of the [available models](https:\u002F\u002Fhealth.petals.dev) or by hosting a new model from the 🤗 [Model Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels)!\n\nAs an example, here is how to host a part of [Llama 3.1 (405B) Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) on your GPU:\n\n🦙 **Want to host Llama?** [Request access](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FMeta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model.\n\n🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow [this](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU) for AMD):\n\n```bash\nconda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia\npip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🪟 **Windows + WSL.** Follow [this guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-Petals-server-on-Windows) on our Wiki.\n\n🐋 **Docker.** Run our [Docker](https:\u002F\u002Fwww.docker.com) image for NVIDIA GPUs (or follow [this](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRunning-on-AMD-GPU) for AMD):\n\n```bash\nsudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:\u002Fcache --rm \\\n    learningathome\u002Fpetals:main \\\n    python -m petals.cli.run_server --port 31330 
meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n🍏 **macOS + Apple M1\u002FM2 GPU.** Install [Homebrew](https:\u002F\u002Fbrew.sh\u002F), then run these commands:\n\n```bash\nbrew install python\npython3 -m pip install git+https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\npython3 -m petals.cli.run_server meta-llama\u002FMeta-Llama-3.1-405B-Instruct\n```\n\n\u003Cp align=\"center\">\n    📚 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#running-a-server\">Learn more\u003C\u002Fa>\u003C\u002Fb> (how to use multiple GPUs, start the server on boot, etc.)\n\u003C\u002Fp>\n\n🔒 **Security.** Hosting a server does not allow others to run custom code on your computer. Learn more [here](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FSecurity,-privacy,-and-AI-safety).\n\n💬 **Any questions?** Ping us in [our Discord](https:\u002F\u002Fdiscord.gg\u002FX7DgtxgMhc)!\n\n🏆 **Thank you!** Once you load and host 10+ blocks, we can show your name or link on the [swarm monitor](https:\u002F\u002Fhealth.petals.dev) as a way to say thanks. Set the name or link to display with `--public_name YOUR_NAME`.\n\n## How does it work?\n\n- You load a small part of the model, then join a [network](https:\u002F\u002Fhealth.petals.dev) of people serving the other parts. Single‑batch inference runs at up to **6 tokens\u002Fsec** for **Llama 2** (70B) and up to **4 tokens\u002Fsec** for **Falcon** (180B), which is enough for [chatbots](https:\u002F\u002Fchat.petals.dev) and interactive apps.\n- You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. 
You get the comforts of an API with the flexibility of **PyTorch** and **🤗 Transformers**.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FRTYF3yW.png\" width=\"800\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    📜 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf\">Read paper\u003C\u002Fa>\u003C\u002Fb>\n    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n    📚 &nbsp;\u003Cb>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions\">See FAQ\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## 📚 Tutorials, examples, and more\n\nBasic tutorials:\n\n- Getting started: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)\n- Prompt-tune Llama-65B for text semantic classification: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-sst2.ipynb)\n- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fbigscience-workshop\u002Fpetals\u002Fblob\u002Fmain\u002Fexamples\u002Fprompt-tuning-personachat.ipynb)\n\nUseful tools:\n\n- [Chatbot web app](https:\u002F\u002Fchat.petals.dev) (connects to Petals via an HTTP\u002FWebSocket endpoint): [source code](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fchat.petals.dev)\n- [Monitor](https:\u002F\u002Fhealth.petals.dev) for the public swarm: [source code](https:\u002F\u002Fgithub.com\u002Fpetals-infra\u002Fhealth.petals.dev)\n\nAdvanced guides:\n\n- Launch a private swarm: [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FLaunch-your-own-swarm)\n- Run a custom model: [guide](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FRun-a-custom-model-with-Petals)\n\n### 
Benchmarks\n\nPlease see **Section 3.3** of our [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01188.pdf).\n\n### 🛠️ Contributing\n\nPlease see our [FAQ](https:\u002F\u002Fgithub.com\u002Fbigscience-workshop\u002Fpetals\u002Fwiki\u002FFAQ:-Frequently-asked-questions#contributing) on contributing.\n\n### 📜 Citations\n\nAlexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel.\n[Petals: Collaborative Inference and Fine-tuning of Large Models.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188)\n_Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)._ 2023.\n\n```bibtex\n@inproceedings{borzunov2023petals,\n  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},\n  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},\n  pages = {558--568},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.01188}\n}\n```\n\nAlexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel.\n[Distributed inference and fine-tuning of large language models over the Internet.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361)\n_Advances in Neural Information Processing Systems_ 36 (2023).\n\n```bibtex\n@inproceedings{borzunov2023distributed,\n  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},\n  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Advances in Neural Information Processing 
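Systems},\n  volume = {36},\n  pages = {12312--12331},\n  year = {2023},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.08361}\n}\n```\n\n--------------------------------------------------------------------------------\n\n\u003Cp align=\"center\">\n    This project is a part of the \u003Ca href=\"https:\u002F\u002Fbigscience.huggingface.co\u002F\">BigScience\u003C\u002Fa> research workshop.\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fpetals.dev\u002Fbigscience.png\" width=\"150\">\n\u003C\u002Fp>\n","Petals is a project for running large language models on personal computers, BitTorrent-style, with support for models including Llama 3.1, Mixtral, Falcon, and BLOOM. Through distributed computation, it makes model fine-tuning and inference up to 10x faster than conventional approaches. Built in Python on the PyTorch framework and the transformer architecture, Petals implements efficient pipeline parallelism and tensor parallelism to speed up processing. The project is well suited to researchers, developers, and enthusiasts who want low-cost access to powerful language models from a local machine (such as a desktop computer or Google Colab). Users can also contribute GPU resources to the public network and help improve the performance of the whole system.",2,"2026-06-11 03:24:21","top_topic"]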