[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-78603":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":8,"pushedAt":8,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},78603,"esm","Biohub\u002Fesm","Biohub",null,"Jupyter Notebook",2724,336,39,70,0,33,109,383,99,106.58,"Other",false,"main",true,[],"2026-06-12 04:01:23","# ESM\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"_assets\u002Flogo.png\" width=\"50\"\u002F>\n\n[ESM3](https:\u002F\u002Fwww.science.org\u002Fdoi\u002F10.1126\u002Fscience.ads0018) &sdot; [ESM C](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fblog\u002Fesm-cambrian) &sdot;\n[Slack](https:\u002F\u002Fbit.ly\u002Fesm-slack) &sdot; [Tutorials](https:\u002F\u002Fgithub.com\u002Fevolutionaryscale\u002Fesm\u002Ftree\u002Fmain\u002Fcookbook\u002Ftutorials) \u003Cbr>\n\u003C\u002Fdiv>\n\n\n- [Installation ](#installation-)\n- [Available Models](#available-models-)\n- [ESM 3](#esm-3-)\n  - [Quickstart for ESM3 Open](#esm3-quickstart-)\n  - [ESM3 98B via Forge API](#esm3-forge)\n  - [ESM3 Example Usage](#esm3-example-usage)\n- [ESM C](#esm-c-)\n  - [Quickstart for ESM C Open Models](#esm-c-open-)\n  - [ESM C 6B via Forge API](#esm-c-forge-)\n  - [ESM C via SageMaker for Commercial Use  ](#esm-c-sagemaker-)\n  - [ESM C Example Usage](#esm-c-example-)\n- [Responsible Development](#responsible-development-)\n- [Licenses](#licenses-)\n- [Citations  ](#citations-)\n\n\nThis repository contains flagship 
protein models for EvolutionaryScale, as well as access to the API. [ESM3](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fpapers\u002Fesm3-simulating-500-million-years-of-evolution-with-a-language-model) is our flagship multimodal protein generative model, and can be used for generation and prediction tasks. [ESM C](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fblog\u002Fesm-cambrian) is our best protein representation learning model, and can be used to embed protein sequences.\n\n## Installation \u003Ca name=\"installation\">\u003C\u002Fa>\n\nTo get started with ESM, install the python library using pip:\n\n```bash\npip install esm\n```\n\n## Available Models \u003Ca name=\"available-models\">\u003C\u002Fa>\n\n### ESM 3 Family\n\n| Model | Model Size | Release Date | Note |\n|-------|------------|--------------|------|\n| **Flagship Models** | | | Most users will be interested in using one of these models. |\n| esm3-large-2024-03 | 98B | 2024-03 | |\n| esm3-medium-2024-08 | 7B | 2024-08 | |\n| esm3-small-2024-08 | 1.4B | 2024-08 | |\n| **Published Models** | | | These models were used to generate all of the results in the ESM3 paper and are provided to facilitate reproducibility. |\n| esm3-large-2024-03 | 98B | 2024-03 | |\n| esm3-medium-2024-03 | 7B | 2024-03 | |\n| esm3-small-2024-03 | 1.4B | 2024-03 | |\n| **Experimental Models** | | | These models are provided for early use by researchers and are still under development. 
|\n| esm3-medium-multimer-2024-09 | 7B | 2024-09 | |\n\n### ESM C Models\n\n| Model | Model Size | Number of Layers | Release Date |\n|-------|------------|------------------|--------------|\n| esmc-6b-2024-12 | 6B | 80 | 2024-12 |\n| esmc-600m-2024-12 | 600M | 36 | 2024-12 |\n| esmc-300m-2024-12 | 300M | 30 | 2024-12 |\n\n## ESM 3  \u003Ca name=\"esm3\">\u003C\u002Fa>\n\n[ESM3](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fpapers\u002Fesm3-simulating-500-million-years-of-evolution-with-a-language-model) is a frontier generative model for biology, able to jointly reason across three fundamental biological properties of proteins: sequence, structure, and function. These three data modalities are represented as tracks of discrete tokens at the input and output of ESM3. You can present the model with a combination of partial inputs across the tracks, and ESM3 will provide output predictions for all the tracks.\n\nESM3 is a _generative_ masked language model. You can prompt it with partial sequence, structure, and function keywords, and iteratively sample masked positions until all positions are unmasked. This iterative sampling is what the `.generate()` function does.\n\n\u003C!--![ESM3 Diagram](_assets\u002Fesm3_diagram.png)-->\n\u003Cimg src=\"_assets\u002Fesm3_diagram.png\" alt=\"ESM3 Diagram\" width=\"400\" \u002F>\n\nThe ESM3 architecture is highly scalable due to its transformer backbone and all-to-all reasoning over discrete token sequences. 
At its largest scale, ESM3 was trained with 1.07e24 FLOPs on 2.78 billion proteins and 771 billion unique tokens, and has 98 billion parameters.\nLearn more by reading the [blog post](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fblog\u002Fesm3-release) and [the paper (Hayes et al., 2024)](https:\u002F\u002Fwww.science.org\u002Fdoi\u002F10.1126\u002Fscience.ads0018).\n\nESM3-open, with 1.4B parameters, is the smallest and fastest model in the family.\n\n### Quickstart for ESM3-open \u003Ca name=\"esm3-quickstart\">\u003C\u002Fa>\n\nThe weights are stored on HuggingFace Hub under [HuggingFace\u002FEvolutionaryScale\u002Fesm3](https:\u002F\u002Fhuggingface.co\u002FEvolutionaryScale\u002Fesm3).\n\n```py\nfrom huggingface_hub import login\nfrom esm.models.esm3 import ESM3\nfrom esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig\n\n# Will instruct you how to get an API key from huggingface hub, make one with \"Read\" permission.\nlogin()\n\n# This will download the model weights and instantiate the model on your machine.\nmodel: ESM3InferenceClient = ESM3.from_pretrained(\"esm3-open\").to(\"cuda\") # or \"cpu\"\n\n# Generate a completion for a partial Carbonic Anhydrase (2vvb)\nprompt = \"___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________\"\nprotein = ESMProtein(sequence=prompt)\n# Generate the sequence, then the structure. 
This will iteratively unmask the sequence track.\nprotein = model.generate(protein, GenerationConfig(track=\"sequence\", num_steps=8, temperature=0.7))\n# We can show the predicted structure for the generated sequence.\nprotein = model.generate(protein, GenerationConfig(track=\"structure\", num_steps=8))\nprotein.to_pdb(\".\u002Fgeneration.pdb\")\n# Then we can do a round trip design by inverse folding the sequence and recomputing the structure\nprotein.sequence = None\nprotein = model.generate(protein, GenerationConfig(track=\"sequence\", num_steps=8))\nprotein.coordinates = None\nprotein = model.generate(protein, GenerationConfig(track=\"structure\", num_steps=8))\nprotein.to_pdb(\".\u002Fround_tripped.pdb\")\n```\n\nCongratulations, you just generated your first proteins with ESM3!\n\n### EvolutionaryScale Forge: Access to larger ESM3 models\n \u003Ca name=\"esm3-forge\">\u003C\u002Fa>\n\nYou can access all scales of ESM3 models through [EvolutionaryScale Forge](https:\u002F\u002Fforge.evolutionaryscale.ai).\n\nWe encourage users to interact with the Forge API through the python `esm` library instead of the command line.\nThe python interface enables you to interactively load proteins, build prompts, and inspect generated proteins\nwith the same `ESMProtein` and config classes used to interact with the local model.\n\nIn any example script you can replace a local `ESM3` model with a Forge API client:\n\n```py\n# Instead of loading the model locally on your machine:\nmodel: ESM3InferenceClient = ESM3.from_pretrained(\"esm3_sm_open_v1\").to(\"cuda\") # or \"cpu\"\n# just replace the line with this:\nmodel: ESM3InferenceClient = esm.sdk.client(\"esm3-medium-2024-08\", token=\"\u003Cyour forge token>\")\n# and now you're interfacing with the model running on our remote servers.\n...\n```\n\nand the exact same code will work.\nThis enables a seamless transition from smaller and faster models to our largest and most capable protein language models for protein design work.\n\n### 
Async Forge Client\nThe Forge client supports asynchronous API calls for improved performance when making multiple requests. Async methods follow the same naming convention as their synchronous counterparts, with `async_` prepended to the method name. For example:\n\n```py\nmodel = esm.sdk.client(\"esm3-medium-2024-08\", token=\"\u003Cyour forge token>\")\n\nprotein = await model.async_generate(protein, GenerationConfig(track=\"sequence\"))\n```\n\n### ESM3 Example Usage\n \u003Ca name=\"esm3-example\">\u003C\u002Fa>\n\n[Generating a novel GFP with chain of thought generation using ESM3](.\u002Fcookbook\u002Ftutorials\u002F3_gfp_design.ipynb) [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fevolutionaryscale\u002Fesm\u002Fblob\u002Fmain\u002Fcookbook\u002Ftutorials\u002F3_gfp_design.ipynb)\n\n[Advanced prompting with ESM3 input tracks](.\u002Fcookbook\u002Ftutorials\u002F4_forge_generate.ipynb) [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fevolutionaryscale\u002Fesm\u002Fblob\u002Fmain\u002Fcookbook\u002Ftutorials\u002F4_forge_generate.ipynb)\n\n\n\n## ESM C \u003Ca name=\"esm-c\">\u003C\u002Fa>\n[ESM Cambrian](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fblog\u002Fesm-cambrian) is a parallel model family to our flagship ESM3 generative models. While ESM3 focuses on controllable generation of proteins, ESM C focuses on creating representations of the underlying biology of proteins.\n\nESM C is designed as a drop-in replacement for ESM2 and comes with major performance benefits. The 300M parameter ESM C delivers similar performance to ESM2 650M with dramatically reduced memory requirements and faster inference. 
The 600M parameter ESM C rivals the 3B parameter ESM2 and approaches the capabilities of the 15B model, delivering frontier performance with far greater efficiency. The 6B parameter ESM C outperforms the best ESM2 models by a wide margin.\n\nESM C can be run locally, via [the Forge API](https:\u002F\u002Fforge.evolutionaryscale.ai\u002F) or through [AWS SageMaker](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fseller-profile?id=seller-iw2nbscescndm).\n\n### Quickstart for ESM C Open Models\u003Ca name=\"esm-c-open\">\u003C\u002Fa>\nWhen running the code below, a pytorch model will be instantiated locally on your machine, with the weights downloaded from the [HuggingFace hub](https:\u002F\u002Fhuggingface.co\u002FEvolutionaryScale).\n```py\nfrom esm.models.esmc import ESMC\nfrom esm.sdk.api import ESMProtein, LogitsConfig\n\nprotein = ESMProtein(sequence=\"AAAAA\")\nclient = ESMC.from_pretrained(\"esmc_300m\").to(\"cuda\") # or \"cpu\"\nprotein_tensor = client.encode(protein)\nlogits_output = client.logits(\n   protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)\n)\nprint(logits_output.logits, logits_output.embeddings)\n```\n\nTo use Flash Attention with the open weights:\n\nSimply install flash-attn package, which will enable Flash Attention automatically:\n```\npip install flash-attn --no-build-isolation\n```\n\nYou can also disable flash-attn by passing ``use_flash_attn=False`` to utils like ``ESMC_300M_202412``.\n\n### ESM C 6B via Forge API \u003Ca name=\"esm-c-forge\">\u003C\u002Fa>\n\nApply for access and copy the API token from the console by first visiting [Forge](https:\u002F\u002Fforge.evolutionaryscale.ai).\n\nWith the code below, a local python client talks to the model inference server hosted by EvolutionaryScale.\n\n```py\nfrom esm.sdk.forge import ESM3ForgeInferenceClient\nfrom esm.sdk.api import ESMProtein, LogitsConfig\n\n# Apply for forge access and get an access token\nforge_client = 
ESM3ForgeInferenceClient(model=\"esmc-6b-2024-12\", url=\"https:\u002F\u002Fforge.evolutionaryscale.ai\", token=\"\u003Cyour forge token>\")\nprotein = ESMProtein(sequence=\"AAAAA\")\nprotein_tensor = forge_client.encode(protein)\nlogits_output = forge_client.logits(\n   protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)\n)\nprint(logits_output.logits, logits_output.embeddings)\n```\n\nRemember to replace `\u003Cyour forge token>` with your actual Forge access token.\n\n### Forge Batch Executor\n\nFor jobs that require processing multiple inputs, the Forge Batch Executor provides a streamlined way to execute them concurrently and efficiently while respecting rate limits and adapting to request latency.\n\n```py\nfrom esm.sdk.forge import ESM3ForgeInferenceClient\nfrom esm.sdk.api import ESMProtein, ESMProteinError, LogitsConfig, LogitsOutput\nfrom esm.sdk import batch_executor\n\ndef embed_sequence(client: ESM3ForgeInferenceClient, sequence: str) -> LogitsOutput:\n    protein = ESMProtein(sequence=sequence)\n    protein_tensor = client.encode(protein)\n    # encode() returns an ESMProteinError instead of raising, so surface it here.\n    if isinstance(protein_tensor, ESMProteinError):\n        raise protein_tensor\n    output = client.logits(protein_tensor, LogitsConfig(sequence=True, return_embeddings=True))\n    return output\n\nsequences = [\"A\", \"AA\", \"AAA\"]\nclient = ESM3ForgeInferenceClient(model=\"esmc-6b-2024-12\", url=\"https:\u002F\u002Fforge.evolutionaryscale.ai\", token=\"\u003Cyour forge token>\")\n\n# Usage Example:\n# To execute a batch job, wrap your function inside the batch executor context manager.\n# Syntax:\n# with batch_executor() as executor:\n#     outputs = executor.execute_batch(user_func=\u003Cyour_function>, **kwargs)\n\nwith batch_executor() as executor:\n    outputs = executor.execute_batch(user_func=embed_sequence, client=client, sequence=sequences)\n```\n\n### ESM C via SageMaker for Commercial Use  \u003Ca name=\"esm-c-sagemaker\">\u003C\u002Fa>\n\nESM C models are also available on Amazon SageMaker under the [Cambrian Inference Clickthrough License 
Agreement](https:\u002F\u002Fwww.evolutionaryscale.ai\u002Fpolicies\u002Fcambrian-inference-clickthrough-license-agreement).\nUnder this license agreement, models are available for broad use by commercial entities.\n\nYou will need admin access to an AWS account to follow these instructions. Start by deploying the AWS package:\n\n1. Find the ESM C model version you want to subscribe to. All of our offerings are visible [here](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fseller-profile?id=seller-iw2nbscescndm).\n2. Click the name of the model version you are interested in, review pricing information and the end user license agreement (EULA), then click \"Continue to Subscribe\".\n3. Once you have subscribed, you should be able to see our model under your [marketplace subscriptions](https:\u002F\u002Fus-east-1.console.aws.amazon.com\u002Fmarketplace\u002Fhome#\u002Fsubscriptions).\n4. Click the product name and then from the \"Actions\" dropdown select \"Configure\".\n5. You will next see the \"Configure and Launch\" UI. There are multiple deployment paths - we recommend using \"AWS CloudFormation\".\n6. The default value for \"Service Access\" may or may not work. We recommend clicking \"Create and use a new service role\".\n7. Click \"Launch CloudFormation Template\".  This takes 15 to 25 minutes depending on model size.\n8. On the \"Quick create stack\" page, ensure the stack name and endpoint names are not already used. You can check existing stack names [here](https:\u002F\u002Fconsole.aws.amazon.com\u002Fcloudformation\u002Fhome\u002Fstacks) and existing endpoint names [here](https:\u002F\u002Fus-east-1.console.aws.amazon.com\u002Fsagemaker\u002Fhome?region=us-east-1#\u002Fendpoints).\n\nThe SageMaker deployment of the model now lives on a dedicated GPU instance inside your AWS environment, and will be billed directly to your AWS account.\nRemember to shut down the instance when you are done using it. 
Find the CloudFormation stack you created [here](https:\u002F\u002Fus-east-1.console.aws.amazon.com\u002Fcloudformation\u002Fhome), select it, and then click \"Delete\" to clean up all resources.\n\nAfter creating the endpoint, you can create a SageMaker client and use it the same way as a Forge client. They share the same API.\nThe local python client talks to the SageMaker endpoint you just deployed, which runs on an instance with a GPU to run model inference.\n\nEnsure that the code below runs in an environment that has AWS credentials available for the account which provisioned SageMaker resources.  Learn more about general AWS credential options [here](https:\u002F\u002Fdocs.aws.amazon.com\u002Fcli\u002Flatest\u002Fuserguide\u002Fcli-chap-authentication.html#cli-chap-authentication-precedence).\n\n```py\nfrom esm.sdk.sagemaker import ESM3SageMakerClient\nfrom esm.sdk.api import ESMProtein, LogitsConfig\n\nsagemaker_client = ESM3SageMakerClient(\n   # E.g. \"Endpoint-ESMC-6B-1\"\n   endpoint_name=SAGE_ENDPOINT_NAME,\n   # E.g. \"esmc-6b-2024-12\". Same model names as in Forge.\n   model=MODEL_NAME,\n)\n\nprotein = ESMProtein(sequence=\"AAAAA\")\nprotein_tensor = sagemaker_client.encode(protein)\nlogits_output = sagemaker_client.logits(\n   protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)\n)\nprint(logits_output.logits, logits_output.embeddings)\n```\n\n### ESM C Example Usage\n \u003Ca name=\"esm-c-example\">\u003C\u002Fa>\n\n[Embedding a sequence using ESM C](.\u002Fcookbook\u002Ftutorials\u002F2_embed.ipynb) [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fevolutionaryscale\u002Fesm\u002Fblob\u002Fmain\u002Fcookbook\u002Ftutorials\u002F2_embed.ipynb)\n\n## Responsible Development \u003Ca name=\"responsible-development\">\u003C\u002Fa>\n\nEvolutionaryScale is a public benefit company. 
Our mission is to develop artificial intelligence to understand biology for the benefit of human health and society, through partnership with the scientific community, and open, safe, and responsible research. Inspired by the history of our field as well as [new principles and recommendations](https:\u002F\u002Fresponsiblebiodesign.ai\u002F), we have created a Responsible Development Framework to guide our work towards our mission with transparency and clarity.\n\nThe core tenets of our framework are:\n\n- We will communicate the benefits and risks of our research\n- We will proactively and rigorously evaluate the risk of our models before public deployment\n- We will adopt risk mitigation strategies and precautionary guardrails\n- We will work with stakeholders in government, policy, and civil society to keep them informed\n\nWith this in mind, we have performed a variety of mitigations for `esm3-sm-open-v1`, detailed in our [paper](https:\u002F\u002Fwww.science.org\u002Fdoi\u002F10.1126\u002Fscience.ads0018).\n\n## Licenses  \u003Ca name=\"licenses\">\u003C\u002Fa>\n\nThe code and model weights of ESM3 and ESM C are available under a mixture of non-commercial and permissive commercial licenses. For complete license details, see [LICENSE.md](.\u002FLICENSE.md).\n\n## Citations  \u003Ca name=\"citations\">\u003C\u002Fa>\nIf you use ESM in your work, please cite one of the following:\n\n#### ESM3\n```\n@article {hayes2024simulating,\n\tauthor = {Hayes, Thomas and Rao, Roshan and Akin, Halil and Sofroniew, Nicholas J. and Oktay, Deniz and Lin, Zeming and Verkuil, Robert and Tran, Vincent Q. and Deaton, Jonathan and Wiggert, Marius and Badkundri, Rohil and Shafkat, Irhum and Gong, Jun and Derry, Alexander and Molina, Raul S. and Thomas, Neil and Khan, Yousuf A. and Mishra, Chetan and Kim, Carolyn and Bartie, Liam J. and Nemeth, Matthew and Hsu, Patrick D. 
and Sercu, Tom and Candido, Salvatore and Rives, Alexander},\n\ttitle = {Simulating 500 million years of evolution with a language model},\n\tyear = {2025},\n\tdoi = {10.1126\u002Fscience.ads0018},\n\tURL = {http:\u002F\u002Fdx.doi.org\u002F10.1126\u002Fscience.ads0018},\n\tjournal = {Science}\n}\n```\n\n#### ESM C\n```\n@misc{esm2024cambrian,\n  author = {{ESM Team}},\n  title = {ESM Cambrian: Revealing the mysteries of proteins with unsupervised learning},\n  year = {2024},\n  publisher = {EvolutionaryScale Website},\n  url = {https:\u002F\u002Fevolutionaryscale.ai\u002Fblog\u002Fesm-cambrian},\n  urldate = {2024-12-04}\n}\n```\n\n#### ESM Github (Code \u002F Weights)\n```\n@software{evolutionaryscale_2024,\n  author = {{EvolutionaryScale Team}},\n  title = {evolutionaryscale\u002Fesm},\n  year = {2024},\n  publisher = {Zenodo},\n  doi = {10.5281\u002Fzenodo.14219303},\n  URL = {https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.14219303}\n}\n```\n","The Biohub\u002Fesm project provides a suite of advanced protein models: ESM3 for generation and prediction tasks, and ESM C for embedding protein sequences. Its core capability is multimodal learning over three fundamental biological properties of proteins (sequence, structure, and function), supporting applications that range from large-scale generation to accurate prediction. Technically, the project offers pretrained models in several sizes that can be deployed quickly with a simple pip install. The models are suited to protein design and analysis in biological research, drug discovery, and related fields, giving researchers powerful tooling.",2,"2026-06-11 03:56:58","high_star"]