[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11627":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},11627,"genie3","aqlaboratory\u002Fgenie3","aqlaboratory","Genie 3 is a fast, all-atom SE(3)-equivariant diffusion model for protein design. It achieves state-of-the-art performance on unconditional generation, motif scaffolding, and binder design while retaining the computational efficiency of equivariant architectures.","https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.64898\u002F2026.05.01.722168v1",null,"Python",112,18,31,4,0,5,38,15,3.84,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:02:32","# Genie 3: Fast and ultra-capable protein design through all-atom SE(3)-equivariance\n\nGenie 3 is a fast, all-atom SE(3)-equivariant diffusion model for protein design.\nIt achieves state-of-the-art performance on unconditional generation, motif\nscaffolding, and binder design while retaining the computational efficiency of\nequivariant architectures.\n\n> **Preprint:** [Fast and Ultra-Capable Protein Design: Advancing the Frontier Through Atomistic SE(3)-Equivariance with Genie 3](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.64898\u002F2026.05.01.722168v1)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fbinder_design_demo.gif\" alt=\"Binder Design Demo\" \u002F>\n\u003C\u002Fp>\n\n## Table of Contents\n\n- 
[Setup](#setup)\n  - [Download source code](#download-source-code)\n  - [Create conda environments](#create-conda-environments)\n  - [Download model weights and training data](#download-model-weights-and-training-data)\n- [CLI Reference](#cli-reference)\n  - [Subcommands](#subcommands)\n  - [Shared flags](#shared-flags)\n  - [Multi-node sharding](#multi-node-sharding)\n- [Application 1: Unconditional Generation](#application-1-unconditional-generation)\n  - [Quick start](#quick-start)\n  - [Configuration](#configuration)\n  - [Multi-device (single node)](#multi-device-single-node)\n  - [Multi-node](#multi-node)\n  - [Evaluation results](#evaluation-results)\n- [Application 2: Motif Scaffolding](#application-2-motif-scaffolding)\n  - [Quick start](#quick-start-1)\n  - [Configuration](#configuration-1)\n  - [Multi-device (single node)](#multi-device-single-node-1)\n  - [Multi-node](#multi-node-1)\n  - [Evaluation results](#evaluation-results-1)\n  - [Construction of motif scaffolding problem set](#construction-of-motif-scaffolding-problem-set)\n- [Application 3: Binder Design](#application-3-binder-design)\n  - [Quick start](#quick-start-2)\n  - [Configuration](#configuration-2)\n  - [Multi-device (single node)](#multi-device-single-node-2)\n  - [Multi-node](#multi-node-2)\n  - [Beam search](#beam-search)\n  - [Iterative design](#iterative-design)\n  - [Evaluation results](#evaluation-results-2)\n  - [Construction of binder design problem set](#construction-of-binder-design-problem-set)\n- [Training](#training)\n  - [Dataset](#dataset)\n- [Codebase Architecture](#codebase-architecture)\n  - [`src\u002Fgenie3\u002Fcli.py`: Unified CLI](#srcgenie3clipy-unified-cli)\n  - [`src\u002Fgenie3\u002Fconfig\u002F`: Configuration](#srcgenie3config-configuration)\n  - [`src\u002Fgenie3\u002Fruntime\u002F`: Execution Context](#srcgenie3runtime-execution-context)\n  - [`src\u002Fgenie3\u002Fgeneration\u002F`: Diffusion Model and 
Sampling](#srcgenie3generation-diffusion-model-and-sampling)\n  - [`src\u002Fgenie3\u002Fevaluation\u002F`: Evaluation Pipeline](#srcgenie3evaluation-evaluation-pipeline)\n- [Compatibility with Genie 2](#compatibility-with-genie-2)\n- [Citation](#citation)\n\n---\n\n## Setup\n\n### Download source code\n\nDownload the source code from GitHub by cloning this repo:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Faqlaboratory\u002Fgenie3.git\n```\n\n### Create conda environments\n\nInstall and set up the unified conda environment for all workflows:\n\n```\nbash scripts\u002Fsetup\u002Fsetup.sh\n```\n\nThis creates a single default environment named `genie3` with everything needed\nfor training, unconditional generation, motif scaffolding, and binder design.\nColabFold is installed directly into the `genie3` environment using the\nofficial upstream `conda` + `pip` installation model.\n\n### Download model weights and training data\n\nPretrained model weights and training datasets are hosted on\n[HuggingFace (yeqinglin\u002Fgenie3)](https:\u002F\u002Fhuggingface.co\u002Fyeqinglin\u002Fgenie3).\nUse the download script to fetch them:\n\n```bash\n# Download both pretrained weights and training data (default)\nbash scripts\u002Fsetup\u002Fdownload.sh\n\n# Pretrained weights only (pretrained\u002F)\nbash scripts\u002Fsetup\u002Fdownload.sh --weights\n\n# Training data only (data\u002Ftrain\u002F)\nbash scripts\u002Fsetup\u002Fdownload.sh --data\n```\n\n`huggingface_hub` is installed automatically as part of the `genie3` package.\n\n---\n\n## CLI Reference\n\nAll workflows are driven by the `genie3` command-line tool. 
Make sure the\n`genie3` conda environment is active before running any command:\n\n```\nconda activate genie3\n```\n\n### Subcommands\n\n| Command | Description |\n|---------|-------------|\n| `genie3 run -c \u003CCONFIG>` | Run generation followed by evaluation (all-in-one) |\n| `genie3 generate -c \u003CCONFIG>` | Run generation only |\n| `genie3 evaluate -c \u003CCONFIG>` | Run evaluation only |\n| `genie3 evaluate --reduce -c \u003CCONFIG>` | Merge shard outputs and produce final results |\n| `genie3 status -c \u003CCONFIG>` | Show progress of generation and evaluation shards |\n| `genie3 train --config \u003CCONFIG> -d \u003CN>` | Run model training |\n\n### Shared flags\n\n| Flag | Description |\n|------|-------------|\n| `-c \u002F --config \u003CPATH>` | Path to the experiment configuration file (required) |\n| `--num-devices \u003CN>` | Number of GPUs to use |\n| `--verbose` | Show detailed runtime output in the terminal |\n| `--log-dir \u003CPATH>` | Root directory for run logs (default: `logs\u002Fruns`) |\n\n### Multi-node sharding\n\nFor `generate` and `evaluate`, add `--shard-id` and `--num-shards` to split\nwork across nodes:\n\n```bash\n# On node 0:\ngenie3 generate -c \u003CCONFIG> --shard-id 0 --num-shards \u003CN>\n# On node 1:\ngenie3 generate -c \u003CCONFIG> --shard-id 1 --num-shards \u003CN>\n# ... 
(one process per node)\n\n# After all shards complete, run the reduce step once:\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\nUse `genie3 status -c \u003CCONFIG>` at any time to check which shards have\ncompleted and which still need to run.\n\n---\n\n## Application 1: Unconditional Generation\n\n### Quick start\n\n```bash\ngenie3 run -c examples\u002Funconditional\u002Fexperiment.yaml\n```\n\n### Configuration\n\nThe experiment configuration file uses the following format:\n\n```yaml\nexperiment:\n  name: \u003CEXPERIMENT_NAME>\n\npaths:\n  rootdir: \u003COUTDIR>\n\ngeneration:\n  dataset:\n    source: unconditional\n    min_length: \u003CMIN_LENGTH>\n    max_length: \u003CMAX_LENGTH>\n    length_step: \u003CLENGTH_STEP>\n    n_sample: \u003CNUM_SAMPLES>\n  sampler:\n    sampler:\n      direction_scale: \u003CDIRECTION_SCALE>\n\nevaluation:\n  version: unconditional\n  folding:\n    model_name: esmfold\n```\n\nParameters:\n- `\u003COUTDIR>`: Root output directory for generated designs and evaluation results\n- `\u003CMIN_LENGTH>` \u002F `\u003CMAX_LENGTH>`: Length range of generated proteins (inclusive)\n- `\u003CLENGTH_STEP>`: Increment between sampled lengths\n- `\u003CNUM_SAMPLES>`: Number of samples to generate per length\n- `\u003CDIRECTION_SCALE>`: Controls quality-diversity trade-off. 
Use `0.8` for short\n  monomers (length ≤ 300) and `0.0` for long monomers (length > 300)\n\nSee `examples\u002Funconditional\u002Fexperiment.yaml` for a working example.\n\n### Multi-device (single node)\n\nUse `--num-devices` to run generation and evaluation across multiple GPUs on one\nnode:\n\n```bash\ngenie3 run -c \u003CCONFIG> --num-devices \u003CN>\n```\n\nOr run each stage separately:\n\n```bash\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN>\ngenie3 evaluate -c \u003CCONFIG> --num-devices \u003CN>\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\nPass `--num-devices 4` to use 4 GPUs.\n\n### Multi-node\n\nDistribute generation and evaluation across multiple nodes using sharding:\n\n```bash\n# Generation — one process per node:\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Evaluation — one process per node:\ngenie3 evaluate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Reduce — run once after all evaluation shards complete:\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\n### Evaluation results\n\nResults are written to `\u003COUTDIR>\u002Fresults`. 
The directory contains:\n\n- `info.csv`: Evaluation statistics for each generated design, including\n  self-consistency RMSD (`scrmsd`), ESMFold average per-residue confidence\n  (`avg_plddt`), and secondary structure content (`pct_alpha_helix`, `pct_strand`).\n- `successful_generation_info.csv`: Statistics for in-silico successful designs.\n- `successful_generations_cluster.csv`: FoldSeek clustering results for\n  successful designs, with cluster IDs at TM-score thresholds of 0.5, 0.6, and 0.8.\n- `successful_generations\u002F`: PDB files of in-silico successful designs.\n\nA design is considered an in-silico success if `scrmsd \u003C 2Å`.\n\n---\n\n## Application 2: Motif Scaffolding\n\n### Quick start\n\n```bash\ngenie3 run -c examples\u002Fmotif_scaffolding\u002Fexperiment.yaml\n```\n\n### Configuration\n\n```yaml\nexperiment:\n  name: \u003CEXPERIMENT_NAME>\n\npaths:\n  rootdir: \u003COUTDIR>\n  dataset: \u003CDATADIR>\n\ngeneration:\n  dataset:\n    source: motif\n    selections: \u003CSELECTIONS>   # optional: comma-separated problem names\n    n_sample: \u003CNUM_SAMPLES>\n  sampler:\n    sampler:\n      direction_scale: 0.1\n\nevaluation:\n  version: scaffold\n  folding:\n    model_name: esmfold\n```\n\nParameters:\n- `\u003COUTDIR>`: Root output directory\n- `\u003CDATADIR>`: Path to the motif scaffolding problem set directory (see\n  [Construction of motif scaffolding problem set](#construction-of-motif-scaffolding-problem-set))\n- `\u003CSELECTIONS>`: (Optional) Comma-separated list of problem names to sample from.\n  If omitted, all problems in the dataset are used.\n- `\u003CNUM_SAMPLES>`: Number of samples to generate per problem\n\nSee `examples\u002Fmotif_scaffolding\u002Fexperiment.yaml` for a working example.\n\n### Multi-device (single node)\n\n```bash\ngenie3 run -c \u003CCONFIG> --num-devices \u003CN>\n```\n\nOr run each stage separately:\n\n```bash\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN>\ngenie3 evaluate -c \u003CCONFIG> 
--num-devices \u003CN>\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\nPass `--num-devices 4` to use 4 GPUs.\n\n### Multi-node\n\n```bash\n# Generation — one process per node:\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Evaluation — one process per node:\ngenie3 evaluate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Reduce — run once after all evaluation shards complete:\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\n### Evaluation results\n\nResults are written to `\u003COUTDIR>\u002F\u003CPROBLEM_NAME>\u002Fresults` for each problem.\nEach `results` directory contains:\n\n- `info.csv`: Evaluation statistics including self-consistency RMSD (`scrmsd`),\n  ESMFold confidence (`avg_plddt`), secondary structure content, and motif\n  consistency metrics (`motif_ca_rmsd`, `motif_bb_rmsd`, `motif_aa_rmsd` —\n  measuring RMSD of the generated motif to the target in terms of Cα atoms,\n  backbone atoms, and all heavy atoms, respectively). 
For multi-motif\n  scaffolding, each metric is reported as the maximum across all motif segments.\n- `successful_{backbone\u002Fallatom\u002Fallatom_strict}_generation_info.csv`: Statistics\n  for in-silico successful designs under each criterion.\n- `successful_{backbone\u002Fallatom\u002Fallatom_strict}_generations_cluster.csv`:\n  FoldSeek clustering results at TM-score thresholds of 0.5, 0.6, and 0.8.\n- `successful_{backbone\u002Fallatom\u002Fallatom_strict}_generations\u002F`: PDB files of\n  in-silico successful designs.\n\nSuccess criteria:\n- **Backbone success**: `scrmsd \u003C 2Å` and `motif_ca_rmsd \u003C 2Å`\n- **All-atom success**: `scrmsd \u003C 2Å` and `motif_aa_rmsd \u003C 2Å`\n- **All-atom strict success**: `scrmsd \u003C 2Å` and `motif_aa_rmsd \u003C 1Å`\n\n### Construction of motif scaffolding problem set\n\nWe provide an example motif scaffolding problem set at\n`data\u002Fdesign\u002Fmotif_scaffolding\u002Fmotifbench`. It consists of a `problems\u002F`\ndirectory and a `motifs\u002F` directory. Each motif structure file uses the\nfollowing header format to define the motif name and segments (chain ID and\nresidue range). Each `REMARK 999 INPUT` line defines one motif segment; the\nline order defines the segment index (starting from 1).\n\n```\nREMARK 999 NAME   \u003CPROBLEM_NAME>\nREMARK 999 INPUT  \u003CCHAIN_ID> \u003CSTART_RESIDUE_INDEX> \u003CEND_RESIDUE_INDEX>\nREMARK 999 INPUT  \u003CCHAIN_ID> \u003CSTART_RESIDUE_INDEX> \u003CEND_RESIDUE_INDEX>\n```\n\nThis is followed by ATOM lines describing the motif structure. 
An example can\nbe found at `data\u002Fdesign\u002Fmotif_scaffolding\u002Fmotifbench\u002Fmotifs\u002F01_1LDB.pdb`.\n\nThe `problems\u002F` directory contains problem definition JSON files:\n\n```json\n{\n    \"motif_filepaths\": [\n        \"\u003CPATH_TO_MOTIF_STRUCTURE>\"\n    ],\n    \"segment_config_str\": \"\u003CSEGMENT_1>,\u003CSCAFFOLD_MIN_LENGTH>-\u003CSCAFFOLD_MAX_LENGTH>,\u003CSEGMENT_2>,...\",\n    \"maximum_total_length\": 125,\n    \"minimum_total_length\": 125\n}\n```\n\nMotifs in `motif_filepaths` are indexed starting from `A` (first), `B`\n(second), etc. Each segment ID in `segment_config_str` concatenates the motif\nletter with the segment index from that motif's structure file (e.g. `A3` is\nthe third segment of the first motif). Scaffold ranges (e.g. `8-15`) specify\nthe min–max length (inclusive) for each flanking scaffold region.\n\n#### Single-motif scaffolding example: MotifBench\u002F22_1BCF\n\nProblem definition (`data\u002Fdesign\u002Fmotif_scaffolding\u002Fmotifbench\u002Fproblems\u002F22_1BCF.json`):\n\n```json\n{\n    \"motif_filepaths\": [\n        \"data\u002Fdesign\u002Fmotif_scaffolding\u002Fmotifbench\u002Fmotifs\u002F22_1BCF.pdb\"\n    ],\n    \"segment_config_str\": \"8-15,A3,16-30,A4,16-30,A2,16-30,A1,8-15\",\n    \"maximum_total_length\": 125,\n    \"minimum_total_length\": 125\n}\n```\n\nMotif structure header (`data\u002Fdesign\u002Fmotif_scaffolding\u002Fmotifbench\u002Fmotifs\u002F22_1BCF.pdb`):\n\n```\nREMARK 999 NAME   22_1BCF\nREMARK 999 PDB    1BCF\nREMARK 999 INPUT  A   1   8\nREMARK 999 INPUT  A  29  36\nREMARK 999 INPUT  A  57  64\nREMARK 999 INPUT  A  85  92\n```\n\nHere `A3` → residues A57–64, `A4` → A85–92, `A2` → A29–36, `A1` → A1–8.\nThe scaffold lengths (e.g. 
`8-15`, `16-30`) flank each motif segment in the\norder given by `segment_config_str`.\n\n#### Multi-motif scaffolding example: RSVF\n\nProblem definition (`data\u002Fdesign\u002Fmotif_scaffolding\u002Frsvf\u002Fproblems\u002F03_425.json`):\n\n```json\n{\n    \"motif_filepaths\": [\n        \"data\u002Fdesign\u002Fmotif_scaffolding\u002Frsvf\u002Fmotifs\u002Fsite_iv.pdb\",\n        \"data\u002Fdesign\u002Fmotif_scaffolding\u002Frsvf\u002Fmotifs\u002Fsite_ii.pdb\",\n        \"data\u002Fdesign\u002Fmotif_scaffolding\u002Frsvf\u002Fmotifs\u002Fsite_v.pdb\"\n    ],\n    \"segment_config_str\": \"0-30,A1,0-30,B1,0-30,C1,0-30\"\n}\n```\n\n`A1`, `B1`, `C1` refer to the first motif segment in each respective motif\nfile. The `0-30` ranges allow flexible scaffold lengths between each segment.\n\n---\n\n## Application 3: Binder Design\n\n### Quick start\n\n```bash\ngenie3 run -c examples\u002Fbinder_design\u002Fexperiment.yaml\n```\n\n### Configuration\n\n```yaml\nexperiment:\n  name: \u003CEXPERIMENT_NAME>\n\npaths:\n  rootdir: \u003COUTDIR>\n  dataset: \u003CDATADIR>\n\ngeneration:\n  dataset:\n    source: target\n    selections: \u003CSELECTIONS>   # optional: comma-separated problem names\n    n_sample: \u003CNUM_SAMPLES>\n  sampler:\n    sampler:\n      direction_scale: 0.0\n\nevaluation:\n  version: binder\n  inverse_folding:\n    num_seq: 1\n  folding:\n    model_name: colabfold\n    mode: template             # \"template\" or \"msa\"\n```\n\nParameters:\n- `\u003COUTDIR>`: Root output directory\n- `\u003CDATADIR>`: Path to the binder design problem set directory (see\n  [Construction of binder design problem set](#construction-of-binder-design-problem-set))\n- `\u003CSELECTIONS>`: (Optional) Comma-separated list of problem names. 
If omitted,\n  all problems in the dataset are used.\n- `\u003CNUM_SAMPLES>`: Number of binder candidates to generate per problem\n- Folding modes:\n  - `template`: Structure prediction without MSA, target structure passed as\n    template\n  - `msa`: Structure prediction using MSA of the target sequence only\n\nSee `examples\u002Fbinder_design\u002Fexperiment.yaml` for a working example.\n\n### Multi-device (single node)\n\n```bash\ngenie3 run -c \u003CCONFIG> --num-devices \u003CN>\n```\n\nOr run each stage separately:\n\n```bash\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN>\ngenie3 evaluate -c \u003CCONFIG> --num-devices \u003CN>\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\nPass `--num-devices 4` to use 4 GPUs.\n\n### Multi-node\n\n```bash\n# Generation — one process per node:\ngenie3 generate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Evaluation — one process per node:\ngenie3 evaluate -c \u003CCONFIG> --num-devices \u003CN> --shard-id \u003CID> --num-shards \u003CTOTAL>\n\n# Reduce — run once after all evaluation shards complete:\ngenie3 evaluate --reduce -c \u003CCONFIG>\n```\n\nUse `genie3 status -c \u003CCONFIG>` to track shard progress across nodes.\n\n### Beam search\n\nBeam search improves design quality by branching N parallel diffusion\ntrajectories, evaluating them via ColabFold at each checkpoint, and keeping\nthe top N. To enable beam search, add an `inference.search` block:\n\n```yaml\ngeneration:\n  compile: true          # enable torch.compile for the denoiser\n  dataset:\n    source: target\n    n_sample: \u003CNUM_SAMPLES>\n  inference:\n    sampler:\n      sampler:\n        direction_scale: 0.0\n    search:\n      name: beam\n    reward:\n      name: colabfold\n```\n\nSee `examples\u002Fbinder_design\u002Fexperiment_beam.yaml` for a working example.\n\n### Iterative design\n\nIterative design runs multiple rounds of generation and evaluation. 
Each round\ncan use the prior rounds' successful complexes to refine the conditioning\ninterface. Add a `rounds` block to the configuration:\n\n```yaml\nrounds:\n  - id: round_0\n    cond_strategy: extended     # use predefined extended interface\n\n  - id: round_1\n    cond_strategy: iter_common  # compute common interface from round_0 successes\n```\n\nAvailable `cond_strategy` values:\n- `hotspot`: Use the hotspot interface residues from the problem JSON\n- `extended`: Use the extended interface residues from the problem JSON\n- `common`: Use a user-provided common interface from the problem JSON\n- `iter_common`: Compute the common interface (intersection) from all prior rounds'\n  `v0_success` complexes; result is cached in `{rootdir}\u002Fproblems\u002F{problem}.json`\n- `iter_common_prob`: Like `iter_common` but uses **probabilistic interface conditioning**:\n  each trajectory independently samples residues proportional to how often they appeared\n  in prior successful designs. Hotspot residues are always included.\n\nRe-running `genie3 run -c \u003CCONFIG>` after an interruption automatically resumes\nfrom the last incomplete round.\n\nSee `examples\u002Fbinder_design\u002Fexperiment_iterative.yaml` for a complete example.\n\n### Evaluation results\n\nResults are written to `\u003COUTDIR>\u002F\u003CPROBLEM_NAME>\u002Fresults` for each problem.\nEach `results` directory contains:\n\n- `info.csv`: Evaluation statistics for each generated design, including\n  self-consistency RMSD between the Genie 3 backbone and the ColabFold-predicted\n  complex (`complex_scrmsd`), predicted TM-score for the binder (`binder_ptm`),\n  and minimum interaction predicted Aligned Error between binder and target\n  (`min_interaction_pae`). 
Lower `min_interaction_pae` indicates higher\n  confidence in the predicted binding interface.\n- `log.txt`: Summary counts for each filter set.\n- `v0_success\u002F`: Directory containing in-silico successful designs after\n  applying Version 0 Filters:\n  - `success_info.csv`: Statistics for each successful design.\n  - `successful_incomplex_binders_cluster.csv`: FoldSeek clustering results at\n    TM-score thresholds of 0.5, 0.6, and 0.8.\n  - `successful_incomplex_binders\u002F`: PDB files of binders extracted from the\n    predicted complex.\n  - `successful_complexes\u002F`: PDB files of the full predicted complex.\n\n**Version 0 Filters:**\n- Model agreement: `complex_scrmsd \u003C 2.5Å`\n- In-silico binder quality: `binder_ptm > 0.8` and `min_interaction_pae \u003C 1.5Å`\n- Hotspot coverage: full coverage for ≤ 3 hotspots; ≥ 0.8 coverage otherwise\n\n### Construction of binder design problem set\n\nWe provide an example binder design problem set at\n`scripts\u002Fproblem\u002Fbinder_design\u002Fbinderbench`. It includes a configuration YAML\nand a directory of target PDB structures. 
The configuration format is:\n\n```yaml\nname: binderbench     # Problem set name\nproblem:\n\n  01_bhrf1:           # Problem key\n    name: BHRF1       # Problem display name\n    target:\n      filepath: scripts\u002Fproblem\u002Fbinder_design\u002Fbinderbench\u002Fpdb\u002F01_bhrf1.pdb\n      hotspot:         # Hotspot residues on the target (chain + residue index)\n        A65\n        A74\n        A77\n        A82\n        A85\n        A93\n    binder:\n      min_length: 80\n      max_length: 120\n    tag:               # (Optional) tag for partial problem selection\n      AlphaProteo\n    other:             # (Optional) extra metadata\n      pdb_id: 2WH6\n```\n\nOnce defined, generate the processed problem set:\n\n```bash\npython scripts\u002Fproblem\u002Fbinder_design\u002Fprepare.py \\\n    --config \u003CCONFIG_FILEPATH> \\\n    --outdir \u003COUTPUT_DIRECTORY>\n```\n\nThis script:\n1. Validates the target PDB file (no alternative positions or insertion codes)\n2. Constructs an MSA for the target sequence via the ColabFold MSA server\n3. Formats FASTA and PDB files for the target\n4. Writes a problem definition JSON compatible with Genie 3\n\nOutput layout:\n\n```\n\u003COUTPUT_DIRECTORY>\u002F\n  binderbench\u002F\n    problems\u002F\n      01_bhrf1.json\n      ...\n    targets\u002F\n      pdb\u002F\n        01_bhrf1.pdb\n      fasta\u002F\n        01_bhrf1.fasta\n      msa\u002F\n        01_bhrf1.a3m\n```\n\nSet `paths.dataset` in the experiment config to the problem set directory\n(`\u003COUTPUT_DIRECTORY>\u002Fbinderbench` in this example). 
A pre-processed example is\navailable at `data\u002Fdesign\u002Fbinder_design\u002Fbinderbench`.\n\nAn example problem definition JSON:\n\n```json\n{\n    \"key\": \"01_bhrf1\",\n    \"name\": \"BHRF1\",\n    \"target_pdb_filepath\": \"\u003COUTPUT_DIRECTORY>\u002Fbinderbench\u002Ftargets\u002Fpdb\u002F01_bhrf1.pdb\",\n    \"target_fasta_filepath\": \"\u003COUTPUT_DIRECTORY>\u002Fbinderbench\u002Ftargets\u002Ffasta\u002F01_bhrf1.fasta\",\n    \"target_msa_filepath\": \"\u003COUTPUT_DIRECTORY>\u002Fbinderbench\u002Ftargets\u002Fmsa\u002F01_bhrf1.a3m\",\n    \"target_chain_and_residues\": [\"A2-158\"],\n    \"target_interface_residues\": {\n        \"hotspot\": [\"A65\", \"A74\", \"A77\", \"A82\", \"A85\", \"A93\"],\n        \"extended\": [\n            \"A61\", \"A62\", \"A63\", \"A64\", \"A65\", \"A66\",\n            \"A67\", \"A68\", \"A70\", \"A71\", \"A74\", \"A75\",\n            \"A77\", \"A78\", \"A80\", \"A81\", \"A82\", \"A84\",\n            \"A85\", \"A86\", \"A88\", \"A89\", \"A92\", \"A93\",\n            \"A94\", \"A95\", \"A100\", \"A103\"\n        ]\n    },\n    \"binder_min_length\": 80,\n    \"binder_max_length\": 120,\n    \"tag\": [\"AlphaProteo\"],\n    \"pdb_id\": \"2WH6\"\n}\n```\n\n> **Note:** The first chain in any binder PDB file must be the binder chain.\n\n---\n\n## Training\n\nModel training uses the `genie3 train` command with a training configuration\nfile:\n\n```bash\ngenie3 train --config \u003CCONFIG> --devices \u003CN> [--num-nodes \u003CM>]\n```\n\n| Flag | Description |\n|------|-------------|\n| `--config \u003CPATH>` | Path to training configuration YAML |\n| `-d \u002F --devices \u003CN>` | Number of GPU devices per node (required) |\n| `-n \u002F --num-nodes \u003CM>` | Number of compute nodes (default: 1) |\n| `-t \u002F --test` | Disable remote logging (W&B) for local runs |\n| `--mpi-plugin` | Enable the MPI environment plugin for distributed training |\n| `--memory-snapshot` | Enable CUDA memory snapshot collection for 
debugging |\n| `--reset-dataloader-state` | Reset dataloader checkpoint state before training |\n\n### Dataset\n\nTraining data is hosted on\n[HuggingFace (yeqinglin\u002Fgenie3)](https:\u002F\u002Fhuggingface.co\u002Fyeqinglin\u002Fgenie3).\nDownload it with:\n\n```bash\nbash scripts\u002Fsetup\u002Fdownload.sh --data\n```\n\nThis places dataset manifests under `data\u002Ftrain\u002F`:\n\n| Dataset | Path |\n|---------|------|\n| AlphaFold DB representatives (L ≤ 512, pLDDT ≥ 70) | `data\u002Ftrain\u002Fafdbreps_l-512_plddt-70\u002Finfo.csv` |\n| PiNDER (2024-02) | `data\u002Ftrain\u002Fpinder\u002F2024-02\u002Finfo.csv` |\n\n---\n\n## Codebase Architecture\n\nThe `src\u002Fgenie3\u002F` package is the unified entry point for all Genie 3 workflows.\n\n### `src\u002Fgenie3\u002Fcli.py`: Unified CLI\n\nThe thin command-line dispatcher. Parses subcommands (`run`, `generate`,\n`evaluate`, `status`, `train`) and delegates to the workflow modules.\nThe `run` subcommand orchestrates `generate` + `evaluate` in child processes\nso GPU memory is fully released between stages.\n\n### `src\u002Fgenie3\u002Fconfig\u002F`: Configuration\n\nLoads and validates `experiment.yaml` files. 
Key components:\n- **`models.py`**: Frozen dataclasses for each config section\n  (`ExperimentConfig`, `PathsConfig`, `GenerationConfig`, `EvaluationConfig`,\n  `RuntimeConfig`, `RoundConfig`)\n- **`loader.py`**: `load_experiment_config` — parses YAML and returns a typed\n  `ExperimentRunConfig`; `to_generation_config` and `to_evaluation_kwargs` —\n  convert unified config to the format expected by the generation and evaluation\n  pipelines\n- **`schema.py`**: Validation helpers and `ConfigError`\n\n### `src\u002Fgenie3\u002Fruntime\u002F`: Execution Context\n\nManages the runtime lifecycle for each CLI invocation:\n- **`context.py`**: `RunContext` dataclass + `create_run_context` context\n  manager — sets up structured logging, creates the run directory, and wires up\n  progress reporting\n- **`logging.py`**: Configures per-run file and terminal log handlers\n- **`profile.py`**: `RuntimeProfile` — lightweight stage-level timing\n- **`progress.py`**: `ProgressReporter` — terminal status line updates\n\n### `src\u002Fgenie3\u002Fgeneration\u002F`: Diffusion Model and Sampling\n\nThe core protein design model and sampling infrastructure:\n- **`workflow.py`**: `run_generation` and `run_training` — CLI-facing entry\n  points\n- **`model\u002F`**: SE(3)-equivariant neural network architecture (transformer\n  blocks, TriUpdate, backbone update modules, sequence and structure networks)\n- **`diffusion\u002F`**: Diffusion processes — noise schedules, DDPM\u002FDDIM samplers,\n  beam search (`diffusion\u002Fsearch\u002Fbeam.py`), and ColabFold-based reward models\n  (`diffusion\u002Freward\u002F`)\n- **`data\u002F`**: Dataset loading and preprocessing for AFDB, DDI, Pinder, and\n  other training corpora; feature pipelines for motif, binder, sidechain, and\n  unconditional conditioning\n- **`config\u002F`**: Dataclass-based configuration for models, samplers, datasets,\n  and diffusion hyperparameters\n- **`utils\u002F`**: Geometric utilities, loss functions (FAPE, MSE), PDB 
I\u002FO, and\n  encoding helpers\n- **`np\u002F`**: Protein constants and numpy-based residue\u002Fatom utilities\n\n### `src\u002Fgenie3\u002Fevaluation\u002F`: Evaluation Pipeline\n\nOrchestrates inverse folding, structure prediction, and metric computation:\n- **`workflow.py`**: `run_evaluation` — CLI-facing entry point; handles shard\n  dispatch and the final reduce step\n- **`pipeline.py`**: `Runner` — core per-shard evaluation loop (sanitize →\n  inverse fold → structure prediction → collect)\n- **`mapper.py`**: `Mapper` — applies ProteinMPNN inverse folding to generated\n  backbones, then runs ColabFold\u002FESMFold on the resulting sequences\n- **`model\u002Ffold\u002F`**: Wrappers for ESMFold, ColabFold, and Boltz2\n- **`model\u002Finverse_fold\u002F`**: Wrappers for ProteinMPNN and forward-fold models\n- **`reducer\u002F`**: Task-specific metric compilation and filtering —\n  `UnconditionalReducer`, `ScaffoldReducer`, `BinderReducer`\n- **`utils\u002F`**: Parsing, clustering (FoldSeek), MSA utilities, interface\n  detection, and metric computation\n\n---\n\n## Compatibility with Genie 2\n\nGenie 2 is the predecessor backbone-only (Cα-trace) diffusion model.\nGenie 3 can load Genie 2 checkpoints directly using the legacy model config.\nExample configs are provided for unconditional generation and motif scaffolding:\n\n- `examples\u002Funconditional\u002Fexperiment_legacy.yaml`\n- `examples\u002Fmotif_scaffolding\u002Fexperiment_legacy.yaml`\n\nRun them the same way as any other experiment:\n\n```bash\ngenie3 run -c examples\u002Funconditional\u002Fexperiment_legacy.yaml\n```\n\n---\n\n## Citation\n\nIf you use Genie 3 in your work, please cite:\n\n```bibtex\n@article{lin2026genie3,\n  title   = {Fast and Ultra-Capable Protein Design: Advancing the Frontier\n             Through Atomistic SE(3)-Equivariance with Genie 3},\n  author  = {Lin, Yeqing and Lee, Minji and Vermani, Aakarsh and Jiang, Ellena\n             and {De Cooman}, Sebastiaan and Spetko, 
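Matej and AlQuraishi, Mohammed},\n  journal = {bioRxiv},\n  year    = {2026},\n  doi     = {10.64898\u002F2026.05.01.722168},\n  url     = {https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.64898\u002F2026.05.01.722168v1}\n}\n```\n","Genie 3 is a fast, all-atom SE(3)-equivariant diffusion model for protein design. Using an advanced equivariant architecture, the project achieves state-of-the-art performance on unconditional generation, motif scaffolding, and binder design while retaining computational efficiency. It is written in Python and supports multi-node parallel processing to speed up training and inference. It suits research scenarios that require efficient, accurate protein structure prediction or design, such as target identification in drug development and biomolecular engineering.",2,"2026-06-11 03:32:10","CREATED_QUERY"]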