[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74090":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},74090,"le-wm","lucas-maes\u002Fle-wm","lucas-maes","Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels","https:\u002F\u002Fle-wm.github.io\u002F",null,"Python",3790,511,43,22,0,56,148,551,168,110.13,"MIT License",false,"main",true,[],"2026-06-12 04:01:13","\n# LeWorldModel\n### Stable End-to-End Joint-Embedding Predictive Architecture from Pixels\n\n[Lucas Maes*](https:\u002F\u002Fx.com\u002Flucasmaes_), [Quentin Le Lidec*](https:\u002F\u002Fquentinll.github.io\u002F), [Damien Scieur](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=hNscQzgAAAAJ&hl=fr), [Yann LeCun](https:\u002F\u002Fyann.lecun.com\u002F) and [Randall Balestriero](https:\u002F\u002Frandallbalestriero.github.io\u002F)\n\n**Abstract:** Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pretrained encoders, or auxiliary supervision to avoid representation collapse. 
In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48× faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.\n\n\u003Cp align=\"center\">\n   \u003Cb>[ \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2603.19312v1\">Paper\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fquentinll\u002Flewm\">Checkpoints &amp; Data\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fle-wm.github.io\u002F\">Website\u003C\u002Fa> ]\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flewm.gif\" width=\"80%\">\n\u003C\u002Fp>\n\nIf you find this code useful, please reference it in your paper:\n```\n@article{maes_lelidec2026lewm,\n  title={LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels},\n  author={Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},\n  journal={arXiv preprint},\n  year={2026}\n}\n```\n\n## Using the code\nThis codebase builds on [stable-worldmodel](https:\u002F\u002Fgithub.com\u002Fgalilai-group\u002Fstable-worldmodel) for environment management, planning, and evaluation, and [stable-pretraining](https:\u002F\u002Fgithub.com\u002Fgalilai-group\u002Fstable-pretraining) for training. 
Together they reduce this repository to its core contribution: the model architecture and training objective.\n\n**Installation:**\n```bash\nuv venv --python=3.10\nsource .venv\u002Fbin\u002Factivate\nuv pip install stable-worldmodel[train,env]\n```\n\n## Data\n\nDatasets use the HDF5 format for fast loading. Download the data from [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fquentinll\u002Flewm) and decompress with:\n\n```bash\ntar --zstd -xvf archive.tar.zst\n```\n\nPlace the extracted `.h5` files under `$STABLEWM_HOME` (defaults to `~\u002F.stable-wm\u002F`). You can override this path:\n```bash\nexport STABLEWM_HOME=\u002Fpath\u002Fto\u002Fyour\u002Fstorage\n```\n\nDataset names are specified without the `.h5` extension. For example, `config\u002Ftrain\u002Fdata\u002Fpusht.yaml` references `pusht_expert_train`, which resolves to `$STABLEWM_HOME\u002Fpusht_expert_train.h5`.\n\n## Training\n\n`jepa.py` contains the PyTorch implementation of LeWM. Training is configured via [Hydra](https:\u002F\u002Fhydra.cc\u002F) config files under `config\u002Ftrain\u002F`.\n\nBefore training, set your WandB `entity` and `project` in `config\u002Ftrain\u002Flewm.yaml`:\n```yaml\nwandb:\n  config:\n    entity: your_entity\n    project: your_project\n```\n\nTo launch training:\n```bash\npython train.py data=pusht\n```\n\nCheckpoints are saved to `$STABLEWM_HOME` upon completion.\n\nFor baseline scripts, see the stable-worldmodel [scripts](https:\u002F\u002Fgithub.com\u002Fgalilai-group\u002Fstable-worldmodel\u002Ftree\u002Fmain\u002Fscripts\u002Ftrain) folder.\n\n## Planning\n\nEvaluation configs live under `config\u002Feval\u002F`. 
Set the `policy` field to the checkpoint path **relative to `$STABLEWM_HOME`**, without the `_object.ckpt` suffix:\n\n```bash\n# ✓ correct\npython eval.py --config-name=pusht.yaml policy=pusht\u002Flewm\n\n# ✗ incorrect\npython eval.py --config-name=pusht.yaml policy=pusht\u002Flewm_object.ckpt\n```\n\n## Pretrained Checkpoints\n\nPretrained LeWM checkpoints for each environment are mirrored on the Hugging Face\nHub (model repos), alongside the datasets (dataset repos) in the same collection:\n\n- [`quentinll\u002Flewm-pusht`](https:\u002F\u002Fhuggingface.co\u002Fquentinll\u002Flewm-pusht)\n- [`quentinll\u002Flewm-cube`](https:\u002F\u002Fhuggingface.co\u002Fquentinll\u002Flewm-cube)\n- [`quentinll\u002Flewm-tworooms`](https:\u002F\u002Fhuggingface.co\u002Fquentinll\u002Flewm-tworooms)\n- [`quentinll\u002Flewm-reacher`](https:\u002F\u002Fhuggingface.co\u002Fquentinll\u002Flewm-reacher)\n\nThe full baseline checkpoint suite (PLDM, LeJEPA, IVL, IQL, GCBC, DINO-WM, DINO-WM-noprop)\nis available on [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1r31os0d4-rR0mdHc7OlY_e5nh3XT4r4e):\n\n\u003Cdiv align=\"center\">\n\n| Method | two-room | pusht | cube | reacher |\n|:---:|:---:|:---:|:---:|:---:|\n| pldm | ✓ | ✓ | ✓ | ✓ |\n| lejepa | ✓ | ✓ | ✓ | ✓ |\n| ivl | ✓ | ✓ | ✓ | — |\n| iql | ✓ | ✓ | ✓ | — |\n| gcbc | ✓ | ✓ | ✓ | — |\n| dinowm | ✓ | ✓ | — | — |\n| dinowm_noprop | ✓ | ✓ | ✓ | ✓ |\n\n\u003C\u002Fdiv>\n\n## Loading a checkpoint\n\n### From the Drive archive\n\nEach tar archive contains two files per checkpoint:\n- `\u003Cname>_object.ckpt` — a serialized Python object for convenient loading; this is what `eval.py` and the `stable_worldmodel` API use\n- `\u003Cname>_weight.ckpt` — a weights-only checkpoint (`state_dict`) for cases where you want to load weights into your own model instance\n\nPlace the extracted files under `$STABLEWM_HOME\u002F` and load via:\n\n```python\nimport stable_worldmodel as swm\n\n# Load the cost model (for 
MPC)\ncost = swm.policy.AutoCostModel('pusht\u002Flewm')\n```\n\n`AutoCostModel` accepts:\n- `run_name` — checkpoint path **relative to `$STABLEWM_HOME`**, without the `_object.ckpt` suffix\n- `cache_dir` — optional override for the checkpoint root (defaults to `$STABLEWM_HOME`)\n\nThe returned module is in `eval` mode with its PyTorch weights accessible via `.state_dict()`.\n\n### From the Hugging Face mirror\n\nThe HF model repos ship the LeWM checkpoint as a `weights.pt` (state dict) plus a\n`config.json` describing the model. Convert once to produce the `_object.ckpt`\nthat `eval.py` expects:\n\n```bash\n# download weights.pt + config.json\nhf download quentinll\u002Flewm-pusht --local-dir $STABLEWM_HOME\u002Fhf_pusht\n\n# convert to object checkpoint under $STABLEWM_HOME\u002Fpusht\u002Flewm_object.ckpt\npython - \u003C\u003C'PY'\nimport json, torch, stable_pretraining as spt\nfrom pathlib import Path\nfrom jepa import JEPA\nfrom module import ARPredictor, Embedder, MLP\nimport stable_worldmodel as swm\n\nsrc = Path(swm.data.utils.get_cache_dir(), \"hf_pusht\")\nout = Path(swm.data.utils.get_cache_dir(), \"pusht\", \"lewm_object.ckpt\")\n\ncfg = json.loads((src \u002F \"config.json\").read_text())\nencoder = spt.backbone.utils.vit_hf(\n    cfg[\"encoder\"][\"size\"],\n    patch_size=cfg[\"encoder\"][\"patch_size\"],\n    image_size=cfg[\"encoder\"][\"image_size\"],\n    pretrained=False, use_mask_token=False,\n)\nmlp = lambda k: MLP(input_dim=cfg[k][\"input_dim\"], output_dim=cfg[k][\"output_dim\"],\n                    hidden_dim=cfg[k][\"hidden_dim\"], norm_fn=torch.nn.BatchNorm1d)\nmodel = JEPA(\n    encoder=encoder,\n    predictor=ARPredictor(**cfg[\"predictor\"]),\n    action_encoder=Embedder(**cfg[\"action_encoder\"]),\n    projector=mlp(\"projector\"),\n    pred_proj=mlp(\"pred_proj\"),\n)\nsd = torch.load(src \u002F \"weights.pt\", map_location=\"cpu\", weights_only=False)\nmodel.load_state_dict(sd, strict=True)\nout.parent.mkdir(parents=True, 
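exist_ok=True)\ntorch.save(model, out)\nPY\n```\n\nAfter conversion, load via `swm.policy.AutoCostModel('pusht\u002Flewm')` as usual.\n\n## Contact & Contributions\nFeel free to open [issues](https:\u002F\u002Fgithub.com\u002Flucas-maes\u002Fle-wm\u002Fissues)! For questions or collaborations, please contact `lucas.maes@mila.quebec`.\n","LeWorldModel is a stable, end-to-end joint-embedding predictive architecture that learns compact latent-space world models from raw pixels. It trains with only two loss terms, a next-embedding prediction loss and a regularizer that enforces Gaussian-distributed latent embeddings, which simplifies training, reduces the number of tunable hyperparameters, and improves stability. With roughly 15 million parameters, the model trains on a single GPU in a few hours, suits 2D and 3D control tasks, and encodes physical quantities well. LeWorldModel can also detect physically implausible events, showing sensitivity to anomalies. The project fits applications that need an efficient, stable, and easily deployed world model, such as robot navigation and game AI.",2,"2026-06-11 03:48:47","high_star"]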