[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83418":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":10,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":30,"discoverSource":31},83418,"Chengling-PWM","ZhiChengAIR\u002FChengling-PWM","ZhiChengAIR","The first robot-native JEPA physical-world model.","",null,"Python",120,3,2,0,26,65,97,1.81,"Other",false,"main",[24,25,26,27],"jepa","physicalai","worldmodel","zhichengai","2026-06-12 02:04:34","# Chengling Physical World Model 0.1\n\n\u003Cp align=\"center\">\n    \u003Cpicture>\n        \u003Cimg src=\"assets\u002FImage_2026年5月14日_14_22_21-removebg-preview.png\" alt=\"Chengling\" width=\"1000\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg?style=for-the-badge\" alt=\"MIT License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n- [Introduce Chengling Physical World Model 0.1](#introduce-chengling-physical-world-model-01)\n  - [Energy-Based Transformer (EBT): A Core of Chengling PWM](#energy-based-transformer-ebt-a-core-of-chengling-pwm)\n- [PWM Energy-Based Transformer](#pwm-energy-based-transformer)\n- [Getting Started](#getting-started)\n  - [Prerequisites](#prerequisites)\n  - [Clone the Repository](#clone-the-repository)\n  - [Setup Conda Environment](#setup-conda-environment)\n- [Data Preparation](#data-preparation)\n  - [Robomimic HDF5 to 
Zarr](#robomimic-hdf5-to-zarr)\n- [Usage](#usage)\n  - [Training](#training)\n  - [Inference Server](#inference-server)\n  - [Robosuite Client](#robosuite-client)\n- [More to Come](#more-to-come)\n  - [Roadmap for Chengling PWM](#roadmap-for-chengling-pwm)\n  - [ZhiCheng AI TR5 Pro](#zhicheng-ai-tr5-pro)\n\n\n## Introduce Chengling Physical World Model 0.1\n\n**Chengling** is a Physical World Model (PWM) inspired by Meta's JEPA (Joint Embedding Predictive Architecture) and the world's first physical world model designed specifically for robots. It learns a latent representation of how the physical world behaves — how objects move, interact, and respond — from raw sensory inputs such as vision and proprioception. Unlike abstract world models that operate on generative spaces, the PWM directly models the dynamics of the physical environment. By observing robot demonstrations, Chengling learns to predict the consequences of actions, enabling a robot to mentally simulate outcomes before executing them.\n\n\u003Cp align=\"center\">\n    \u003Cpicture>\n        \u003Cimg src=\"assets\u002F01b20f18-4fae-4734-acd7-a08bc59f2d24.png\" alt=\"Chengling\" width=\"500\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\nToday, we're thrilled to roll out Chengling Physical World Model 0.1, our first end-to-end JEPA-based Physical World Model. This initial release focuses on the core architecture and training pipeline, demonstrating the feasibility of energy-based modeling for physical world dynamics. We train Chengling on the robomimic dataset, which contains multi-camera RGB observations and proprioceptive states from various robotic tasks. 
The model learns to predict future action trajectories given a sequence of past observations and a language prompt describing the task.\n\n### Energy-Based Transformer (EBT): A Core of Chengling PWM\n\nHere we introduce a core component of Chengling PWM 0.1 — the **Energy-Based Transformer (EBT)**.\n\nThe physical world model must capture not just \"what happens next\" but also \"how likely is this outcome under the laws of physics.\" Chengling formalizes this through an **energy function** over action trajectories. Given multi-step, multi-camera observations and a language prompt describing the task, the EBT evaluates candidate action trajectories against the learned energy landscape: trajectories with lower energy are more physically consistent with the current observation sequence and the intended task goal. Rather than generating actions through a single forward pass, Chengling performs Langevin Dynamics MCMC at inference — an iterative sampling process where each step applies gradient descent on the energy surface to progressively refine actions toward lower-energy, higher-quality trajectories. This iterative refinement is the robot's form of mental simulation: it imagines, evaluates, and improves actions before executing them in the real world.\n\nThe EBT architecture consists of three main components. A **frozen vision encoder** processes multi-view RGB observations into semantic tokens, providing rich perceptual features without introducing trainable parameters. A **Transformer decoder** fuses observation tokens, action history tokens, and language prompt tokens through self-attention and cross-attention, producing energy-guided action predictions. 
Finally, a lightweight **StateMapper** projects different robot embodiments with varying action dimensions into a unified 128-dimensional latent action space, enabling a single model to control diverse robots.\n\n## PWM Energy-Based Transformer\n\nIn our PWM 0.1, Chengling's EBT offers several key advantages over other diffusion-based and flow-matching approaches:\n\n- **Energy as a Direct Measure of World Consistency**: Unlike diffusion models that model the marginal distribution of actions, Chengling's energy function directly measures how physically consistent a trajectory is with the current observation and task. The energy surface provides a natural supervision signal — the model is pushed to assign lower energy to correct trajectories and higher energy to physically implausible ones.\n\n- **Iterative Refinement as Mental Simulation**: Diffusion models generate actions through a stochastic denoising process that is inherently probabilistic. Chengling's Langevin Dynamics MCMC performs iterative gradient-based refinement, which more closely mirrors how a world model should operate: exploring and improving trajectories through repeated \"what-if\" reasoning rather than sampling from a fixed distribution.\n\n- **Flexible Inference Compute**: The number of MCMC refinement steps is a simple integer that can be adjusted at inference time without retraining. Want faster, approximate inference? Use fewer steps. Need higher quality? Use more steps. This trade-off is seamless and does not require any model changes or architectural modifications.\n\n- **Training Robustness via Random Depth**: During training, the number of MCMC steps is randomly sampled (e.g., 3-4 steps). This forces the model to produce useful energy gradients at every refinement step, rather than overfitting to a fixed schedule. 
The resulting model is robust to inference-time step variation — a property that diffusion models lack, as perturbing their noise schedule often leads to degraded performance.\n\n- **Simpler Training Objective**: The training loss is a straightforward MSE between predicted and ground-truth action tokens, weighted by the energy-based reconstruction across MCMC steps. There is no noise prediction head, no variance scheduling, and no classifier-free guidance tuning — only the energy function and its gradient.\n\n- **Unified Multi-Embodiment Architecture**: Through the StateMapper, different robot embodiments with varying action dimensions are projected into a shared 128-dimensional latent action space. This enables a single physical world model to reason about diverse robots, each with different kinematic structures and action spaces.\n\n## Getting Started\n\n### Prerequisites\n- Linux or macOS (Ubuntu 20.04+ recommended)\n- Anaconda >= 2020.07\n- Docker >= 20.10 (if using containerization)\n- NVIDIA GPU with CUDA >= 11.1 (optional, but recommended for GPU acceleration)\n\n### Clone the Repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FZhiChengAIR\u002FChengling.git\ncd Chengling\n```\n\n### Setup Conda Environment\n\n```bash\nconda create -n chengling python=3.11\nconda activate chengling\npip install -r requirements.txt\n```\n\n## Data Preparation\n\n### Robomimic HDF5 to Zarr\n\nChengling uses Zarr format for training data. Use `scripts\u002Fprepare_robomimic_hdf5.py` to convert robomimic HDF5 datasets into the required Zarr format.\n\n1. **Create prompt files** — Each task needs a text file containing the language prompt:\n\n```bash\nmkdir -p prompts\necho \"lift the cube\" > prompts\u002Flift.txt\n```\n\n2. 
**Convert the dataset**:\n\n```bash\npython scripts\u002Fprepare_robomimic_hdf5.py \\\n    --hdf5_filepath \u003Cpath\u002Fto\u002Frobomimic\u002Fhdf5\u002Ffile.hdf5> \\\n    --output_dir_path .\u002Fdata \\\n    --task_name lift \\\n    --prompt_file prompts\u002Flift.txt\n```\n\n**Parameters:**\n\n| Argument | Shorthand | Description |\n|---|---|---|\n| `--hdf5_filepath` | `-i` | Path to the robomimic HDF5 file |\n| `--output_dir_path` | `-o` | Output directory for the Zarr dataset |\n| `--task_name` | `-n` | Task name (must match a key in `config\u002Ftask\u002Frobomimic.yaml`) |\n| `--prompt_file` | `-p` | Path to the text file containing the language prompt |\n\n**Full example for the Lift task:**\n\n```bash\ncd Chengling\n\n# Create prompts directory and prompt file\nmkdir -p prompts\necho \"lift the cube\" > prompts\u002Flift.txt\n\n# Convert lift task\npython scripts\u002Fprepare_robomimic_hdf5.py \\\n    --hdf5_filepath \u002Fpath\u002Fto\u002Frobomimic\u002Fdatasets\u002Flift\u002Fph\u002Fimage_v141.hdf5 \\\n    --output_dir_path .\u002Fdata \\\n    --task_name lift \\\n    --prompt_file prompts\u002Flift.txt\n```\n\nAfter conversion, the Zarr dataset will be saved to `.\u002Fdata\u002Flift\u002F`. You can then point `dataset.dataset_path` to the `.\u002Fdata` directory for training.\n\n## Usage\n\n### Training\n\n1. **Single-GPU Training**\n```bash\nHYDRA_FULL_ERROR=1 python train.py \\\n  --config-name=train_energy_config \\\n  dataset.dataset_path=\u003Cpath\u002Fto\u002Fdataset> \n```\n\n2. 
**Multi-GPU Training (Distributed Data Parallel)**\n```bash\nHYDRA_FULL_ERROR=1 torchrun \\\n  --nproc-per-node=\u003Cnumber_of_GPUs> \\\n  train.py \\\n  --config-name=train_energy_config \\\n  dataset.dataset_path=\u003Cpath\u002Fto\u002Fdataset>\n```\n\n- `HYDRA_FULL_ERROR=1`: Enables full Hydra error messages for easier debugging.\n- `dataset.dataset_path`: The root directory containing your training data (e.g., a folder of recorded trajectories).\n- `hydra.run.dir`: Where checkpoints, logs, and wandb summaries will be saved.\n\n### Inference Server\n\nOnce a model checkpoint is saved (e.g., `run_dir\u002Fcheckpoints\u002Flast.ckpt`), start the inference server:\n\n1. **Direct Python script** (in host environment):\n```bash\npython scripts\u002Finference.py \\\n  --ckpt_path=\u003Cpath\u002Fto\u002Fcheckpoint.ckpt>\n```\n\n2. **Using Docker** (from host):\n```bash\n.\u002Fscripts\u002Frun_docker.sh \\\n  -c \"$PWD\" \\\n  -g all \\\n  -n chengling_infer \\\n  -i chengling \\\n  -m inference\n```\n\nThe server listens for observation data over TCP and returns predicted robot actions.\n\n### Robosuite Client\n\nAfter the inference server is running, you can use the robosuite client to connect a simulated robot environment to it and run closed-loop rollouts:\n\n```bash\n# Make sure the inference server is already running on localhost:65432\npython robosuite_inference.py\n```\n\nThe client will:\n1. Create a robosuite `Lift` task environment with a Panda robot\n2. Connect to the inference server via TCP\n3. Convert robosuite observations into the format expected by the server (multi-step RGB history + low-dim state)\n4. Send observation data and receive action predictions in a loop\n5. 
Execute the predicted actions in the simulation and render the result\n\nYou can customize the client by modifying the following variables in `robosuite_inference.py`:\n\n| Variable | Default | Description |\n|---|---|---|\n| `SERVER_HOST` | `localhost` | Inference server hostname |\n| `SERVER_PORT` | `65432` | Inference server port |\n| `TASK_NAME` | `Lift` | Robosuite task name |\n| `PROMPT` | `\"lift the cube\"` | Language prompt for the task |\n| `EMB_NAME` | `\"lift\"` | Embodiment name (must match config) |\n\n## More to Come\n\n### Roadmap for Chengling PWM\n\nChengling PWM 0.1 establishes a fundamental physical world model framework with the core EBT architecture and an efficient training pipeline. Future versions will be trained on significantly larger and more diverse datasets, enabling the model to master a broader range of physical rules — object interactions, contact dynamics, multi-stage task reasoning, and fine-grained manipulation. The model will also gain stronger out-of-distribution generalization: reasoning correctly under novel object configurations, unseen environments, and complex natural language instructions that were not present during training.\n\n### ZhiCheng AI TR5 Pro\n\nTR5 Pro, ZhiCheng AI's flagship robot, is an advanced general-purpose humanoid robot designed for versatile real-world applications.\nIt features multiple core strengths:\n* Chengling Brain: As TR5's perception-motor core, Chengling PWM enables onboard inference and real-world dynamic learning for task execution in real-world scenes.\n* Natural Interaction: Powered by a multimodal large model, enabling fluid, human-like communication.\n* Agile Mobility: With 52 degrees of freedom, it moves with precision and adapts to complex terrain.\n* Dexterous Manipulation: Its high-precision hands replicate fine human movements, making it a capable assistant.\n* Broad Applications: Suitable for home service, education, and commercial display scenarios.\n\nBuilt for performance, TR5 Pro stands 178cm tall, 
weighs 68kg, and offers 8 hours of continuous operation. Its dynamic motion control supports a single-leg load capacity of up to 800kg, ensuring stability and reliability in various tasks.\n\n\u003Cp align=\"center\">\n    \u003Cpicture>\n        \u003Cimg src=\"assets\u002F20260519175955_33_371.jpg\" alt=\"Chengling\" width=\"250\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n","2026-06-11 04:11:07","CREATED_QUERY"]