[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11157":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},11157,"RLDX-1","RLWRLD\u002FRLDX-1","RLWRLD",null,"Python",273,15,5,1,0,25,114,11,63.11,"Apache License 2.0",false,"main",true,[],"2026-06-13 04:00:50","\u003Cdiv align=\"center\">\n\n# RLDX-1\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.03269) [[Project Page]](https:\u002F\u002Frlwrld.ai\u002Frldx-1) [[Models]](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FRLWRLD\u002Frldx-1)\n\n\u003Cimg src=\"assets\u002Frldx_overview.png\" width=\"100%\" alt=\"RLDX-1 overview\">\n\n\u003C\u002Fdiv>\n\n---\n\nRLDX-1 is a Vision-Language-Action model (VLA) for human-like\ndexterous manipulation. 
Beyond the *versatile intelligence* inherited\nfrom pre-trained VLM backbones, RLDX-1 adds three **functional\ncapabilities** — motion awareness, long-term memory, and physical\nsensing — through a unified **Multi-Stream Action Transformer (MSAT)**\narchitecture, a synthetic-augmented training pipeline, and a real-time\ninference stack.\n\n---\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"assets\u002Foverview_architecture.png\" width=\"90%\" alt=\"RLDX-1 architecture\">\n\u003C\u002Fdiv>\n\n## Highlights\n\n- **Multi-Stream Action Transformer (MSAT).** Cognition, physics, and\n  action each get a dedicated stream coupled by joint self-attention —\n  an extension of MM-DiT to action modeling.\n- **Motion awareness.** Multi-frame observations + a motion module\n  capture temporal dynamics; intermediate VLM layers compress video\n  tokens to keep the policy efficient.\n- **Long-term memory.** A memory module fuses past cognition features\n  with the current ones for history-grounded decisions beyond a short\n  multi-frame window.\n- **Physical sensing.** Tactile and torque enter as a dedicated physics\n  stream; the decoder is jointly trained to predict future physical\n  signals.\n- **Three-stage training.** Pre-training (generalization) → mid-training\n  (functionality) → post-training (task adaptation), with synthetic data\n  augmenting rare manipulation scenarios.\n- **Real-time inference.** Static graph capture + custom fused kernels\n  bring the all-modality model to **43.7 ms \u002F step on RTX 5090\n  (1.63× speedup, >22 Hz)**.\n\n---\n\n## Performance\n\n### Simulation Benchmarks\n\nSuccess rates (%) of RLDX-1 fine-tuned on each benchmark's training set,\ncompared to recent frontier VLA baselines.\n\n| Method | LIBERO (Avg) | LIBERO-Plus | SIMPLER Google-VM | SIMPLER Google-VA | SIMPLER WidowX | RoboCasa Kitchen | GR-1 Tabletop | RoboCasa365 (Avg) |\n|---|---|---|---|---|---|---|---|---|\n| π0-FAST | 85.5 | 64.2 | 61.9 | 59.0 | 48.3 | 63.6 | — | 21.7 |\n| 
π0      | 94.1 | 54.6 | 58.8 | 54.8 | 27.1 | 62.5 | 13.6 | 14.8 |\n| π0.5    | 96.9 | 86.5 | 72.7 | 68.4 | 46.9 | 62.1 | 15.4 | 16.9 |\n| GR00T N1.5 | 86.5 | 66.3 | 52.4 | 43.7 | 62.0 | 65.7 | 48.0 | 20.0 |\n| GR00T N1.6 | 96.7 | 72.6 | 76.1 | 57.1 | 57.1 | 66.2 | 47.6 | 26.9 |\n| **RLDX-1 (ours)** | **97.8** | **86.7** | **81.5** | **77.4** | **71.9** | **70.6** | **58.7** | **32.1** |\n\nThe first five columns cover the established LIBERO \u002F SIMPLER family;\nthe last three (RoboCasa Kitchen, GR-1 Tabletop, RoboCasa365) are\nlong-horizon, humanoid, and compositional benchmarks. Per-benchmark\ncheckpoints, embodiment tags, and reproduce commands are listed under\n[Reproducing Benchmark Results](#reproducing-benchmark-results).\n\n---\n\n## Installation\n\n**Requirements**: Python 3.10, CUDA 12.x, [uv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) v0.8.4+\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FRLWRLD\u002FRLDX-1.git\ncd RLDX-1\nuv sync --python 3.10\nuv pip install -e .\n```\n\nVerify installation:\n```bash\nuv run python -c \"import rldx; print(rldx.__version__)\"\n```\n\nFor simulator setup, dev tooling, and full troubleshooting, see\n[`docs\u002Finstallation.md`](docs\u002Finstallation.md).\n\n---\n\n## Documentation\n\nHands-on guides live under [`docs\u002F`](docs\u002F):\n\n| Guide | What it covers |\n|---|---|\n| [`installation.md`](docs\u002Finstallation.md) | Environment setup, simulator venvs, dev tooling, common pitfalls |\n| [`architecture.md`](docs\u002Farchitecture.md) | Five-stage walkthrough of the RLDX-1 model and its config flags |\n| [`training.md`](docs\u002Ftraining.md) | `launch_train.py` recipes (fine-tune \u002F mid-train), LoRA, training-time RTC, dataset layout |\n| [`embodiment_tags.md`](docs\u002Fembodiment_tags.md) | What `EmbodimentTag` is and how to pick one for a custom robot |\n| [`evaluation.md`](docs\u002Fevaluation.md) | RoboCasa \u002F LIBERO \u002F SIMPLER \u002F GR-1 eval, server + rollout split, 
results aggregation |\n| [`inference_server.md`](docs\u002Finference_server.md) | `run_rldx_server.py` CLI, wire protocol, RTC modes, `--compile` levels, simulator + real-robot deployment |\n\n\n---\n\n## Pretrained & Midtrained Checkpoints\n\n| Checkpoint | Description | Params | HuggingFace |\n|-----------|-------------|--------|-------------|\n| `RLDX-1-PT` | Pre-trained (video) | 6.9B | [RLWRLD\u002FRLDX-1-PT](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-PT) |\n| `RLDX-1-MT-DROID` | Mid-trained on DROID with all add-ons | 8.1B | [RLWRLD\u002FRLDX-1-MT-DROID](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-MT-DROID) |\n| `RLDX-1-MT-ALLEX` | Mid-trained on ALLEX with all add-ons | 8.1B | [RLWRLD\u002FRLDX-1-MT-ALLEX](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-MT-ALLEX) |\n\n---\n\n## Data Preparation\n\nRLDX-1 uses [LeRobot](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot) v2.1 format datasets. To convert your data:\n\n```bash\n# Convert a single dataset\nbash run_scripts\u002Fdata\u002Fconvert_lerobot_single.sh \u002Fpath\u002Fto\u002Fyour\u002Fdata\n\n# Convert multiple datasets\nbash run_scripts\u002Fdata\u002Fconvert_lerobot_multiple.sh \u002Fpath\u002Fto\u002Fdata\u002Froot\n```\n\nEach dataset must carry a `meta\u002Fmodality.json` that slices the flat\nstate \u002F action vectors into named joint groups and remaps video columns\nto modality keys. 
Schema and a worked example are in\n[`docs\u002Ftraining.md`](docs\u002Ftraining.md#dataset-layout-metamodalityjson).\n\n### Custom Embodiment Config\n\nDefine your robot's modality configuration:\n\n```python\n# my_modality_config.py\nfrom rldx.data.types import ModalityConfig\n\nMODALITY_CONFIGS = {\n    \"my_robot\": {\n        \"image\": ModalityConfig(...),\n        \"state\": ModalityConfig(...),\n        \"action\": ModalityConfig(...),\n    }\n}\n```\n\nPass it via `--modality-config-path my_modality_config.py` during training,\ntogether with an `EmbodimentTag` that selects the per-robot MLP head slot\n(default: `GENERAL_EMBODIMENT`; see\n[`docs\u002Fembodiment_tags.md`](docs\u002Fembodiment_tags.md) for the picker).\n\nThe `EmbodimentTag` design and per-embodiment MLP head structure follow\nthe convention introduced by [NVIDIA GR00T N1.7](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FIsaac-GR00T\u002Ftree\u002Fn1.7-release).\n\n---\n\n## Fine-tuning\n\nThis section covers how to fine-tune RLDX-1 from a pre-trained checkpoint\n(`RLWRLD\u002FRLDX-1-PT`) on your own LeRobot v2.1 dataset. The training entry\npoint is a single CLI (`rldx\u002Fexperiment\u002Flaunch_train.py`) where flags\ntoggle the optional functional capabilities described in\n[Highlights](#highlights):\n\n- `--video-length N` — temporal frames per observation (motion awareness)\n- `--use-memory` — temporal memory module (long-term memory)\n- `--use-motion` — motion module inside the VLM backbone\n- `--use-physics --physics-keys ...` — tactile \u002F torque streams (physical sensing)\n\nLoRA, training-time RTC, and the full flag list are documented in\n[`docs\u002Ftraining.md`](docs\u002Ftraining.md). 
Below are the canonical recipes.\n\n### Single dataset, no add-ons\n\n```bash\nuv run python rldx\u002Fexperiment\u002Flaunch_train.py \\\n    --base-model-path RLWRLD\u002FRLDX-1-PT \\\n    --dataset-path \u002Fpath\u002Fto\u002Fyour\u002Fdataset \\\n    --embodiment-tag GENERAL_EMBODIMENT \\\n    --video-length 4 \\\n    --n-cog-tokens 64 \\\n    --global-batch-size 64 \\\n    --learning-rate 1e-4 \\\n    --max-steps 60000 \\\n    --save-steps 5000 \\\n    --output-dir .\u002Foutputs\u002Fmy_finetune\n```\n\n### With all add-ons (memory + motion + physics)\n\nRecommended for embodiments where memory, motion awareness, or contact\nsensing matter. To enable a *single* add-on instead of all three, keep\njust the corresponding `--use-*` flag(s) and drop the rest.\n\n```bash\nuv run python rldx\u002Fexperiment\u002Flaunch_train.py \\\n    --base-model-path RLWRLD\u002FRLDX-1-PT \\\n    --dataset-path \u002Fpath\u002Fto\u002Fyour\u002Fdataset \\\n    --embodiment-tag GENERAL_EMBODIMENT \\\n    --video-length 4 \\\n    --use-memory --memory-length 4 --concat-memory \\\n    --use-motion --motion-insert-layer 9 \\\n    --use-physics --physics-keys tactile torque --physics-dims 30 7 \\\n    --new-param-warmup-steps 2000 \\\n    --n-cog-tokens 64 \\\n    --global-batch-size 64 \\\n    --max-steps 60000 \\\n    --output-dir .\u002Foutputs\u002Fmy_finetune_all\n```\n\n### Key Training Flags\n\n| Flag | Description | Default |\n|------|-------------|---------|\n| `--video-length` | Number of video frames (video token compression is always on; set to `1` for single-frame) | `4` |\n| `--video-stride` | Stride between frames in action-step units | `2` |\n| `--use-memory` | Enable temporal memory module | `False` |\n| `--memory-length` | Memory context window (timesteps) | `4` |\n| `--use-motion` | Enable motion module | `False` |\n| `--use-physics` | Enable physics signal conditioning | `False` |\n| `--n-cog-tokens` | Number of cognition tokens | `64` |\n| `--global-batch-size` | 
Total batch size across GPUs | `64` |\n| `--new-param-warmup-steps` | Warmup steps for newly added modules | `0` |\n\n### LoRA fine-tuning\n\nFor memory-constrained fine-tunes you can replace full-parameter tuning\nof the action model (MSAT) and\u002For the backbone VLM with PEFT\nLoRA adapters:\n\n```bash\n--action-model-use-lora --action-model-lora-rank 16 --action-model-lora-alpha 32\n--backbone-use-lora --backbone-lora-rank 16 --backbone-lora-alpha 32 --backbone-lora-num-layers -1\n```\n\n`--action-model-use-lora` overrides `--tune-diffusion-model`;\n`--backbone-use-lora` overrides `--tune-top-llm-layers`. Full flag list\nand target-module defaults are in\n[`docs\u002Ftraining.md`](docs\u002Ftraining.md#lora-fine-tuning).\n\n### Training-time Real-Time Chunking\n\nIf you intend to serve the checkpoint with `--rtc-inference-mode trained`\n(faster, fullgraph-compatible), enable training-time RTC at training\ntime:\n\n```bash\n--rtc-training-max-delay 4\n```\n\nThe training-time RTC formulation follows\n[Black et al. (Training-Time Action Conditioning for Efficient Real-Time Chunking)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05964); the inference-side\ncounterpart is\n[Black et al. (Real-Time Execution of Action Chunking Flow Policies)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.07339). See\n[`docs\u002Ftraining.md`](docs\u002Ftraining.md#real-time-chunking-training-time)\nand [`docs\u002Finference_server.md`](docs\u002Finference_server.md#real-time-chunking-rtc)\nfor usage details.\n\n---\n\n## Inference\n\nRLDX-1 ships two inference paths sharing the same model + processor:\n\n- **In-process** — load `RLDXPolicy` and call `get_action(obs)` directly\n  from Python. 
Best for evaluation scripts and notebook prototyping.\n- **ZeroMQ server** — `rldx\u002Feval\u002Frun_rldx_server.py` for real-robot\n  deployment, with two orthogonal optimizations layered on top of the\n  base path:\n    - *Graph capture + kernel fusion* (`--compile {submodule, fullgraph}`)\n      — static-graph CUDA-graph capture and custom fused operators bring\n      the all-modality model to **43.7 ms \u002F step on RTX 5090** (1.63×\n      speedup over PyTorch eager, >22 Hz).\n    - *Real-Time Chunking* (`--rtc-inference-mode {guided, trained}`)\n      — chunk-boundary stitching for smooth action handoff between\n      consecutive chunks.\n\n### Quick Start\n\n```python\nimport torch\nfrom rldx.policy.rldx_policy import RLDXPolicy\nfrom rldx.data.embodiment_tags import EmbodimentTag\n\npolicy = RLDXPolicy(\n    model_path=\"RLWRLD\u002FRLDX-1-FT-ROBOCASA\",\n    embodiment_tag=EmbodimentTag.GENERAL_EMBODIMENT,\n    device=\"cuda:0\",\n)\n\n# Single-step inference\naction = policy.get_action(observation)\n```\n\n### Serving (ZeroMQ)\n\nFor real-time robot deployment:\n\n```bash\n# Start the policy server\nuv run python rldx\u002Feval\u002Frun_rldx_server.py \\\n    --model-path RLWRLD\u002FRLDX-1-FT-ROBOCASA \\\n    --embodiment-tag GENERAL_EMBODIMENT \\\n    --host 0.0.0.0 --port 20000\n```\n\n### Real-time inference (graph capture + RTC)\n\nThe server brings the all-modality model to **43.7 ms \u002F step on RTX\n5090 (1.63× speedup, >22 Hz)** through two orthogonal knobs:\n\n**`--compile {none, submodule, fullgraph}`** — graph capture + kernel fusion.\n\n- `submodule` — compiles each learnable sub-module. Preserves autograd. ~30 s warmup.\n- `fullgraph` — CUDA-graph capture and operator fusion over the full VLA forward. Lowest steady-state latency, ~90–210 s warmup.\n  - Tuned for RTX 5090 (Blackwell, sm_120). 
On other GPU architectures use `--compile submodule` for the intended result.\n\n**`--rtc-inference-mode {none, guided, trained}`** — Real-Time Chunking for chunk-boundary stitching.\n\n- `guided` — works with any flow-matching checkpoint.\n- `trained` — requires a checkpoint trained with `--rtc-training-max-delay > 0`. Pairs with `--compile fullgraph`.\n- Implementation follows [Black et al. (Real-Time Execution of Action Chunking Flow Policies)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.07339). The `trained` mode uses the integration from [Black et al. (Training-Time Action Conditioning for Efficient Real-Time Chunking)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05964).\n\nThe full flag list, the `compile × RTC` compatibility matrix, and a\nwalkthrough of the trade-offs are in\n[`docs\u002Finference_server.md`](docs\u002Finference_server.md#real-time-chunking-rtc).\n\n---\n\n## Reproducing Benchmark Results\n\nEach benchmark has a self-contained eval README; this table maps each\nresult row in [Performance](#simulation-benchmarks) to the fine-tuned\ncheckpoint we used, the embodiment tag the server expects, and the\nrunnable guide.\n\n| Benchmark | Fine-tuned Checkpoint | Embodiment Tag | Eval Guide |\n|---|---|---|---|\n| LIBERO | [RLWRLD\u002FRLDX-1-FT-LIBERO](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-LIBERO) | `GENERAL_EMBODIMENT` | [`run_scripts\u002Feval\u002Flibero\u002FREADME.md`](run_scripts\u002Feval\u002Flibero\u002FREADME.md) |\n| LIBERO-Plus | [RLWRLD\u002FRLDX-1-FT-LIBERO](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-LIBERO) | `GENERAL_EMBODIMENT` | [`run_scripts\u002Feval\u002Flibero_plus\u002FREADME.md`](run_scripts\u002Feval\u002Flibero_plus\u002FREADME.md) |\n| SimplerEnv Google | [RLWRLD\u002FRLDX-1-FT-SIMPLER-GOOGLE](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-SIMPLER-GOOGLE) | `OXE_FRACTAL` | 
[`run_scripts\u002Feval\u002Fsimpler\u002FREADME.md`](run_scripts\u002Feval\u002Fsimpler\u002FREADME.md) |\n| SimplerEnv WidowX | [RLWRLD\u002FRLDX-1-FT-SIMPLER-WIDOWX](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-SIMPLER-WIDOWX) | `OXE_BRIDGE_ORIG` | [`run_scripts\u002Feval\u002Fsimpler\u002FREADME.md`](run_scripts\u002Feval\u002Fsimpler\u002FREADME.md) |\n| GR-1 Tabletop | [RLWRLD\u002FRLDX-1-FT-GR1](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-GR1) | `GENERAL_EMBODIMENT` | [`run_scripts\u002Feval\u002Fgr1_tabletop\u002FREADME.md`](run_scripts\u002Feval\u002Fgr1_tabletop\u002FREADME.md) |\n| RoboCasa Kitchen (24 tasks) | [RLWRLD\u002FRLDX-1-FT-ROBOCASA](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-ROBOCASA) | `GENERAL_EMBODIMENT` | [`run_scripts\u002Feval\u002Frobocasa_kitchen\u002FREADME.md`](run_scripts\u002Feval\u002Frobocasa_kitchen\u002FREADME.md) |\n| RoboCasa365 | [RLWRLD\u002FRLDX-1-FT-RC365](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-FT-RC365) | `GENERAL_EMBODIMENT` | [`run_scripts\u002Feval\u002Frobocasa_365\u002FREADME.md`](run_scripts\u002Feval\u002Frobocasa_365\u002FREADME.md) |\n\nShared mechanics (server + rollout split, common flags, troubleshooting)\nare documented in [`docs\u002Fevaluation.md`](docs\u002Fevaluation.md).\n\n---\n\n## Project Structure\n\n```\nrldx\u002F\n├── configs\u002F                              # Model, data, and training configurations\n├── data\u002F                                 # Dataset loaders, processors, and statistics\n├── experiment\u002F                           # Training entry points and utilities\n├── eval\u002F                                 # Evaluation scripts and sim environments\n├── inference\u002F                            # Inference engine: GraphSafe substrate, fused Triton kernels, RTC dispatch\n├── model\u002F\n│   ├── core\u002F                             # Core model (RLDX-1, processor, setup)\n│   ├── modules\u002F\n│   │   ├── 
backbone\u002F                     # RLDX-1-VLM backbone (with video token compression)\n│   │   ├── action_model\u002F                 # MSAT diffusion action model + physics head\n│   │   ├── memory.py                     # Temporal memory transformer\n│   │   ├── norms.py                      # Shared normalization primitives\n│   │   └── embodiment_conditioned_mlp.py\n│   ├── pipeline.py                       # Training\u002Finference pipeline glue\n│   └── registry.py                       # Embodiment + variant registry\n├── policy\u002F                               # Inference policy wrappers\n└── utils\u002F                                # Distributed training utilities\n```\n\n---\n\n## Citation\n\n```bibtex\n@article{rldx2026,\n  title={RLDX-1 Technical Report},\n  author={Dongyoung Kim and Huiwon Jang and Myungkyu Koo and Suhyeok Jang and Taeyoung Kim and others},\n  year={2026},\n  journal={arXiv preprint arXiv:2605.03269},\n  eprint={2605.03269},\n  archivePrefix={arXiv}\n}\n```\n\n---\n\n## Acknowledgments\n\nRLDX-1 builds upon the following open-source projects:\n\n- [NVIDIA GR00T N1.7](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FIsaac-GR00T\u002Ftree\u002Fn1.7-release) — Training Codebase\n- [Qwen3-VL](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-VL) — Vision-language backbone\n- [FLUX](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux) — MMDiT architecture\n\n## License\n\n- **Code**: released under the [Apache License 2.0](LICENSE.md). 
The codebase\n  is built on the [NVIDIA Isaac GR00T N1.7](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FIsaac-GR00T\u002Ftree\u002Fn1.7-release)\n  framework — third-party attributions and per-file provenance headers are\n  preserved in the source tree.\n- **Model weights**: distributed on Hugging Face under the\n  [RLWRLD Model License v1.0](https:\u002F\u002Fhuggingface.co\u002FRLWRLD\u002FRLDX-1-PT\u002Fblob\u002Fmain\u002FLICENSE.md)\n  (a non-commercial license with attribution and share-alike terms). By using\n  any `RLWRLD\u002FRLDX-1-*` checkpoint you agree to those terms.\n\n## Contributions\n\nWe currently do not accept external pull requests on this repository.\nIf you encounter a bug, broken reproduction step, or have a question\nabout RLDX-1, please **open an issue** at\n[github.com\u002FRLWRLD\u002FRLDX-1\u002Fissues](https:\u002F\u002Fgithub.com\u002FRLWRLD\u002FRLDX-1\u002Fissues)\nand we will follow up there.\n","RLDX-1 is a Vision-Language-Action model for human-like dexterous manipulation. Its core features and technical characteristics include adding three capabilities (motion awareness, long-term memory, and physical sensing) to the model through a Multi-Stream Action Transformer (MSAT) architecture, a synthetic-data-augmented training pipeline, and a real-time inference stack. These properties let RLDX-1 perform well on complex tasks that require long-horizon decision-making, especially fine manipulation tasks in simulation environments. The model suits robotics scenarios that demand highly precise control and adaptability, such as home service robots or precision assembly in industrial automation.",2,"2026-06-11 03:31:16","CREATED_QUERY"]