[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79906":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":11,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":13,"starSnapshotCount":13,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},79906,"Dexora","dexoravla\u002FDexora","dexoravla",null,"Python",114,1,90,0,5,16,22,17,0.9,"Other",false,"main",[],"2026-06-12 02:03:55","\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">Dexora: Open-Source VLA for High-DoF Bimanual Dexterity\u003C\u002Fh1>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18722\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.18722-B31B1B.svg\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdexoravla.github.io\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue.svg\" alt=\"Project Page\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FDexora\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Dataset-Hugging%20Face-yellow.svg\" alt=\"Dataset\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ci>Dexora is a Vision–Language–Action (VLA) system for \u003Cb>dual-arm, dual-hand, 36-DoF dexterous manipulation\u003C\u002Fb>,\u003Cbr>\n  accepted at \u003Cb>ICRA 
2026\u003C\u002Fb> (\u003Ca href=\"ICRA26_0209_FI.pdf\">paper PDF\u003C\u002Fa>).\n  This repository releases the full \u003Cb>training\u003C\u002Fb>, \u003Cb>inference\u003C\u002Fb>, \u003Cb>data-processing\u003C\u002Fb> and \u003Cb>teleoperation\u003C\u002Fb> code.\u003C\u002Fi>\n\u003C\u002Fp>\n\n---\n\n## 🔥 News & Updates\n\n- **2026-05** — Public source release: training pipeline, real-robot inference stack, BSON → LeRobot v2.1 converters, Vision-Pro teleoperation tools.\n- **2025-12-12** — Released the **task-level** view of the real-world dataset (one folder per high-level task) on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDexora\u002FDexora_Real-World_Dataset).\n- **2025-12-03** — Released the full **Real-World Dataset** (**12.2K episodes \u002F 2.92M frames \u002F 40.5 h**) on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDexora\u002FDexora_Real-World_Dataset).\n\n---\n\n## ✨ Highlights\n\n- **Hybrid teleoperation.** Gross arm kinematics from a custom exoskeleton backpack are combined with fine finger motion from markerless Apple Vision Pro tracking, driving both the physical platform and a MuJoCo digital twin.\n- **Embodiment-matched corpus.** 100 K simulated trajectories and 12.2 K real-world teleoperated episodes share the same 36-DoF dual-arm dual-hand embodiment and the LeRobot v2.1 schema.\n- **Quality-aware post-training.** A lightweight discriminator scores each demonstration clip, and the Diffusion Transformer policy is post-trained with a weighted denoising loss that down-weights low-quality demonstrations.\n- **Production-ready inference stack.** A 3-process ZMQ split (policy \u002F arms \u002F hands) cleanly isolates the conflicting Python environments required by the GPU policy, the AIRBOT SDK, and the XHAND SDK.\n\n---\n\n## 📦 Repository Layout\n\n```\nDexora-VLA\u002F\n├── configs\u002F                       # YAML \u002F JSON training configurations\n├── models\u002F                        # 
Diffusion-Transformer policy + discriminator\n├── train\u002F                         # Pretrain \u002F discriminator \u002F post-train entry points\n├── data\u002F                          # LeRobot v2.1 + legacy BSON \u002F HDF5 adapters\n├── scripts\u002F                       # Pre-screening, log-π proxy, smoothness \u002F open-loop eval\n├── dataprocess\u002F                   # BSON → LeRobot v2.1 conversion utilities\n├── teleop\u002F                        # Real-robot data collection + Vision-Pro teleoperation\n├── deploy\u002F                        # Real-robot inference (ZMQ split: policy \u002F arms \u002F hands)\n├── tests\u002F                         # CPU-only pytest suite\n├── google\u002F                        # SigLIP \u002F T5 download targets\n├── new_lerobot_stats\u002F             # Per-dim min\u002Fmax statistics\n├── s{1,2a,2b,2c,3}_*.sh           # Per-stage launchers\n├── run_all_stages.sh              # End-to-end pipeline driver\n├── pyproject.toml \u002F requirements*.txt\n├── ICRA26_0209_FI.pdf             # ICRA 2026 paper\n└── LICENSE \u002F CITATION.cff \u002F CONTRIBUTING.md \u002F CODE_OF_CONDUCT.md\n```\n\nA more detailed breakdown of each top-level package is available in the corresponding sub-`README.md` files (e.g. [`deploy\u002FREADME.md`](deploy\u002FREADME.md), [`teleop\u002FREADME.md`](teleop\u002FREADME.md), [`dataprocess\u002FREADME.md`](dataprocess\u002FREADME.md)).\n\n---\n\n## 🛠️ Installation\n\n```bash\n# 1. Conda env (Python 3.10 is required)\nconda create -n dexora python=3.10 -y\nconda activate dexora\n\n# 2. PyTorch — pick your own CUDA from pytorch.org (CUDA 12.1 example)\npip install torch==2.1.0 torchvision==0.16.0 \\\n    --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n\n# 3. Project dependencies (see requirements.txt for the pin list)\npip install -r requirements.txt\n\n# 4. 
Editable install — registers `configs` \u002F `data` \u002F `models` \u002F `train`\n#    as importable packages and adds the `dexora-train` console scripts.\npip install -e .\n\n# 5. (Optional) developer tooling\npip install -r requirements-dev.txt\npre-commit install\npytest tests\u002F -q                # 57 CPU-only tests, ~5 s\n\n# 6. (Optional) flash-attn — pure speed knob; the attention path falls back\n#    to PyTorch SDPA if this is absent.\n# pip install flash-attn --no-build-isolation\n```\n\n> **Why pinned versions?** We pin `transformers\u003C5`, `huggingface_hub\u003C0.26`, `diffusers\u003C0.32`, `accelerate\u003C1.0`, `lerobot\u003C0.4` and `numpy\u003C2.0`. Newer versions break the `is_offline_mode` \u002F LeRobot v2.1 \u002F `imgaug` interfaces that the training stack relies on.\n\n---\n\n## 📥 Data & Pretrained Encoders\n\n### Real-World Dataset\n\nThe Dexora real-world dataset is hosted on Hugging Face in the LeRobot v2.1 standard:\n\n```bash\nhuggingface-cli download Dexora\u002FDexora_Real-World_Dataset \\\n    --repo-type dataset \\\n    --local-dir data\u002FDexora_Real-World_Dataset\n```\n\nTotal ≈ 240 GB. The four task families are released as separate LeRobot v2.1 datasets so you can start with whichever subset is most relevant:\n\n```\ndata\u002FDexora_Real-World_Dataset\u002F\n├── airbot_pick_and_place\u002F\n├── airbot_assemble\u002F\n├── airbot_articulation\u002F\n└── airbot_dexterous\u002F\n    ├── data\u002F    chunk-000\u002Fepisode_000000.parquet ...\n    ├── videos\u002F  chunk-000\u002Fobservation.images.{top,wrist_left,wrist_right,front}\u002Fepisode_000000.mp4\n    └── meta\u002F    info.json  episodes.jsonl  tasks.jsonl  modality.json  stats.json\n```\n\n> **State \u002F action dimensions.** The HF release stores **39-D** state and action vectors. The last three dimensions (`head_joint_1`, `head_joint_2`, `spine_joint`) are fixed values required by the AIRBOT SDK but are *not* modelled by the Dexora policy. 
The training loaders slice to the first **36** dims by default: `[left_arm(6) | right_arm(6) | left_hand(12) | right_hand(12)]`. Pass `--state_dim_keep 0` to retain the full 39 dims.\n\n### Pretrained Encoders\n\n| Asset | Size | Default path |\n|---|---|---|\n| SigLIP-SO400M (vision) | ~3.7 GB | `google\u002Fsiglip-so400m-patch14-384\u002F` |\n| T5-v1.1-XXL (language) | ~44 GB  | `google\u002Ft5-v1_1-xxl\u002F` |\n\n```bash\nhuggingface-cli download google\u002Fsiglip-so400m-patch14-384 \\\n    --local-dir google\u002Fsiglip-so400m-patch14-384 --local-dir-use-symlinks False\nhuggingface-cli download google\u002Ft5-v1_1-xxl \\\n    --local-dir google\u002Ft5-v1_1-xxl              --local-dir-use-symlinks False\n```\n\nSee [`google\u002FREADME.md`](google\u002FREADME.md) for symlink shortcuts when these encoders already live elsewhere on disk.\n\n### Dataset Statistics (per-dim min–max)\n\n`dataset_statistics.json` is **not** included in the HF release because it depends on which subset you train on. The shell launchers below auto-generate it once if missing; alternatively, pre-compute it explicitly:\n\n```bash\npython -m data.lerobot_vla_dataset --stat \\\n    --num_samples 5000 \\\n    --repo_dir   data\u002FDexora_Real-World_Dataset\u002Fairbot_pick_and_place \\\n    --output_dir new_lerobot_stats\n```\n\nThis writes a 36-D `new_lerobot_stats\u002Fdataset_statistics.json` plus `state_distributions.png` \u002F `action_distributions.png` for a quick sanity check. See [`new_lerobot_stats\u002FREADME.md`](new_lerobot_stats\u002FREADME.md).\n\n---\n\n## 🚀 Training Pipeline\n\nThe training procedure has three stages: **(1)** pretrain the policy, **(2)** train a quality discriminator that scores demonstrations, **(3)** fine-tune the policy with the discriminator-derived per-sample weights. 
Each stage is launched by a single shell script that reads its inputs from environment variables (with sensible defaults).\n\n```bash\n# Shared inputs for all stages (override via env vars as needed).\nexport DEXORA_LEROBOT_ROOT=data\u002FDexora_Real-World_Dataset\u002Fairbot_pick_and_place\nexport DEXORA_T5=google\u002Ft5-v1_1-xxl\nexport DEXORA_SIGLIP=google\u002Fsiglip-so400m-patch14-384\nexport DEXORA_STATS=new_lerobot_stats\u002Fdataset_statistics.json\n```\n\n### Stage 1 — Policy pretraining\n\nTrains the 400 M Diffusion Transformer policy for 100 K steps on the real corpus. Swap `DEXORA_LEROBOT_ROOT` for the simulation corpus to reproduce the sim-pretrain variant.\n\n```bash\nNUM_GPUS=8 MAX_TRAIN_STEPS=100000 \\\nOUTPUT_DIR=checkpoints\u002Fdexora-400m-pretrain \\\n    bash s1_pretrain.sh\n```\n\nOutputs land under `checkpoints\u002Fdexora-400m-pretrain\u002Fcheckpoint-*\u002F{pytorch_model.bin,config.json,ema\u002F}`.\n\n### Stage 2 — Quality discriminator\n\nThe discriminator turns each demonstration clip into a scalar quality score. It is trained in three sub-steps:\n\n**2a · Pre-screening.** Compute per-episode normalized acceleration and jerk, then keep the intersection of the lowest 20 % on both metrics as a high-quality candidate set `S_pre` (≈ 18 % of episodes).\n\n```bash\nSPRE_DIR=runs\u002Fspre bash s2a_analyze_jerk.sh\n# → runs\u002Fspre\u002Fcomplete_analysis_results.json\n```\n\n**2b · Replay-based post-validation.** Open-loop replay each candidate episode in the MuJoCo digital twin and keep the survivors that complete the task without collisions, yielding `S_high`.\n\n```bash\nSPRE_DIR=runs\u002Fspre SHIGH_FILE=runs\u002Fshigh.json \\\nREPLAY_VERIFIER=trust_spre \\\n    bash s2b_replay.sh\n```\n\nThe bundled `--verifier trust_spre` is a stub that accepts every `S_pre` episode (smoke test). 
Switch to `--verifier energy` for a cheap kinematic heuristic, or to `--verifier mujoco --twin_module path.to.your.replay` to plug in the real MuJoCo replay. The plug-in module must expose `replay(states, actions, task_id) -> {\"success\": bool, \"collision_free\": bool}`.\n\n**2c · Log-π proxy + discriminator training.** A per-chunk action-energy proxy `logπ̂_t = -zscore(E_t)` is computed from the Stage-1 checkpoint, then a small PU-loss discriminator is trained to distinguish `S_high` from the rest.\n\n```bash\n# (i) log-π proxy\nSTAGE1_CKPT=checkpoints\u002Fdexora-400m-pretrain \\\nLOGPI_FILE=runs\u002Flogpi\u002Flogpi.json \\\n    bash s2c_compute_logpi.sh\n\n# (ii) discriminator\nOUTPUT_DIR=checkpoints\u002Fdexora-scoring \\\nLOGPI_FILE=runs\u002Flogpi\u002Flogpi.json \\\nSPRE_FILE=runs\u002Fspre\u002Fcomplete_analysis_results.json \\\nSHIGH_FILE=runs\u002Fshigh.json \\\n    bash s2c_train_scoring.sh\n```\n\nThe discriminator (`models\u002Fscoring_model.py`) ingests the scalar `logπ̂_t` through a small sinusoidal positional-style encoding (8 frequency bands + raw) before the linear projection. 
This is mathematically equivalent in capacity to `Linear(1 → hidden_size)` but more numerically robust under bf16 when the z-scored proxy sits near zero.\n\n### Stage 3 — Quality-aware post-training\n\nLoads the Stage-1 policy and the frozen Stage-2 discriminator, then fine-tunes the policy on the real corpus with a per-sample weighted denoising loss\n\n$$\\mathcal{L}_\\pi \\;=\\; \\sum_{i=1}^{L} w_i \\, \\lVert\\, \\varepsilon_\\theta(\\cdot) - \\varepsilon \\,\\rVert_2^2,$$\n\nwhere the per-sample weight `w_i` is produced online from the discriminator score via a DWBC-style mapping (with a short linear warm-up).\n\n```bash\nSTAGE1_CKPT=checkpoints\u002Fdexora-400m-pretrain \\\nSCORING_CKPT=checkpoints\u002Fdexora-scoring\u002Ffinal_model\u002Fpytorch_model.bin \\\nOUTPUT_DIR=checkpoints\u002Fdexora-400m-posttrain \\\n    bash s3_post_train.sh\n```\n\nTo reproduce the *no-discriminator* baseline, pass `EXTRA_FLAGS=\"--no_quality_weights\"`.\n\n### End-to-end pipeline\n\n```bash\nRUN_DIR=.\u002Fruns\u002Fdexora-paper-rep \\\nDEXORA_LEROBOT_ROOT=data\u002FDexora_Real-World_Dataset\u002Fairbot_pick_and_place \\\n    bash run_all_stages.sh\n\n# Chain a subset of stages with START_STAGE \u002F END_STAGE, e.g.\n# START_STAGE=4 END_STAGE=6 RUN_DIR=.\u002Fruns\u002F... bash run_all_stages.sh\n```\n\n---\n\n## 🤖 Real-Robot Deployment\n\n`deploy\u002F` runs a trained Dexora policy on the physical robot. 
The integration is split into three single-purpose processes that talk over loopback ZMQ, so the conflicting Python environments for the policy (GPU + `torch`), the arms SDK (`airbot_py`) and the hands SDK (`xhand_tele_ops`, Python 3.8) can coexist without dependency conflicts:\n\n```\n+-----------------------------+   ZMQ tcp:\u002F\u002F*:5556    +------------------------+\n| dexora_inference_zmq.py     | \u003C------------------>  | mmk_forwarder.py       |\n| (env: dexora, GPU)          |  arms, 12-D radians   | (env: imitall, 3.10)   |\n|                             |   ZMQ tcp:\u002F\u002F*:5557    +------------------------+\n|                             | \u003C------------------>  | xhand_forwarder.py     |\n|                             |  hands, 2×12-D rad    | (env: xhand_tele_env)  |\n+-----------------------------+                       +------------------------+\n```\n\n`deploy\u002Fdexora_policy.py` wraps `RDTRunner.from_pretrained(...)` plus SigLIP-SO400M and T5-XXL into a single `policy.get_action(obs) -> [L, 36]` call. 
The inference loop follows a chunk-and-replay scheme: every `chunk_size` (= L) control ticks we sample a length-L action sequence and play it back with `action_buffer[t % L]`.\n\n### Quick start (three terminals)\n\n```bash\n# Terminal A — XHand forwarder (env: xhand_tele_env, Python 3.8)\nconda activate xhand_tele_env\npython deploy\u002Fxhand_forwarder.py --config deploy\u002Fmmk_xhand_config.yaml\n\n# Terminal B — MMK forwarder (env: imitall, Python 3.10)\nconda activate imitall\npython deploy\u002Fmmk_forwarder.py   --config deploy\u002Fmmk_xhand_config.yaml\n\n# Terminal C — Dexora policy (env: dexora, GPU)\nconda activate dexora\npython deploy\u002Fdexora_inference_zmq.py \\\n    --model-path checkpoints\u002Fdexora-400m-posttrain \\\n    --config-path deploy\u002Fmmk_xhand_config.yaml \\\n    --task-description \"Pick the apple and put it on the plate.\" \\\n    --save-logs --monitor-interval 1\n```\n\n### One-shell mode\n\n```bash\nTASK_DESCRIPTION=\"Pick the apple and put it on the plate.\" \\\nMODEL_PATH=checkpoints\u002Fdexora-400m-posttrain \\\n    bash deploy\u002Finference.sh\n```\n\nWire protocol, joint limits, RealSense fallback, and the full troubleshooting checklist are documented in [`deploy\u002FREADME.md`](deploy\u002FREADME.md).\n\n> **Noise schedule and inference steps.** Training uses a 1000-step DDPM forward process with a cosine `squaredcos_cap_v2` beta schedule, predicting the action noise `ε̂_θ`. At inference we swap DDPM for **DPMSolver++** and run only `num_inference_timesteps = 5` solver steps. Increasing this to 10–20 marginally improves smoothness on dexterous tasks at a proportional latency cost.\n>\n> **Backward compatibility.** Earlier Dexora checkpoints were saved with `prediction_type=sample`. 
`RDTRunner.compute_loss` and `scripts\u002Fcompute_logpi.py` both still handle the `sample` branch even though new training defaults to `epsilon`.\n\n---\n\n## 📊 Open-Loop Evaluation\n\n`scripts\u002Feval_action_curves.py` reproduces per-joint trajectory plots from a single LeRobot v2.1 episode. It triggers one diffusion pass every `--inference-interval` steps, then overlays the predicted action chunks on the ground-truth trajectory for every one of the 36 controlled joints (plus a 6 × 6 summary grid).\n\nThis is the **open-loop** protocol — we always condition on the ground-truth observation at each sampled timestep, never on the policy's own previous prediction. It is the cheapest sanity check that a trained checkpoint is producing physically plausible chunks before committing to a closed-loop rollout on the real robot.\n\nUnder the hood the script reuses the same `deploy\u002Fdexora_policy.py` wrapper as the on-robot inference loop, so the prediction path is bit-identical to what the robot would receive at runtime. Inputs (state, action) are normalized with the same `dataset_statistics.json` the policy was trained on, ensuring an apples-to-apples comparison.\n\n```bash\nMODEL_PATH=checkpoints\u002Fdexora-400m-posttrain \\\nREPO_DIR=data\u002FDexora_Real-World_Dataset\u002Fairbot_pick_and_place \\\nSTATS_FILE=new_lerobot_stats\u002Fdataset_statistics.json \\\nEPISODE_IDX=0 INFERENCE_INTERVAL=32 \\\nOUTPUT_DIR=eval_results\u002Fairbot_pick_and_place_ep0 \\\n    bash scripts\u002Frun_eval_example.sh\n```\n\nOutputs 36 per-axis PNGs (`ep000000_axis_\u003Ci>_\u003Cjoint_name>.png`) plus one `ep000000_summary.png` grid under `${OUTPUT_DIR}`.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Useful knobs\u003C\u002Fb>\u003C\u002Fsummary>\n\n| Flag \u002F env var          | Meaning |\n|---|---|\n| `--inference-interval`  | Cadence between diffusion passes; defaults to `chunk_size = 32`, i.e. non-overlapping chunks. Use `16` to visualize chunk consistency on overlap. 
|\n| `--max-steps`           | Truncate to the first N steps of the episode (default: full episode). |\n| `--instruction \"...\"`   | Override the dataset-derived language goal (default: read from `tasks.jsonl`). |\n| `--state-dim 39`        | Keep the full 39-D AIRBOT state instead of slicing to the 36-D modelled layout. |\n| `--no-normalize`        | Disable per-dim normalization (legacy checkpoints trained without `stats_file`). |\n| `--dump-json`           | Also dump GT + predictions as a JSON consumable by `scripts\u002Feval_smoothness.py`. |\n\n\u003C\u002Fdetails>\n\n> **Heads up.** The script needs the policy, SigLIP-SO400M, T5-v1.1-XXL *and* the LeRobot dataset all visible at the same time, so peak GPU memory matches the deploy stack (~30 GB on an A100 in bf16). For sanity checks on smaller GPUs you can set `--text-encoder` to a local T5-base or `--vision-encoder` to a smaller SigLIP — at the cost of breaking apples-to-apples comparison with the released checkpoints.\n\n---\n\n## 🎮 Teleoperation & Data Collection\n\nThe on-robot recording stack lives in [`teleop\u002F`](teleop\u002FREADME.md). 
It is the same kit used to capture the released `Dexora_Real-World_Dataset`, with paths anchored at `PROJECT_ROOT` so it ports cleanly to a new robot.\n\n- `teleop\u002Fscripts\u002Frecord_delete.py` — top-level orchestrator that forks the robot recorder and the Vision-Pro teleop simultaneously, then archives each episode under a configurable root.\n- `teleop\u002Fimitate_all\u002Frecord_4_rgb_cam.py` — robot + 4-camera recorder (USB \u002F RealSense → BSON), adapted from [airbot Imitate-All](https:\u002F\u002Fgithub.com\u002Fairbots-org\u002FImitate-All).\n- `teleop\u002Fteleop_pkg\u002Freceive_from_vision_pro.py` — pulls the Apple Vision Pro hand skeleton, retargets to the 12-DoF XHAND joints, drives the hands and logs `xhand_control_data.bson`.\n- `teleop\u002Fscripts\u002Freplay.py` — synchronized playback of a recorded episode on both arms and hands.\n- `teleop\u002Fdata_tools\u002F`, `teleop\u002Fvideo_tools\u002F`, `teleop\u002Fcamera_tools\u002F` — episode consistency checks, 2 × 2 review-video generator, and USB-camera bring-up.\n\nTwo conda environments are required (the same ones used by `deploy\u002F`): `imitall` (Python 3.10, AIRBOT SDK) on the robot side and `xhand_tele_env` (Python 3.8, `xhand_tele_ops`) for the Vision-Pro hand side. The full setup — udev rules for the USB cameras, Vision-Pro IP configuration, secrets layout — is documented in [`teleop\u002FREADME.md`](teleop\u002FREADME.md). 
Once recorded, [`dataprocess\u002Fairbot_lerobot.py`](dataprocess\u002Fairbot_lerobot.py) converts the BSON session into the LeRobot v2.1 layout consumed by `data\u002Flerobot_vla_dataset.py` and `s1_pretrain.sh`.\n\n---\n\n\n## 🔗 Related Work & Upstream Tooling\n\n| Component | Used for | Link |\n|---|---|---|\n| LeRobot v2.1 | Real-world data format | \u003Chttps:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot> |\n| DexMimicGen | Synthetic trajectory synthesis | \u003Chttps:\u002F\u002Fgithub.com\u002FNVlabs\u002FDexMimicGen> |\n| Objaverse \u002F Objaverse-XL | Source of 3D assets for simulation | \u003Chttps:\u002F\u002Fobjaverse.allenai.org\u002F> |\n| Qwen2.5-VL | VLM-driven asset mining and physical-property assignment | \u003Chttps:\u002F\u002Fhuggingface.co\u002FQwen> |\n| MuJoCo | Digital twin and replay-based post-validation | \u003Chttps:\u002F\u002Fmujoco.org> |\n| RDT-1B | Architectural reference for the Diffusion-Transformer policy | \u003Chttps:\u002F\u002Fgithub.com\u002Fthu-ml\u002FRoboticsDiffusionTransformer> |\n| DWBC | Score → weight mapping for the post-training loss | \u003Chttps:\u002F\u002Fgithub.com\u002Fryanxhr\u002FDWBC> |\n\n---\n\n## 📜 Citation\n\nIf you find Dexora useful in your research, please cite our ICRA 2026 paper:\n\n```bibtex\n@article{zhang2026dexora,\n  title={Dexora: Open-source VLA for High-DoF Bimanual Dexterity},\n  author={Zhang, Zongzheng and Pang, Jingrui and Yang, Zhuo and Li, Kun and Liao, Minwen and Zhang, Saining and Chi, Guoxuan and Guo, Jinbang and Gao, Huan-ang and Shi, Modi and others},\n  journal={arXiv preprint arXiv:2605.18722},\n  year={2026}\n}\n```\n\n\n\n## 📝 License\n\nThis codebase is released under the [MIT License](LICENSE). 
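Third-party components (SigLIP, T5, LeRobot, RDT-1B reference) retain their original licenses.\n\nFor questions, collaborations, or feedback, please open an issue or reach the maintainers through the [project page](https:\u002F\u002Fdexoravla.github.io).\n","Dexora is an open-source Vision-Language-Action (VLA) system for high-DoF bimanual dexterous manipulation. Written in Python, it integrates training, inference, data-processing, and teleoperation modules, and supports hybrid teleoperation that combines a custom exoskeleton backpack with markerless Apple Vision Pro tracking. It includes a large real-world dataset (12.2K episodes\u002F2.92M frames) and a matching simulated trajectory corpus, both built on the same 36-DoF dual-arm, dual-hand embodiment. Dexora also introduces a quality-aware post-training mechanism based on a lightweight discriminator and provides a production-ready inference stack that keeps the environments of its components isolated. It is suited to research and industrial settings that require precise control of complex robot arms for fine manipulation tasks.",2,"2026-06-11 03:58:29","CREATED_QUERY"]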