[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80960":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":11,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":13,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":14,"rankGlobal":9,"rankLanguage":9,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":16,"hasPages":16,"topics":18,"createdAt":9,"pushedAt":9,"updatedAt":19,"readmeContent":20,"aiSummary":21,"trendingCount":13,"starSnapshotCount":13,"syncStatus":22,"lastSyncTime":23,"discoverSource":24},80960,"agentic-rtmo","chivector\u002Fagentic-rtmo","chivector","Agentic-RTMO pose estimation code.",null,"Python",31,1,0,0.9,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:09","# Agentic-RTMO\n\n> A lightweight Think-Critique-Act extension for RTMO, designed for real-time multi-person pose estimation in crowded scenes.\n\nAgentic-RTMO is a practical extension of [OpenMMLab MMPose](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmpose) \u002F RTMO. The core idea is simple: a one-stage pose estimator should not always trust its first answer. Instead of re-running the backbone or adding a heavy global reasoning module, Agentic-RTMO performs a small iterative correction loop inside RTMO's Dynamic Coordinate Classifier (DCC).\n\nIn crowded scenes, wrists, ankles, elbows, and knees are often pulled toward nearby people or occluded regions. 
Agentic-RTMO addresses this failure mode with a lightweight **Think-Critique-Act** loop:\n\n- **Think**: decode the current keypoint latent features into temporary keypoint predictions.\n- **Critique**: use a Structural Critic to estimate which joints look unreliable under skeleton topology and confidence cues.\n- **Act**: use a Feature Refinement Actor to update keypoint latent features before the next coordinate classification step.\n\nThe result is not a large new framework, but a focused upgrade to RTMO: the model keeps the speed advantage of one-stage pose estimation while gaining a limited but useful self-correction ability.\n\n## Demo\n\n![Agentic-RTMO Think-Critique-Act demo](demo\u002Fresources\u002Fagentic_rtmo_tca_demo_ppt.gif)\n\n[Watch the full MP4 demo with voiceover](demo\u002Fresources\u002Fagentic_rtmo_tca_demo_ppt.mp4)\n\n## Highlights\n\n- **Self-correction without re-running the image backbone**  \n  The correction loop operates on DCC keypoint latent features, so it avoids repeatedly computing backbone \u002F neck features.\n\n- **Structure-aware feedback**  \n  The Structural Critic uses keypoint coordinates, joint confidence, and skeleton connectivity to produce per-joint error probabilities and displacement hints.\n\n- **Feature-level refinement instead of hard coordinate shifting**  \n  The Actor updates latent features, allowing the coordinate distribution to be re-estimated rather than manually moving final keypoints.\n\n- **Drop-in RTMO integration**  \n  The implementation is controlled by `agentic_cfg`. 
Setting `enabled=False` restores the original RTMO DCC path.\n\n- **Built for ablation**  \n  Iteration count, critic width, actor width, and residual scale are exposed as config options, making it easy to study the speed-accuracy trade-off.\n\n- **OpenMMLab-compatible workflow**  \n  Training, testing, and demos follow the standard MMPose toolchain, so existing RTMO users can adapt the repo with minimal friction.\n\n## Reported Results\n\nThe project is designed around the following experimental target: improve crowded-scene robustness while keeping real-time throughput. Under the reported setting, Agentic-RTMO improves over the RTMO baseline with a small latency cost.\n\n| Setting | Model | AP | AP75 | FPS \u002F Latency |\n|---|---:|---:|---:|---:|\n| COCO test-dev | RTMO-l(MS) | 73.3 | 80.8 | 19.1 ms |\n| COCO test-dev | Agentic-RTMO-l(MS) | 74.7 | 82.1 | 21.0 ms |\n| CrowdPose test | RTMO-l(MS) | 83.8 | - | 141 FPS |\n| CrowdPose test | Agentic-RTMO-l(MS) | 86.1 | - | 128 FPS |\n\nThese numbers should be read in the intended spirit: Agentic-RTMO is a lightweight reasoning add-on, not a brute-force scaling approach. 
Its main advantage appears in hard cases where local evidence is ambiguous and skeleton consistency matters.\n\n## Core Files\n\n| File | Purpose |\n|---|---|\n| `configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py` | Minimal Agentic-RTMO training config based on RTMO-m |\n| `mmpose\u002Fmodels\u002Fheads\u002Fhybrid_heads\u002Fagentic_modules.py` | Structural Critic, Feature Refinement Actor, and Think-Critique-Act Loop |\n| `mmpose\u002Fmodels\u002Fheads\u002Fhybrid_heads\u002Frtmo_head.py` | RTMO DCC integration point for the Agentic loop |\n| `METHOD_CODE_MAPPING.md` | Method-to-code mapping for quick review |\n| `QUICKSTART_5MIN.md` | Minimal smoke-test guide |\n| `SETUP_RUN_EXPERIMENT_GUIDE.md` | Setup, training, evaluation, and troubleshooting guide |\n\n## Method Overview\n\nAgentic-RTMO modifies the DCC stage of RTMO. Given pose features, RTMO first converts them into keypoint latent features and X\u002FY coordinate distributions. Agentic-RTMO inserts an optional iterative loop before the final decoding:\n\n1. Decode temporary keypoints and confidence scores from current `kpt_feats`.\n2. Feed coordinates and scores into `StructuralCritic`.\n3. Produce `error_prob` and `disp_hint` for each joint.\n4. Feed critic feedback and current `kpt_feats` into `FeatureRefinementActor`.\n5. Apply a residual feature update.\n6. Repeat for `num_iters` rounds, then decode final keypoints.\n\nThis keeps the expensive image feature extraction path unchanged. The correction is concentrated where RTMO already represents keypoints: the DCC latent space.\n\n## Installation\n\nCreate a clean Python environment first. 
The commands below are examples; adjust PyTorch installation according to your CUDA version.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fchivector\u002Fagentic-rtmo.git\ncd agentic-rtmo\n\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npython -m pip install -U pip setuptools wheel\n```\n\nInstall PyTorch:\n\n```bash\npip install torch torchvision torchaudio\n```\n\nInstall OpenMMLab dependencies and this project:\n\n```bash\npip install -U openmim\nmim install mmengine \"mmcv>=2.0.0\" \"mmdet>=3.0.0\"\npip install -r requirements.txt\npip install -v -e .\n```\n\nFor a quick code-level sanity check:\n\n```bash\npython -m compileall mmpose\u002Fmodels\u002Fheads\u002Fhybrid_heads\u002Fagentic_modules.py\npython -m compileall mmpose\u002Fmodels\u002Fheads\u002Fhybrid_heads\u002Frtmo_head.py\npython -m compileall configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py\n```\n\n## Dataset\n\nThe default config uses the COCO keypoint dataset. 
Organize it following the standard MMPose layout:\n\n```text\ndata\u002Fcoco\u002F\n  annotations\u002F\n    person_keypoints_train2017.json\n    person_keypoints_val2017.json\n  train2017\u002F\n  val2017\u002F\n```\n\nThis repository does not include COCO images, annotations, or model checkpoints.\n\n## Training\n\nSingle-GPU training:\n\n```bash\npython tools\u002Ftrain.py \\\n  configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py\n```\n\nMulti-GPU training:\n\n```bash\nbash tools\u002Fdist_train.sh \\\n  configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py \\\n  8 --amp\n```\n\nBaseline RTMO comparison:\n\n```bash\npython tools\u002Ftrain.py \\\n  configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Frtmo-m_16xb16-600e_coco-640x640.py\n```\n\n## Evaluation\n\n```bash\npython tools\u002Ftest.py \\\n  configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py \\\n  \u003CYOUR_CHECKPOINT>.pth\n```\n\nFor a clean comparison, evaluate RTMO and Agentic-RTMO on the same machine, dataset version, input resolution, batch size, and measurement protocol.\n\n## Inference Demo\n\n```bash\npython demo\u002Finferencer_demo.py \u003CIMAGE_PATH> \\\n  --pose2d configs\u002Fbody_2d_keypoint\u002Frtmo\u002Fcoco\u002Fagentic-rtmo-m_16xb16-600e_coco-640x640.py \\\n  --pose2d-weights \u003CYOUR_CHECKPOINT>.pth \\\n  --vis-out-dir vis_results\n```\n\n## Agentic Config\n\nThe Agentic loop is configured through `agentic_cfg` inside the DCC config:\n\n```python\nmodel = dict(\n    head=dict(\n        type='RTMOHead',\n        dcc_cfg=dict(\n            agentic_cfg=dict(\n                enabled=True,\n                num_iters=2,\n                critic_hidden_dim=64,\n                actor_hidden_dim=128,\n                residual_scale=0.5,\n            ))))\n```\n\nUseful ablations:\n\n- `enabled=False`: disable the loop and fall back to standard RTMO 
behavior.\n- `num_iters=0\u002F1\u002F2\u002F3`: measure how many correction rounds are worth the latency.\n- `critic_hidden_dim`: change the capacity of the structural critic.\n- `actor_hidden_dim`: change the capacity of the feature refinement actor.\n- `residual_scale`: control the magnitude of latent feature updates.\n\n## Experiment Template\n\n```text\nDate:\nMachine\u002FGPU:\nDataset:\n\n[Baseline RTMO]\nConfig:\nCheckpoint:\nAP:\nAP50:\nAP75:\nFPS\u002FLatency:\n\n[Agentic-RTMO]\nConfig:\nCheckpoint:\nAP:\nAP50:\nAP75:\nFPS\u002FLatency:\n\n[Delta]\nAP:\nAP50:\nAP75:\nFPS\u002FLatency:\nNotes:\n```\n\n## Current Status\n\n- The Agentic modules and RTMOHead integration are implemented.\n- A minimal COCO \u002F RTMO-m config is provided.\n- The code is structured for reproduction and ablation, but full trained checkpoints and training logs are not bundled in this repository.\n- Additional supervised critic losses, teacher forcing, ONNX \u002F TensorRT export, and deployment-specific benchmarking can be added as follow-up work.\n\n## Acknowledgements\n\nThis project is built on top of [OpenMMLab MMPose](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmpose) and the RTMO implementation. We appreciate the OpenMMLab community for providing a strong pose-estimation codebase and reproducible engineering infrastructure.\n\n## License\n\nThis project is released under the Apache License 2.0. Please also respect the original MMPose \u002F OpenMMLab license notices in `LICENSE` and `LICENSES.md`.\n","Agentic-RTMO is a lightweight extension for real-time multi-person pose estimation, particularly suited to crowded scenes. Built on the RTMO model from OpenMMLab MMPose, it introduces a compact Think-Critique-Act loop that iteratively corrects keypoint predictions without re-running the image backbone, improving joint localization accuracy against complex backgrounds. The loop has three steps: decoding the current keypoint features, assessing which joints are unreliable, and updating the keypoint features. This design lets Agentic-RTMO keep the speed advantage of one-stage pose estimation while strengthening the model's ability to self-correct. It is well suited to applications that need efficient, accurate human pose recognition in densely crowded environments, such as security surveillance or sports event analysis.",2,"2026-06-11 04:03:00","CREATED_QUERY"]