[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81003":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":13,"stars30d":13,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":27,"discoverSource":28},81003,"q2rl","rai-opensource\u002Fq2rl","rai-opensource","Q-Estimation and Q-Gating from BC for RL","https:\u002F\u002Fq2rl.rai-inst.com\u002F",null,"Python",34,4,30,0,2,6,2.1,"MIT License",false,"main",true,[],"2026-06-12 02:04:09","\u003Cdiv align=\"center\">\n\n# 🍋 When Life Gives You BC, Make Q-functions:🍹 \u003Cbr>Extracting Q-values from Behavior Cloning \u003Cbr>for On-Robot Reinforcement Learning\n\n### [**Paper**](https:\u002F\u002Fq2rl.rai-inst.com\u002Fstatic\u002Fpdf\u002Fq2rl.pdf) | [**Website**](https:\u002F\u002Fq2rl.rai-inst.com\u002F)\n\n[Lakshita Dodeja](https:\u002F\u002Flakshitadodeja.github.io\u002Fwebsite\u002F)\u003Csup>1,2\u003C\u002Fsup>,\n[Ondrej Biza](https:\u002F\u002Fondrejbiza.com\u002F)\u003Csup>1\u003C\u002Fsup>,\n[Shivam Vats](https:\u002F\u002Fshivamvats.com\u002F)\u003Csup>2\u003C\u002Fsup>,\n[Stephen Hart](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fstephen-hart-3711666\u002F)\u003Csup>1\u003C\u002Fsup>, \n[Stefanie Tellex](https:\u002F\u002Fh2r.cs.brown.edu\u002Fpeople\u002F)\u003Csup>2\u003C\u002Fsup>,\n[Robin Walters](https:\u002F\u002Fwww.robinwalters.com\u002F)\u003Csup>3\u003C\u002Fsup>,\n[Karl 
Schmeckpeper](https:\u002F\u002Fsites.google.com\u002Fview\u002Fkarlschmeckpeper)\u003Csup>1\u003C\u002Fsup>, \n[Thomas Weng](https:\u002F\u002Fthomasweng.com\u002F)\u003Csup>1\u003C\u002Fsup>\n\n\u003Csup>1\u003C\u002Fsup>Robotics and AI Institute, \u003Csup>2\u003C\u002Fsup>Brown University, \u003Csup>3\u003C\u002Fsup>Northeastern University\n\n\n![Q2RL](Q2RL.jpg)\n\n\u003C\u002Fdiv>\n\n## Installation\n\nWe provide installation instructions with conda and uv.\n\n### Install with conda\n1. **Setup Conda Environment:**\n    ```bash\n    conda create -n q2rl python=3.10 -y\n    conda activate q2rl\n    ```\n2. **Install other requirements and JAX:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3. **Install D4RL and Adroit Envs:**\n\n    We use the D4RL and Adroit Envs versions from [WSRL](https:\u002F\u002Fgithub.com\u002Fzhouzypaul\u002Fwsrl) repo. Copying instructions from WSRL:\n    \n    This fork incorporates the antmaze-ultra environments and fixes the kitchen environment rewards to be consistent between the offline dataset and the environment.\n    ```\n    git clone git@github.com:zhouzypaul\u002FD4RL.git\n    cd D4RL\n    pip install -e .\n    ```\n\n    To use Mujoco, you would also need to install mujoco manually to `~\u002F.mujoco\u002F` (for more instructions on download see [here](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fmujoco-py?tab=readme-ov-file#install-mujoco)), and use the following environment variables\n    ```bash\n    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME\u002F.mujoco\u002Fmujoco210\u002Fbin\n    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:\u002Fusr\u002Flib\u002Fnvidia\n    ```\n  \n    To use the adroit envs, you would need\n    ```\n    git clone --recursive https:\u002F\u002Fgithub.com\u002Fnakamotoo\u002Fmj_envs.git\n    cd mj_envs\n    git submodule update --remote\n    pip install -e .\n    ```\n\n    Download the adroit dataset from 
[here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1yUdJnGgYit94X_AvV6JJP5Y3Lx2JF30Y\u002Fview) and unzip the files into `~\u002Fadroit_data\u002F`.\n    If you would like to put the adroit datasets into another directory, use the environment variable `DATA_DIR_PREFIX` (check out the code [here](https:\u002F\u002Fgithub.com\u002Fzhouzypaul\u002Fwsrl\u002Fblob\u002F4b5665987079934a926c10a09bd81bc3c48ea9fa\u002Fwsrl\u002Fenvs\u002Fadroit_binary_dataset.py#L7) for more details).\n    ```bash\n    export DATA_DIR_PREFIX=\u002Fpath\u002Fto\u002Fyour\u002Fdata\n    ```\n4. **Install Robomimic:**\n    \n    We extract log probabilities and entropy of GMM files in `robomimic\u002Frobomimic\u002Falgo\u002Fbc.py` included in this repo. \n    ```\n    cd robomimic \n    pip install -e .\n    ```   \n    \n    If other packages change the installed versions of `numpy` and `mujoco`, reinstall these pinned versions:\n    ```\n    pip install mujoco==3.1.6 numpy==1.26.1\n    ```\n5. Activate the conda environment before running any scripts.\n\n### Install with uv\n1. Install [`uv`](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F).\n2. Clone this repository with `git clone --recursive`.\n3. Install Mujoco with `.\u002Fscripts\u002Finstall_mujoco.sh`.\n    a. Ensure that your system has the requisite Mesa development headers installed; on Ubuntu, run `sudo apt install libosmesa6-dev`.\n    b. Add the environment variables printed by `install_mujoco.sh` to your `.bashrc` or `.zshrc`, or export them manually in your shell before running any scripts.\n4. Install dependencies with `uv sync`.\n5. 
To use the adroit envs, you would need\n    ```\n    git clone --recursive https:\u002F\u002Fgithub.com\u002Fnakamotoo\u002Fmj_envs.git\n    cd mj_envs\n    git submodule update --remote\n    uv pip install -e .\n    ```\n\n    Download the adroit dataset from [here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1yUdJnGgYit94X_AvV6JJP5Y3Lx2JF30Y\u002Fview) and unzip the files into `~\u002Fadroit_data\u002F`.\n    If you would like to put the adroit datasets into another directory, use the environment variable `DATA_DIR_PREFIX` (check out the code [here](https:\u002F\u002Fgithub.com\u002Fzhouzypaul\u002Fwsrl\u002Fblob\u002F4b5665987079934a926c10a09bd81bc3c48ea9fa\u002Fwsrl\u002Fenvs\u002Fadroit_binary_dataset.py#L7) for more details).\n    ```bash\n    export DATA_DIR_PREFIX=\u002Fpath\u002Fto\u002Fyour\u002Fdata\n    ```\n6. Either run scripts with `uv run` or execute `source .venv\u002Fbin\u002Factivate` to enter the virtual environment before running any scripts.\n\n## Running Experiments\n\nAll BC policies and datasets are available on Hugging Face [here](https:\u002F\u002Fhf.co\u002Fcollections\u002Ftheaiinstitute\u002Fq2rl). Download them using `bash scripts\u002Fdownload.sh`.\n\nWe follow a similar structure to [WSRL](https:\u002F\u002Fgithub.com\u002Fzhouzypaul\u002Fwsrl). \n\nAll experiment scripts for `q2rl` and baselines are in the `experiments\u002F` directory.\nYou can modify the paths in the example scripts based on your setup. \n\nAlso add the repo to your Python path with `export PYTHONPATH=\u002Fpath\u002Fto\u002Fq2rl:$PYTHONPATH`. 
\nThe example scripts do this for you.\n\nTo kill a running experiment, find the wandb group name from `logs\u002F`, then run `pkill -f \"[wandb-group-name]\"`.\n\n\n## Citation \nIf you find our work useful, please cite us: \n```\n@inproceedings{dodeja2026q2rl,\n  title     = {When Life Gives You BC, Make Q-functions:\n               Extracting Q-values from Behavior Cloning\n               for On-Robot Reinforcement Learning},\n  author    = {Dodeja, Lakshita and Biza, Ondrej and Vats, Shivam and\n               Hart, Stephen and Tellex, Stefanie and Walters, Robin and\n               Schmeckpeper, Karl and Weng, Thomas},\n  booktitle = {Robotics: Science and Systems (RSS)},\n  year      = {2026},\n}\n```\n\n## Credits\nThis repo is built upon the [WSRL](https:\u002F\u002Fgithub.com\u002Fzhouzypaul\u002Fwsrl) and [SERL](https:\u002F\u002Fgithub.com\u002Frail-berkeley\u002Fserl) repositories. \n\n--------\n\nThis repository is released as-is to accompany a paper submission.\nIf you find any bugs, corrections, or issues that should be resolved for anyone looking to reproduce the results in this repository, please file an issue, and we will look at it as soon as we can.\nFor other improvements, including new features, we recommend creating your own fork of the repository.\n","The q2rl project focuses on extracting Q-values from behavior cloning to support on-robot reinforcement learning. Its core features are Q-estimation and Q-gating, which use existing behavior cloning data to produce useful Q-functions and thereby reduce the need for large-scale online interaction. It is implemented in Python and uses efficient computation libraries such as JAX for acceleration. It suits settings where reinforcement learning models must be deployed quickly from limited samples; for complex robot control systems where large amounts of real-world data are hard to collect, q2rl offers a cost-effective solution.","2026-06-11 04:03:08","CREATED_QUERY"]