[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2710":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},2710,"baselines","openai\u002Fbaselines","openai","OpenAI Baselines: high-quality implementations of reinforcement learning algorithms","",null,"Python",16730,4943,627,413,0,9,19,5,45,"MIT License",false,"master",true,[],"2026-06-12 02:00:43","**Status:** Maintenance (expect bug fixes and minor updates)\n\n\u003Cimg src=\"data\u002Flogo.jpg\" width=25% align=\"right\" \u002F> [![Build status](https:\u002F\u002Ftravis-ci.org\u002Fopenai\u002Fbaselines.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fopenai\u002Fbaselines)\n\n# Baselines\n\nOpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.\n\nThese algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. \n\n## Prerequisites \nBaselines requires python3 (>=3.5) with the development headers. You'll also need system packages CMake, OpenMPI and zlib. 
Those can be installed as follows:\n### Ubuntu \n    \n```bash\nsudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev\n```\n    \n### Mac OS X\nInstallation of system packages on Mac requires [Homebrew](https:\u002F\u002Fbrew.sh). With Homebrew installed, run the following:\n```bash\nbrew install cmake openmpi\n```\n    \n## Virtual environment\nFrom the general python package sanity perspective, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via\n```bash\npip install virtualenv\n```\nVirtualenvs are essentially folders that have copies of the python executable and all python packages.\nTo create a virtualenv called venv with python3, run \n```bash\nvirtualenv \u002Fpath\u002Fto\u002Fvenv --python=python3\n```\nTo activate a virtualenv: \n```\n. \u002Fpath\u002Fto\u002Fvenv\u002Fbin\u002Factivate\n```\nA more thorough tutorial on virtualenvs and their options can be found [here](https:\u002F\u002Fvirtualenv.pypa.io\u002Fen\u002Fstable\u002F) \n\n\n## Tensorflow versions\nThe master branch supports Tensorflow from version 1.4 to 1.14. For Tensorflow 2.0 support, please use the tf2 branch.\n\n## Installation\n- Clone the repo and cd into it:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines.git\n    cd baselines\n    ```\n- If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use\n    ```bash \n    pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers\n    ```\n    or \n    ```bash\n    pip install tensorflow==1.14\n    ```\n    to install Tensorflow 1.14, which is the latest version of Tensorflow supported by the master branch. Refer to the [TensorFlow installation guide](https:\u002F\u002Fwww.tensorflow.org\u002Finstall\u002F)\n    for more details. 
\n\n- Install baselines package\n    ```bash\n    pip install -e .\n    ```\n\n### MuJoCo\nSome of the baselines examples use [MuJoCo](http:\u002F\u002Fwww.mujoco.org) (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (temporary 30-day license can be obtained from [www.mujoco.org](http:\u002F\u002Fwww.mujoco.org)). Instructions on setting up MuJoCo can be found [here](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fmujoco-py)\n\n## Testing the installation\nAll unit tests in baselines can be run using pytest runner:\n```\npip install pytest\npytest\n```\n\n## Training models\nMost of the algorithms in baselines repo are used as follows:\n```bash\npython -m baselines.run --alg=\u003Cname of the algorithm> --env=\u003Cenvironment_id> [additional arguments]\n```\n### Example 1. PPO with MuJoCo Humanoid\nFor instance, to train a fully-connected network controlling MuJoCo humanoid using PPO2 for 20M timesteps\n```bash\npython -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7\n```\nNote that for mujoco environments fully-connected network is default, so we can omit `--network=mlp`\nThe hyperparameters for both network and the learning algorithm can be controlled via the command line, for instance:\n```bash\npython -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy\n```\nwill set entropy coefficient to 0.1, and construct fully connected network with 3 layers with 32 hidden units in each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same)\n\nSee docstrings in [common\u002Fmodels.py](baselines\u002Fcommon\u002Fmodels.py) for description of network parameters for each type of model, and \ndocstring for [baselines\u002Fppo2\u002Fppo2.py\u002Flearn()](baselines\u002Fppo2\u002Fppo2.py#L152) for 
the description of the ppo2 hyperparameters. \n\n### Example 2. DQN on Atari \nDQN on Atari is by now a classic benchmark. To run the baselines implementation of DQN on Atari Pong:\n```\npython -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6\n```\n\n## Saving, loading and visualizing models\n\n### Saving and loading the model\nThe algorithms' serialization API is not properly unified yet; however, there is a simple method to save \u002F restore trained models. \nThe `--load_path` and `--save_path` command-line options load the TensorFlow state from a given path before training and save it after training, respectively. \nSuppose you'd like to train ppo2 on Atari Pong, save the model, and later visualize what it has learned.\n```bash\npython -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~\u002Fmodels\u002Fpong_20M_ppo2\n```\nThis should reach a mean reward per episode of about 20. To load and visualize the model, load it, train it for 0 steps, and then play it back: \n```bash\npython -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~\u002Fmodels\u002Fpong_20M_ppo2 --play\n```\n\n*NOTE:* Mujoco environments require normalization to work properly, so we wrap them with the VecNormalize wrapper. Currently, to ensure the models are saved with normalization (so that trained models can be restored and run without further training), the normalization coefficients are saved as TensorFlow variables. This can decrease performance somewhat, so if you require high-throughput steps with Mujoco and do not need to save\u002Frestore the models, it may make sense to use numpy normalization instead. To do that, set `use_tf=False` in [baselines\u002Frun.py](baselines\u002Frun.py#L116). 
\n\n### Logging and visualizing learning curves and other training metrics\nBy default, all summary data, including progress and standard output, is saved to a unique directory in a temp folder, specified by a call to Python's [tempfile.gettempdir()](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Ftempfile.html#tempfile.gettempdir).\nThe directory can be changed with the `--log_path` command-line option.\n```bash\npython -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~\u002Fmodels\u002Fpong_20M_ppo2 --log_path=~\u002Flogs\u002FPong\u002F\n```\n*NOTE:* Please be aware that the logger will overwrite files of the same name in an existing directory, so it's recommended to give folder names a unique timestamp to prevent logs from being overwritten.\n\nAnother way to change the log directory is through the `$OPENAI_LOGDIR` environment variable.\n\nFor examples of how to load and display the training data, see [here](docs\u002Fviz\u002Fviz.ipynb).\n\n## Subpackages\n\n- [A2C](baselines\u002Fa2c)\n- [ACER](baselines\u002Facer)\n- [ACKTR](baselines\u002Facktr)\n- [DDPG](baselines\u002Fddpg)\n- [DQN](baselines\u002Fdeepq)\n- [GAIL](baselines\u002Fgail)\n- [HER](baselines\u002Fher)\n- [PPO1](baselines\u002Fppo1) (obsolete version, left here temporarily)\n- [PPO2](baselines\u002Fppo2) \n- [TRPO](baselines\u002Ftrpo_mpi)\n\n\n\n## Benchmarks\nResults of benchmarks on Mujoco (1M timesteps) and Atari (10M timesteps) are available \n[here for Mujoco](https:\u002F\u002Fhtmlpreview.github.com\u002F?https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Fblob\u002Fmaster\u002Fbenchmarks_mujoco1M.htm) \nand\n[here for Atari](https:\u002F\u002Fhtmlpreview.github.com\u002F?https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Fblob\u002Fmaster\u002Fbenchmarks_atari10M.htm) \nrespectively. 
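Note that these results may not be from the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page. \n\nTo cite this repository in publications:\n\n    @misc{baselines,\n      author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter},\n      title = {OpenAI Baselines},\n      year = {2017},\n      publisher = {GitHub},\n      journal = {GitHub repository},\n      howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines}},\n    }\n\n","OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms. The project provides implementations of DQN and its variants, among other algorithms, whose performance is on par with the results in published papers, and aims to give researchers a reliable foundation for reproducing and improving existing algorithms and for exploring new ideas on top of them. The project supports Python 3.5 and above and depends on TensorFlow (versions 1.4 to 1.14); TensorFlow 2.0 is also supported via the tf2 branch. For research that needs to build or compare new methods, Baselines is a well-suited tool. In addition, some examples use the MuJoCo physics simulator, which suits reinforcement learning research in environments involving complex physical interactions.",2,"2026-06-11 02:50:59","top_language"]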