[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9853":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},9853,"dreamerv3","danijar\u002Fdreamerv3","danijar","Mastering Diverse Domains through World Models","https:\u002F\u002Fdanijar.com\u002Fdreamerv3",null,"Python",3393,555,35,37,0,10,56,193,48,30.24,"MIT License",false,"main",true,[27,28,29,30,31,32],"artificial-intelligence","general","jax","minecraft","reinforcement-learning","world-models","2026-06-12 02:02:13","# Mastering Diverse Domains through World Models\n\nA reimplementation of [DreamerV3][paper], a scalable and general reinforcement\nlearning algorithm that masters a wide range of applications with fixed\nhyperparameters.\n\n![DreamerV3 Tasks](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2111293\u002F217647148-cbc522e2-61ad-4553-8e14-1ecdc8d9438b.gif)\n\nIf you find this code useful, please reference in your paper:\n\n```\n@article{hafner2025dreamerv3,\n  title={Mastering diverse control tasks through world models},\n  author={Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy},\n  journal={Nature},\n  pages={1--7},\n  year={2025},\n  publisher={Nature Publishing Group}\n}\n```\n\nTo learn more:\n\n- [Research paper][paper]\n- [Project website][website]\n- [Twitter summary][tweet]\n\n## DreamerV3\n\nDreamerV3 learns a world model from experiences and 
uses it to train an actor\ncritic policy from imagined trajectories. The world model encodes sensory\ninputs into categorical representations and predicts future representations and\nrewards given actions.\n\n![DreamerV3 Method Diagram](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2111293\u002F217355673-4abc0ce5-1a4b-4366-a08d-64754289d659.png)\n\nDreamerV3 masters a wide range of domains with a fixed set of hyperparameters,\noutperforming specialized methods. Removing the need for tuning reduces the\namount of expert knowledge and computational resources needed to apply\nreinforcement learning.\n\n![DreamerV3 Benchmark Scores](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamerv3\u002Fassets\u002F2111293\u002F0fe8f1cf-6970-41ea-9efc-e2e2477e7861)\n\nDue to its robustness, DreamerV3 shows favorable scaling properties. Notably,\nusing larger models consistently increases not only its final performance but\nalso its data efficiency. Increasing the number of gradient steps further\nincreases data efficiency.\n\n![DreamerV3 Scaling Behavior](https:\u002F\u002Fuser-images.githubusercontent.com\u002F2111293\u002F217356063-0cf06b17-89f0-4d5f-85a9-b583438c98dd.png)\n\n# Instructions\n\nThe code has been tested on Linux and Mac and requires Python 3.11+.\n\n## Docker\n\nYou can either use the provided `Dockerfile` that contains instructions or\nfollow the manual instructions below.\n\n## Manual\n\nInstall [JAX][jax] and then the other dependencies:\n\n```sh\npip install -U -r requirements.txt\n```\n\nTraining script:\n\n```sh\npython dreamerv3\u002Fmain.py \\\n  --logdir ~\u002Flogdir\u002Fdreamer\u002F{timestamp} \\\n  --configs crafter \\\n  --run.train_ratio 32\n```\n\nTo reproduce results, train on the desired task using the corresponding config,\nsuch as `--configs atari --task atari_pong`.\n\nView results:\n\n```sh\npip install -U scope\npython -m scope.viewer --basedir ~\u002Flogdir --port 8000\n```\n\nScalar metrics are also written as JSONL 
files.\n\n# Tips\n\n- All config options are listed in `dreamerv3\u002Fconfigs.yaml` and you can\n  override them as flags from the command line.\n- The `debug` config block reduces the network size, batch size, duration\n  between logs, and so on for fast debugging (but does not learn a good model).\n- By default, the code tries to run on GPU. You can switch to CPU or TPU using\n  the `--jax.platform cpu` flag.\n- You can use multiple config blocks that will override defaults in the\n  order they are specified, for example `--configs crafter size50m`.\n- By default, metrics are printed to the terminal, appended to a JSON lines\n  file, and written as Scope summaries. Other outputs like WandB and\n  TensorBoard can be enabled in the training script.\n- If you get a `Too many leaves for PyTreeDef` error, it means you're\n  reloading a checkpoint that is not compatible with the current config. This\n  often happens when reusing an old logdir by accident.\n- If you are getting CUDA errors, scroll up because the cause is often just an\n  error that happened earlier, such as out of memory or incompatible JAX and\n  CUDA versions. Try `--batch_size 1` to rule out an out of memory error.\n- Many environments are included, some of which require installing additional\n  packages. See the `Dockerfile` for reference.\n- To continue stopped training runs, simply run the same command line again and\n  make sure that the `--logdir` points to the same directory.\n\n# Disclaimer\n\nThis repository contains a reimplementation of DreamerV3 based on the open\nsource DreamerV2 code base. It is unrelated to Google or DeepMind. 
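The\nimplementation has been tested to reproduce the official results on a range of\nenvironments.\n\n[jax]: https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fjax#pip-installation-gpu-cuda\n[paper]: https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.04104\n[website]: https:\u002F\u002Fdanijar.com\u002Fdreamerv3\n[tweet]: https:\u002F\u002Ftwitter.com\u002Fdanijarh\u002Fstatus\u002F1613161946223677441\n","DreamerV3 is a general reinforcement learning algorithm based on world models that masters a wide range of applications with fixed hyperparameters. The project uses the JAX framework to implement a scalable world model that learns from experience and trains an actor-critic policy from imagined trajectories, outperforming specialized methods without extensive tuning. Its core techniques include encoding sensory inputs into categorical representations and predicting future states and rewards given actions. DreamerV3 suits scenarios where reinforcement learning must be deployed with less expert knowledge and fewer computational resources, such as game AI, robot control, and other complex task domains. The project also shows good scalability: larger models improve not only final performance but also data efficiency.",2,"2026-06-11 03:25:02","top_topic"]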