[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72457":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":14,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":43,"readmeContent":44,"aiSummary":45,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":46,"discoverSource":47},72457,"Cradle","BAAI-Agents\u002FCradle","BAAI-Agents","The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvment, and skill curation, in a standardized general environment with minimal requirements.","https:\u002F\u002Fbaai-agents.github.io\u002FCradle\u002F",null,"Python",2538,269,28,19,0,2,8,6,70.09,"MIT License",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42],"ai","ai-agent","ai-agents-framework","computer-control","cradle","foundation-agent","gcc","general-computer-control","generative-ai","grounding","large-language-models","llm","lmm","multimodality","personoid","vision-language-model","vlm","2026-06-12 04:01:05","# Cradle: Empowering Foundation Agents Towards General Computer Control\n\n\u003Cdiv align=\"center\">\n\n[[Website]](https:\u002F\u002Fbaai-agents.github.io\u002FCradle\u002F)\n[[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03186)\n[[PDF]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.03186)\n\n[![Python Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10-blue.svg)]()\n[![GitHub 
license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMIT-blue)]()\n\n![](docs\u002Fimages\u002Fcradle-intro-cr.png)\n\n\u003C\u002Fdiv>\n\nThe Cradle framework empowers nascent foundation models to perform complex computer tasks\nvia the same unified interface humans use, i.e., screenshots as input and keyboard & mouse operations as output.\n\n## 📢 Updates\n- 2024-06-27: A major update! Cradle has been extended to four games: [RDR2](https:\u002F\u002Fwww.rockstargames.com\u002Freddeadredemption2), [Stardew Valley](https:\u002F\u002Fwww.stardewvalley.net\u002F), [Cities: Skylines](https:\u002F\u002Fwww.paradoxinteractive.com\u002Fgames\u002Fcities-skylines\u002Fabout), and [Dealer's Life 2](https:\u002F\u002Fabyteentertainment.com\u002Fdealers-life-2\u002F), as well as various software, including but not limited to Chrome, Outlook, Capcut, Meitu, and Feishu. We have also released our latest [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.03186). Check it out!\n\n\u003Cdiv align=\"center\">\n\n![](docs\u002Fimages\u002Fgcc.jpg)\n\n\u003C\u002Fdiv>\n\n## Latest Videos\n\u003Cdiv align=\"center\">\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=fkkSJw1iJJ8\">\u003Cimg src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Frdr2\u002FRDR2_story_cover.jpg\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ay5gBqzPcDE\">\u003Cimg src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Frdr2\u002FRDR2_openended_cover.jpg\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=regULK_60_8\">\u003Cimg src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Fskylines\u002Fcityskyline_video_cover.png\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Kaiz4yJieUk\">\u003Cimg 
src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Fstardew\u002Fstardew_video_cover.png\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=WZiL_0V880M\">\u003Cimg src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Fdealers\u002Fdealer_video_cover.png\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003Ca alt=\"Watch the video\" href=\"https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=uWgLnZmpVTM\">\u003Cimg src=\".\u002Fdocs\u002Fenvs\u002Fimages\u002Fsoftware\u002FSoftware_cover.png\" width=\"33%\" \u002F>\u003C\u002Fa>\n&nbsp;&nbsp;\n\u003C\u002Fdiv>\n\nClick on any of the video thumbnails above to watch it on YouTube.\n\n# 💾 Installation\n\n## Prepare the Environment File\nCradle currently supports OpenAI's and Claude's APIs. Please create a `.env` file in the root of the repository to store the keys (one of them is enough).\n\nSample `.env` file containing private information:\n```\nOA_OPENAI_KEY = \"abc123abc123abc123abc123abc123ab\"\nRF_CLAUDE_AK = \"abc123abc123abc123abc123abc123ab\" # Access Key for Claude\nRF_CLAUDE_SK = \"123abc123abc123abc123abc123abc12\" # Secret Access Key for Claude\nAZ_OPENAI_KEY = \"123abc123abc123abc123abc123abc12\"\nAZ_BASE_URL = \"https:\u002F\u002Fabc123.openai.azure.com\u002F\"\nIDE_NAME = \"Code\"\n```\nOA_OPENAI_KEY is the OpenAI API key. You can get it from [OpenAI](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys).\n\nAZ_OPENAI_KEY is the Azure OpenAI API key. You can get it from the [Azure Portal](https:\u002F\u002Fportal.azure.com\u002F#view\u002FHubsExtension\u002FBrowseResource\u002FresourceType\u002FMicrosoft.CognitiveServices%2Faccounts).\n\nOA_CLAUDE_KEY is the Anthropic Claude API key. 
You can get it from [Anthropic](https:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fkeys).\n\nRF_CLAUDE_AK and RF_CLAUDE_SK are the AWS access key and secret access key for the RESTful Claude API.\n\nIDE_NAME refers to the IDE environment in which the repository's code runs, such as `PyCharm` or `Code` (VSCode). It is primarily used to enable automatic switching between the IDE and the target environment.\n\n\n## Setup\n\n### Python Environment\nPlease set up your Python environment and install the required dependencies as follows:\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FBAAI-Agents\u002FCradle.git\ncd Cradle\n\n# Create a new conda environment\nconda create --name cradle-dev python=3.10\nconda activate cradle-dev\npip install -r requirements.txt\n```\n\n### Install the OCR Tools\n```\n1. Option 1\n# Download best-matching version of specific model for your spaCy installation\npython -m spacy download en_core_web_lg\n\nor\n\n# pip install .tar.gz archive or .whl from path or URL\npip install https:\u002F\u002Fgithub.com\u002Fexplosion\u002Fspacy-models\u002Freleases\u002Fdownload\u002Fen_core_web_lg-3.7.1\u002Fen_core_web_lg-3.7.1.tar.gz\n\n2. Option 2\n# Copy this URL: https:\u002F\u002Fgithub.com\u002Fexplosion\u002Fspacy-models\u002Freleases\u002Fdownload\u002Fen_core_web_lg-3.7.1\u002Fen_core_web_lg-3.7.1.tar.gz\n# Paste it in the browser and download the file to res\u002Fspacy\u002Fdata\ncd res\u002Fspacy\u002Fdata\npip install en_core_web_lg-3.7.1.tar.gz\n```\n\n# 🚀 Get Started\nBecause the games and software differ widely, we provide the specific settings for each of them below.\n1. [Red Dead Redemption 2](docs\u002Fenvs\u002Frdr2.md)\n2. [Stardew Valley](docs\u002Fenvs\u002Fstardew.md)\n3. [Cities: Skylines](docs\u002Fenvs\u002Fskylines.md)\n4. [Dealer's Life 2](docs\u002Fenvs\u002Fdealers.md)\n5. 
[Software](docs\u002Fenvs\u002Fsoftware.md)\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fdocs\u002Fimages\u002Fgames_wheel.png\" height=\"365\" \u002F> \u003Cimg src=\".\u002Fdocs\u002Fimages\u002Fapplications_wheel.png\" height=\"365\" \u002F>\n\u003C\u002Fdiv>\n\n# 🌲 File Structure\nSince some users may want to apply our framework to new games, this section showcases the core directories and organizational structure of Cradle. Modules relevant to migrating to new games are marked with ⭐⭐⭐ and explained in detail later.\n```\nCradle\n├── cache # Cache the GroundingDino model and the bert-base-uncased model\n├── conf # ⭐⭐⭐ The configuration files for the environment and the LLM\n│   ├── env_config_dealers.json\n│   ├── env_config_rdr2_main_storyline.json\n│   ├── env_config_rdr2_open_ended_mission.json\n│   ├── env_config_skylines.json\n│   ├── env_config_stardew_cultivation.json\n│   ├── env_config_stardew_farm_clearup.json\n│   ├── env_config_stardew_shopping.json\n│   ├── openai_config.json\n│   ├── claude_config.json\n│   ├── restful_claude_config.json\n│   └── ...\n├── deps # The dependencies for the Cradle framework, ignore this folder\n├── docs # The documentation for the Cradle framework, ignore this folder\n├── res # The resources for the Cradle framework\n│   ├── models # Ignore this folder\n│   ├── tool # Subfinder for RDR2\n│   ├── [game or software] # ⭐⭐⭐ The resources for each game or software, for example: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu\n│   │   ├── prompts # The prompts for the game\n│   │   │   └── templates\n│   │   │       ├── action_planning.prompt\n│   │   │       ├── information_gathering.prompt\n│   │   │       ├── self_reflection.prompt\n│   │   │       └── task_inference.prompt\n│   │   ├── skills # The skills json for the game, it will be generated automatically\n│   │   ├── icons # The icons difficult for GPT-4 to recognize in the game can be replaced with text for 
better recognition using an icon replacer\n│   │   └── saves # Save files in the game\n│   └── ...\n├── requirements.txt # The requirements for the Cradle framework\n├── runner.py # The main entry for the Cradle framework\n├── cradle # Cradle's core modules\n│   ├── config # The configuration for the Cradle framework\n│   ├── environment # The environment for the Cradle framework\n│   │   ├── [game or software] # ⭐⭐⭐ The environment for each game or software, for example: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu\n│   │   │   ├── __init__.py # The initialization file for the environment\n│   │   │   ├── atomic_skills # Atomic skills in the game. Users should customise them to suit the needs of the game or software, e.g. character movement\n│   │   │   ├── composite_skills # Composite skills built from atomic skills in games or software\n│   │   │   ├── skill_registry.py # The skill registry for the game. Registers all atomic skills and composite skills.\n│   │   │   └── ui_control.py # The UI control for the game. Defines functions to pause the game and switch to the game window\n│   │   └── ...\n│   ├── gameio # Interfaces that directly wrap the skill registry and ui control in the environment\n│   ├── log # The log for the Cradle framework\n│   ├── memory # The memory for the Cradle framework\n│   ├── module # Currently contains only the skill execution module. Action planning, self-reflection and other modules will be migrated here later from planner and provider\n│   ├── planner # The planner for the Cradle framework. Unified interface for action planning, self-reflection and other modules. This module will be removed later and its contents moved to cradle\u002Fmodule.\n│   ├── runner # ⭐⭐⭐ The logical flow of execution for each game and software. 
All game and software processes will then be unified into a single runner\n│   ├── utils # Defines some helper functions such as save json and load json\n│   └── provider # The provider for the Cradle framework. We have semantically decomposed most of the execution flow in the runner into providers\n│       ├── augment # Methods for image augmentation\n│       ├── llm # Call for the LLM model, e.g. OpenAI's GPT-4o, Claude, etc.\n│       ├── module # ⭐⭐⭐ The module for the Cradle framework. e.g., action planning, self-reflection and other modules. It will be migrated to the cradle\u002Fmodule later.\n│       ├── object_detect # Methods for object detection\n│       ├── process # ⭐⭐⭐ Methods for pre-processing and post-processing for action planning, self-reflection and other modules\n│       ├── video # Methods for video processing\n│       ├── others # Methods for other operations, e.g., save and load coordinates for skylines\n│       ├── circle_detector.py # The circle detector for the rdr2\n│       ├── icon_replacer.py # Methods for replacing icons with text\n│       ├── sam_provider.py # Segment anything for software\n│       └── ...\n└── ...\n```\n\n# 📚 Migrate to New Game\nSince each game's settings and the operating systems they are compatible with are different, Cradle cannot simply replace one game name to migrate to a new game. We suggest considering each game specifically. For example, RDR2, an independent AAA game, requires real-time combat, so we need to pause the game to wait for GPT-4o's response and then unpause the game to execute the actions. Stardew has the same issue. Other games like Dealer's Life 2 and Cities: Skylines do not have real-time requirements, so they do not need to pause. If the new game is similar to the latter, we recommend copying Cities: Skylines' implementation and following its implementation path to create the corresponding modules. 
Although games may differ significantly, the Cradle framework can still be adapted to each of them in a unified way. Assuming the new game's name is **newgame**, the specific migration pipeline can be found in the [Migrate to New Game Guide](docs\u002Fenvs\u002Fnew_game.md). \n\n# Citation\nIf you find our work useful, please consider citing us!\n```\n@article{tan2024cradle,\n  title={Cradle: Empowering Foundation Agents towards General Computer Control},\n  author={Weihao Tan and Wentao Zhang and Xinrun Xu and Haochong Xia and Ziluo Ding and Boyu Li and Bohan Zhou and Junpeng Yue and Jiechuan Jiang and Yewen Li and Ruyi An and Molei Qin and Chuqiao Zong and Longtao Zheng and Yujie Wu and Xiaoqiang Chai and Yifei Bi and Tianbao Xie and Pengjie Gu and Xiyun Li and Ceyao Zhang and Long Tian and Chaojie Wang and Xinrun Wang and Börje F. Karlsson and Bo An and Shuicheng Yan and Zongqing Lu},\n  journal={arXiv preprint arXiv:2403.03186},\n  year={2024}\n}\n```\n[\u002F\u002F]: # (```)\n[\u002F\u002F]: # (@article{weihao2024cradle,)\n[\u002F\u002F]: # (  title     = {{Cradle: Empowering Foundation Agents towards General Computer Control}},)\n[\u002F\u002F]: # (  author    = {Weihao Tan and Wentao Zhang and Xinrun Xu and Haochong Xia and Ziluo Ding and Boyu Li and Bohan Zhou and Junpeng Yue and Jiechuan Jiang and Yewen Li and Ruyi An and Molei Qin and Chuqiao Zong and Longtao Zheng and Yujie Wu and Xiaoqiang Chai and Yifei Bi and Tianbao Xie and Pengjie Gu and Xiyun Li and Ceyao Zhang and Long Tian and Chaojie Wang and Xinrun Wang and Börje F. 
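Karlsson and Bo An and Shuicheng Yan and Zongqing Lu},)\n[\u002F\u002F]: # (  journal   = {arXiv:2403.03186},)\n[\u002F\u002F]: # (  month     = {March},)\n[\u002F\u002F]: # (  year      = {2024},)\n[\u002F\u002F]: # (  primaryClass={cs.AI})\n[\u002F\u002F]: # (})\n[\u002F\u002F]: # (```)\n","The Cradle framework is a project dedicated to achieving General Computer Control (GCC). It aims to carry out a wide range of complex computer tasks in a standardized environment by strengthening foundation models' reasoning, self-improvement, and skill curation abilities. The project takes screenshots as input and produces keyboard and mouse operations as output, and it supports completing tasks automatically in a variety of software applications and games, such as the games RDR2 and Stardew Valley and software such as Chrome and Outlook. Cradle is developed in Python and uses large language model technology. It suits scenarios that require automating complex interactions, such as automated game testing and automated software operation.","2026-06-11 03:42:08","high_star"]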