[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72419":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":43,"readmeContent":44,"aiSummary":45,"trendingCount":16,"starSnapshotCount":16,"syncStatus":46,"lastSyncTime":47,"discoverSource":48},72419,"Open-Interface","AmberSahdev\u002FOpen-Interface","AmberSahdev","Control Any Computer Using LLMs.","",null,"Python",2686,273,34,21,0,3,11,61.91,"GNU General Public License v3.0",false,"main",true,[25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42],"assistant","assistant-computer-control","automation","gpt","gpt4","gpt4v","gpt4vision","linux","llm","machine-learning","macos","openai","pyautogui","pyinstaller","python","self-driving","self-driving-software","windows","2026-06-12 04:01:05","# Open Interface\n\n\u003Cpicture>\n\t\u003Cimg src=\"assets\u002Ficon.png\" align=\"right\" alt=\"Open Interface Logo\" width=\"120\" height=\"120\">\n\u003C\u002Fpicture>\n\n### Control Your Computer Using LLMs\n\nOpen Interface\n- Self-drives your computer by sending your requests to an LLM backend (GPT-4o, Gemini, etc) to figure out the required steps.\n- Automatically executes these steps by simulating keyboard and mouse input.\n- Course-corrects by sending the LLM backend updated screenshots of the progress as needed.\n\n\n\u003Cdiv align=\"center\">\n\u003Ch4>Full Autopilot for All Computers Using LLMs\u003C\u002Fh4>\n\n  
[![macOS](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmac%20os-000000?style=for-the-badge&logo=apple&logoColor=white)](https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface?tab=readme-ov-file#install)\n  [![Linux](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinux-FCC624?style=for-the-badge&logo=linux&logoColor=black)](https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface?tab=readme-ov-file#install)\n  [![Windows](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWindows-0078D6?style=for-the-badge&logo=windows&logoColor=white)](https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface?tab=readme-ov-file#install)\n  \u003Cbr>\n  [![GitHub All Releases](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdownloads\u002FAmberSahdev\u002FOpen-Interface\u002Ftotal.svg)](https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Freleases\u002Flatest)\n  ![GitHub code size in bytes](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flanguages\u002Fcode-size\u002FAmberSahdev\u002FOpen-Interface)\n  ![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAmberSahdev\u002FOpen-Interface)\n  ![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FAmberSahdev\u002FOpen-Interface) \n  [![GitHub Latest Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FAmberSahdev\u002FOpen-Interface)](https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Freleases\u002Flatest)\n\n\u003C\u002Fdiv>\n\n### \u003Cins>Demo\u003C\u002Fins> 💻\n\"Solve Today's Wordle\"\u003Cbr>\n![Solve Today's Wordle](assets\u002Fwordle_demo_2x.gif)\u003Cbr>\n*clipped, 2x*\n\n\u003Cdetails>\n    \u003Csummary>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Fblob\u002Fmain\u002FMEDIA.md#demos\">More Demos\u003C\u002Fa>\u003C\u002Fsummary>\n    \u003Cul>\n\t    \u003Cli>\n\t\t    \"Make me a meal plan in Google Docs\"\n\t\t    \u003Cimg 
src=\"assets\u002Fmeal_plan_demo_2x.gif\" style=\"margin: 5px; border-radius: 10px;\">\n\t    \u003C\u002Fli>\n\t    \u003Cli>\n\t\t    \"Write a Web App\"\n\t\t    \u003Cimg src=\"assets\u002Fcode_web_app_demo_2x.gif\" style=\"margin: 5px; border-radius: 10px;\">\n\t    \u003C\u002Fli>\n    \u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Chr>\n\n### \u003Cins>Install\u003C\u002Fins> 💽\n\u003Cdetails>\n    \u003Csummary>\u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002F8\u002F84\u002FApple_Computer_Logo_rainbow.svg\u002F960px-Apple_Computer_Logo_rainbow.svg.png?20250629104313\" alt=\"macOS Logo\" width=\"13\" height=\"15\"> \u003Cb>macOS\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n        \u003Cli>Download the macOS binary from the latest \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Freleases\u002Flatest\">release\u003C\u002Fa>.\u003C\u002Fli>\n        \u003Cli>Unzip the file and move Open Interface to the Applications folder.\u003Cbr>\u003Cbr> \n            \u003Cimg src=\"assets\u002Fmacos_unzip_move_to_applications.png\" width=\"350\" style=\"border-radius: 10px;\n    border: 3px solid black;\">\n        \u003C\u002Fli>\n    \u003C\u002Ful>\n  \u003Cdetails>\n    \u003Csummary>\u003Cb>Apple Silicon M-Series Macs\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n      \u003Cli>\n        Open Interface will ask you for Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.\u003Cbr>\n      \u003C\u002Fli>\n      \u003Cli>\n        If it doesn't, manually add these permissions via \u003Cb>System Settings\u003C\u002Fb> -> \u003Cb>Privacy and Security\u003C\u002Fb>\n        \u003Cbr>\n        \u003Cimg src=\"assets\u002Fmac_m3_accessibility.png\" width=\"400\" style=\"margin: 5px; border-radius: 10px;\n    border: 3px solid black;\">\u003Cbr>\n        \u003Cimg 
src=\"assets\u002Fmac_m3_screenrecording.png\" width=\"400\" style=\"margin: 5px; border-radius: 10px;\n    border: 3px solid black;\">\n      \u003C\u002Fli>\n    \u003C\u002Ful>\n  \u003C\u002Fdetails>\n  \u003Cdetails>\n    \u003Csummary>\u003Cb>Intel Macs\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n        \u003Cli>\n            Launch the app from the Applications folder.\u003Cbr>\n            You might face the standard Mac \u003Ci>\"Open Interface cannot be opened\" error\u003C\u002Fi>.\u003Cbr>\u003Cbr>\n            \u003Cimg src=\"assets\u002Fmacos_unverified_developer.png\" width=\"200\" style=\"border-radius: 10px;\n    border: 3px solid black;\">\u003Cbr>\n            In that case, press \u003Cb>\u003Ci>\u003Cins>\"Cancel\"\u003C\u002Fins>\u003C\u002Fi>\u003C\u002Fb>.\u003Cbr>\n            Then go to \u003Cb>System Preferences -> Security and Privacy -> Open Anyway.\u003C\u002Fb>\u003Cbr>\u003Cbr>\n            \u003Cimg src=\"assets\u002Fmacos_system_preferences.png\" width=\"100\" style=\"border-radius: 10px;\n    border: 3px solid black;\"> &nbsp; \n            \u003Cimg src=\"assets\u002Fmacos_security.png\" width=\"100\" style=\"border-radius: 10px;\n    border: 3px solid black;\"> &nbsp;\n            \u003Cimg src=\"assets\u002Fmacos_open_anyway.png\" width=\"400\" style=\"border-radius: 10px;\n    border: 3px solid black;\"> \n        \u003C\u002Fli>\n        \u003Cbr>\n        \u003Cli>\n        Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.\u003Cbr>\u003Cbr>\n        \u003Cimg src=\"assets\u002Fmacos_accessibility.png\" width=\"400\" style=\"margin: 5px; border-radius: 10px;\n    border: 3px solid black;\">\u003Cbr>\n        \u003Cimg src=\"assets\u002Fmacos_screen_recording.png\" width=\"400\" style=\"margin: 5px; border-radius: 10px;\n    border: 3px solid black;\">\n        \u003C\u002Fli>\n      
\u003C\u002Ful>\n\u003C\u002Fdetails>\n      \u003Cul>\n        \u003Cli>Lastly, check out the \u003Ca href=\"#setup\">Setup\u003C\u002Fa> section to connect Open Interface to LLMs (OpenAI GPT-4V)\u003C\u002Fli>\n    \u003C\u002Ful>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fa\u002Faf\u002FTux.png\" alt=\"Linux Logo\" width=\"18\" height=\"18\"> \u003Cb>Linux\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n        \u003Cli>The Linux binary has been tested on Ubuntu 20.04 so far.\u003C\u002Fli>\n        \u003Cli>Download the Linux zip file from the latest \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Freleases\u002Flatest\">release\u003C\u002Fa>.\u003C\u002Fli>\n        \u003Cli>\n            Extract the executable and check out the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface?tab=readme-ov-file#setup\">Setup\u003C\u002Fa> section to connect Open Interface to LLMs, such as OpenAI GPT-4V.\u003C\u002Fli>\n    \u003C\u002Ful>\n\u003C\u002Fdetails>\n\u003Cdetails>\n    \u003Csummary>\u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002F5\u002F5f\u002FWindows_logo_-_2012.svg\" alt=\"Windows Logo\" width=\"15\" height=\"15\"> \u003Cb>Windows\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n\t\u003Cli>The Windows binary has been tested on Windows 10.\u003C\u002Fli>\n\t\u003Cli>Download the Windows zip file from the latest \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface\u002Freleases\u002Flatest\">release\u003C\u002Fa>.\u003C\u002Fli>\n\t\u003Cli>Unzip the folder, move the exe to the desired location, double click to open, and voila.\u003C\u002Fli>\n\t\u003Cli>Check out the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface?tab=readme-ov-file#setup\">Setup\u003C\u002Fa> section to connect Open Interface to LLMs 
(OpenAI GPT-4V)\u003C\u002Fli>\n    \u003C\u002Ful>\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n    \u003Csummary>\u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002Fc\u002Fc3\u002FPython-logo-notext.svg\u002F120px-Python-logo-notext.svg.png?20250701090410\" alt=\"Python Logo\" width=\"15\" height=\"15\"> \u003Cb>Run as a Script\u003C\u002Fb>\u003C\u002Fsummary>\n    \u003Cul>\n\t  \u003Cli>Clone the repo \u003Ccode>git clone https:\u002F\u002Fgithub.com\u002FAmberSahdev\u002FOpen-Interface.git\u003C\u002Fcode>\u003C\u002Fli>\n      \u003Cli>Enter the directory \u003Ccode>cd Open-Interface\u003C\u002Fcode>\u003C\u002Fli>\n      \u003Cli>\u003Cb>Optionally\u003C\u002Fb> use a Python virtual environment \n        \u003Cul>\n          \u003Cli>Note: pyenv handles tkinter installation weirdly so you may have to debug for your own system yourself.\u003C\u002Fli>\n          \u003Cli>\u003Ccode>pyenv local 3.12.2\u003C\u002Fcode>\u003C\u002Fli>\n          \u003Cli>\u003Ccode>python -m venv .venv\u003C\u002Fcode>\u003C\u002Fli> \n          \u003Cli>\u003Ccode>source .venv\u002Fbin\u002Factivate\u003C\u002Fcode>\u003C\u002Fli>\n        \u003C\u002Ful>\n      \u003C\u002Fli>\n      \u003Cli>Install dependencies \u003Ccode>pip install -r requirements.txt\u003C\u002Fcode>\u003C\u002Fli>\n      \u003Cli>Run the app using \u003Ccode>python app\u002Fapp.py\u003C\u002Fcode>\u003C\u002Fli>\n    \u003C\u002Ful>\n\u003C\u002Fdetails>\n\n### \u003Cins id=\"setup\">Setup\u003C\u002Fins> 🛠️\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Set up the OpenAI API key\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Get your OpenAI API key\n  - Open Interface needs access to GPT-4o to perform user requests. 
API keys can be created in your OpenAI account at [platform.openai.com\u002Fsettings\u002Forganization\u002Fapi-keys](https:\u002F\u002Fplatform.openai.com\u002Fsettings\u002Forganization\u002Fapi-keys).\n  - [Follow the steps here](https:\u002F\u002Fhelp.openai.com\u002Fen\u002Farticles\u002F8264644-what-is-prepaid-billing) to add balance to your OpenAI account. To unlock GPT-4o, a minimum payment of $5 is needed.\n  - [More info](https:\u002F\u002Fhelp.openai.com\u002Fen\u002Farticles\u002F7102672-how-can-i-access-gpt-4)\n- Save the API key in Open Interface settings\n  - In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so: \u003Cbr>\n  \u003Cbr>\n  \u003Cpicture>\n\t\u003Cimg src=\"assets\u002Fset_openai_api_key.png\" align=\"middle\" alt=\"Set API key in settings\" width=\"400\">\n  \u003C\u002Fpicture>\u003Cbr>\n  \u003Cbr>\n\n- After setting the API key for the first time you'll need to \u003Cb>restart the app\u003C\u002Fb>.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Set up the Google Gemini API key\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Go to Settings -> Advanced Settings and select the Gemini model you wish to use.\n- Get your Google Gemini API key from https:\u002F\u002Faistudio.google.com\u002Fapp\u002Fapikey.\n- Save the API key in Open Interface settings.\n- Save the settings and \u003Cb>restart the app\u003C\u002Fb>.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n    \u003Csummary>\u003Cb>Optional: Set up a Custom LLM\u003C\u002Fb>\u003C\u002Fsummary>\n\n- Open Interface supports other LLMs that expose an OpenAI-style API (such as Llava) as a backend; these can be configured in the Advanced Settings window.\n- Enter the custom base URL and model name in the Advanced Settings window and the API key in the Settings window as needed. 
\n- NOTE - If you're using Llama:\n  - You may need to enter a random string like \"xxx\" in the API key input box.\n  - You may need to append \u002Fv1\u002F to the base URL.\n    \u003Cbr>\n    \u003Cpicture>\n      \u003Cimg src=\"assets\u002Fadvanced_settings.png\" align=\"middle\" alt=\"Set API key in settings\" width=\"400\">\n    \u003C\u002Fpicture>\u003Cbr>\n    \u003Cbr>\n- If your LLM does not support an OpenAI-style API, you can use a library like [this](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm) to convert it to one.\n- You will need to restart the app after these changes.\n\n\u003C\u002Fdetails>\n\n\u003Chr>\n\n### \u003Cins>Stuff It’s Error-Prone At, For Now\u003C\u002Fins> 😬\n\n- Accurate spatial reasoning, and hence clicking buttons.\n- Keeping track of itself in tabular contexts, like Excel and Google Sheets, for reasons similar to those stated above.\n- Navigating complex GUI-rich applications like Counter-Strike, Spotify, Garage Band, etc., due to heavy reliance on cursor actions.\n\n\n### \u003Cins>The Future\u003C\u002Fins> 🔮\n(*with better models trained on video walkthroughs like YouTube tutorials*)\n- \"Create a couple of bass samples for me in Garage Band for my latest project.\"\n- \"Read this design document for a new feature, edit the code on GitHub, and submit it for review.\"\n- \"Find my friends' music taste from Spotify and create a party playlist for tonight's event.\"\n- \"Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie.\"\n\n### \u003Cins>Notes\u003C\u002Fins> 📝\n- Cost Estimation: $0.0005 - $0.002 per LLM request depending on the model used.\u003Cbr>\n(A user request can require anywhere from two to a few dozen LLM backend calls depending on its complexity.)\n- You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.\n- Open Interface can only see your primary display when using multiple monitors. 
Therefore, if the cursor\u002Ffocus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress.\n\n\u003Chr>\n\n### \u003Cins>System Diagram\u003C\u002Fins> 🖼️\n```\n+----------------------------------------------------+\n| App                                                |\n|                                                    |\n|    +-------+                                       |\n|    |  GUI  |                                       |\n|    +-------+                                       |\n|        ^                                           |\n|        |                                           |\n|        v                                           |\n|  +-----------+  (Screenshot + Goal)  +-----------+ |\n|  |           | --------------------> |           | |\n|  |    Core   |                       |    LLM    | |\n|  |           | \u003C-------------------- |  (GPT-4o) | |\n|  +-----------+    (Instructions)     +-----------+ |\n|        |                                           |\n|        v                                           |\n|  +-------------+                                   |\n|  | Interpreter |                                   |\n|  +-------------+                                   |\n|        |                                           |\n|        v                                           |\n|  +-------------+                                   |\n|  |   Executer  |                                   |\n|  +-------------+                                   |\n+----------------------------------------------------+\n```\n\n--- \n\n### \u003Cins>Star History\u003C\u002Fins> ⭐️\n\n\u003Cpicture>\n\t\u003Cimg src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=AmberSahdev\u002FOpen-Interface&type=Date\" alt=\"Star History\" width=\"720\">\n\u003C\u002Fpicture>\n\n### \u003Cins>Links\u003C\u002Fins> 🔗\n- Check out more of my projects at [AmberSah.dev](https:\u002F\u002FAmberSah.dev).\n- 
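Other demos and press kit can be found at [MEDIA.md](MEDIA.md).\n\n\n\u003Cdiv align=\"center\">\n\t\u003Cimg alt=\"GitHub Repo stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAmberSahdev\u002FOpen-Interface\">\n\t\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FAmberSahdev\"> \u003Cimg alt=\"GitHub followers\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Ffollowers\u002FAmberSahdev\"> \u003C\u002Fa>\n\u003C\u002Fdiv>\n","Open Interface is a project that uses large language models (LLMs) to control a computer. It sends the user's request to an LLM backend (such as GPT-4 or Gemini) to work out the required steps, then executes those steps by simulating keyboard and mouse input. When needed, it also sends updated screenshots to the LLM so it can course-correct and keep its actions accurate. The project is written in Python and supports Windows, macOS, and Linux. It suits users who want to automate everyday tasks or operate their computer through natural-language instructions, for example automating office software or writing code.",2,"2026-06-11 03:41:58","high_star"]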