[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2761":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},2761,"Photo-agents","jmerelnyc\u002FPhoto-agents","jmerelnyc","Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that operate your computer.","https:\u002F\u002Fphoto-agents.com",null,"Python",873,22,20,6,0,681,8.09,"MIT License",false,"main",[23,24,25,26,27,28,29,30,31,32],"agent-memory","ai-agents","autonomous-agents","computer-use","llm","photo-agents","photographic-memory","python","self-evolving-agents","vision-agents","2026-06-12 02:00:43","# Photo Agents\n\n\u003Cimg width=\"2688\" height=\"1520\" alt=\"hf_20260504_103619_aaebb60a-ba3e-4763-a5b2-7771293ce9d6\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fb190236d-d0cc-448f-a6eb-4c7bf4c6f7b7\" \u002F>\n\nAutonomous self-evolving **Photo Agents**. 
A perceive \u002F reason \u002F act framework for photo-aware agents that operate your computer the way you do.\n\n> \"100% autonomous, self-evolving agents.\"\n> [photo-agents.com](https:\u002F\u002Fphoto-agents.com)\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=jmerelnyc%2FPhoto-agents&type=date&legend=top-left\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=jmerelnyc\u002FPhoto-agents&type=date&theme=dark&legend=top-left\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=jmerelnyc\u002FPhoto-agents&type=date&legend=top-left\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=jmerelnyc\u002FPhoto-agents&type=date&legend=top-left\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## About\n\nPhoto Agents is building the next generation of LLM-driven agents that ground themselves in what they actually see on screen. Instead of dumping longer chat transcripts into a model and hoping for the best, we treat memory the way biology does. Vision in. Bound observations stored in layers. Skills written by the agent itself from real success.\n\nThe package in this repo is the runtime that ships that idea. 
It runs locally, so you keep ownership of your screen, your data, and your keys.\n\n- Website: https:\u002F\u002Fphoto-agents.com\n- X \u002F Twitter: https:\u002F\u002Fx.com\u002Fphotoagents\n\nFollow [@photo_agents](https:\u002F\u002Fx.com\u002Fphoto_agents) on X for build notes, demos, and the occasional rant about why text-only agents will never see your UI.\n\n## What it is\n\nPhoto Agents is a single Python package that bundles:\n\n- A streaming **agent loop** (`photoagents.core.loop.run_agent_session`) that drives any tool-calling LLM through a perceive → reason → act cycle.\n- A **multi-provider LLM router** (`photoagents.llm.router`) with first-class support for Anthropic Claude (native), OpenAI GPT (native), and a mixin failover session.\n- A **physical-execution toolset**: file I\u002FO, sandboxed code execution (Python \u002F PowerShell \u002F bash), browser automation via a Chrome DevTools Protocol bridge, and a layered memory system (working \u002F global \u002F SOP \u002F session archive).\n- Pluggable **clients**: a polished Streamlit web app, a PyQt desktop app, a desktop companion, and ready-to-run bots for Telegram, QQ, Feishu, WeCom, and DingTalk.\n- Optional **observability** via Langfuse and a cron-style scheduler.\n\nThe whole thing is gated by a remote-validated **Photo Agents API key** so usage stays accountable.\n\n## Install\n\n```bash\npip install photoagents\n# or, with every optional client and integration\npip install \"photoagents[all]\"\n```\n\nPhoto Agents needs Python 3.10+.\n\n## Get an API key\n\nPhoto Agents requires a license key, validated against `https:\u002F\u002Fphoto-agents.com\u002Fv1\u002Fkeys\u002Fvalidate`. Sign in and create one at:\n\n> **https:\u002F\u002Fphoto-agents.com\u002Fdashboard\u002Fkeys**\n\nThen make it available to the runtime in any of these ways (checked in order):\n\n1. Environment variable: `PHOTOAGENTS_API_KEY=pk_live_...`\n2. Saved config: `~\u002F.photoagents\u002Fconfig.json` field `api_key`\n3. 
Interactive prompt on first run (offered to be saved automatically)\n\nA successful validation is cached for 24 hours so the gate stays fast.\n\n## LLM credentials\n\nCopy the credentials template and fill in your provider key:\n\n```bash\n# from the repo root\ncp photoagents\u002Fconfig\u002Fkeys_template.py credentials.py\n# then edit credentials.py and uncomment one of the provider configs\n```\n\nThe runtime also accepts a JSON form (`credentials.json`) with the same shape.\n\n## Run\n\n```bash\n# Interactive REPL on your terminal\npython -m photoagents\n\n# One-shot file-IO mode\npython -m photoagents --task my_task --input \"List the largest files in this directory.\"\n\n# Reflect \u002F watchdog mode (your check() function fires the next task)\npython -m photoagents --reflect photoagents\u002Fevolution\u002Fscheduler.py\n```\n\n## GUI clients\n\nPhoto Agents ships several optional frontends. Pick whichever fits your workflow:\n\n| Client                         | Launch command                                      |\n| ------------------------------ | --------------------------------------------------- |\n| Streamlit web app + webview    | `pythonw -m photoagents.cli.launcher`               |\n| Service hub (start\u002Fstop)       | `pythonw -m photoagents.cli.hub`                    |\n| Desktop app (PyQt)             | `python -m photoagents.clients.desktop_app`         |\n| Desktop companion              | `pythonw -m photoagents.clients.companion_v2`      |\n| Telegram bot                   | `python -m photoagents.clients.telegram_client`     |\n| Feishu \u002F WeCom \u002F DingTalk \u002F QQ | `python -m photoagents.clients.\u003Cfeishu|wecom|...>_client` |\n\nThe launcher and hub both call the same API key gate before starting any service, so they will refuse to launch anything if your key is missing or revoked.\n\n## On-disk state\n\n| Path                              | What lives there                                  |\n| 
--------------------------------- | -------------------------------------------------- |\n| `~\u002F.photoagents\u002Fconfig.json`      | API key + license validation cache                 |\n| `~\u002F.photoagents\u002Fglobal_mem.txt`   | Long-term L2 facts                                 |\n| `~\u002F.photoagents\u002Fsessions\u002F`        | L4 raw session archives                            |\n| `~\u002F.photoagents\u002Fskill_index\u002F`     | Vector index for skill \u002F SOP search                |\n| `~\u002F.photoagents\u002Ftemp\u002F`            | Per-task scratch (logs, intermediate output)       |\n\n## Project layout\n\n```\nphotoagents\u002F\n├── auth\u002F        License gate (remote-validated API key)\n├── cli\u002F         python -m photoagents, GUI launcher, service hub\n├── clients\u002F     Web \u002F desktop \u002F chat-platform frontends\n├── config\u002F      credentials.py template\n├── core\u002F        Agent loop and tool dispatcher\n├── evolution\u002F   Reflection \u002F scheduler scripts (the \"self-evolving\" loop)\n├── integrations\u002F Optional third-party hooks (Langfuse, etc.)\n├── llm\u002F         Multi-provider session router\n├── resources\u002F   System prompt, tool schema, CDP bridge, demo media\n├── skills\u002F      L3 SOPs and helper modules (browser, vision, OCR, ...)\n└── web\u002F         DOM simplifier and Chrome DevTools Protocol driver\n```\n\n## License\n\nMIT. See [LICENSE](LICENSE).\n\n## Status\n\nStatus: beta. APIs may change before 1.0.\n\n\n\n","Photo Agents is an autonomous, self-evolving agent project that operates a computer through a perceive, reason, and act framework. It uses a vision-grounded layered memory system and self-written skills so that large language model (LLM) agents can use a computer the way a person does. Its core features include a streaming agent loop, multi-provider LLM routing, a physical-execution toolset, and pluggable clients. It is written in Python and integrates with several messaging platforms. It suits scenarios that need a high degree of automation and intelligent decisions based on on-screen content, such as personal assistants, automated testing, or complex task handling.",2,"2026-06-11 02:51:07","CREATED_QUERY"]