[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83581":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},83581,"awesome-agent-harness","Picrew\u002Fawesome-agent-harness","Picrew","An awesome list of Agent Harness engineering resources, including GitHub projects, tools, benchmarks, and practical guides.","https:\u002F\u002Fpicrew.github.io\u002FLLM-Harness\u002F",null,"Python",1157,95,1,3,0,71,213,98.05,false,"main",true,[24,25,26],"agent-harness","context-engineering","harness-engineering","2026-06-12 04:01:41","# Awesome Agent Harness\n\nA curated, implementation-first list of **agent harness engineering** resources, with GitHub projects as the primary focus.\n\n- Total entries: **268**\n- GitHub entries: **241 (89.9%)**\n- GitHub in project categories (excluding readings): **236\u002F236 (100.0%)**\n- Categories: **9**\n- Last verified: **2026-06-05**\n- Language: [English](.\u002FREADME.md) | [中文](.\u002FREADME_zh.md)\n\n\u003Ca id=\"featured-harness-blogs\">\u003C\u002Fa>\n## Featured Harness Blogs\n\n- [Scaling Managed Agents: Decoupling the brain from the hands](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fmanaged-agents): Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.\n- [What We Learned Building Cloud Agents](https:\u002F\u002Fcognition.ai\u002Fblog\u002Fwhat-we-learned-building-cloud-agents): Cognition's field report on secure cloud-agent infrastructure, VM isolation, full-state snapshots, orchestration, governance, integrations, and enterprise adoption.\n- [Claude Code auto mode](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fclaude-code-auto-mode): Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.\n- [Harness engineering (OpenAI)](https:\u002F\u002Fopenai.com\u002Findex\u002Fharness-engineering\u002F): Field report on building reliable agent-first software via harness constraints and verification.\n- [The next evolution of the Agents SDK](https:\u002F\u002Fopenai.com\u002Findex\u002Fthe-next-evolution-of-the-agents-sdk\u002F): OpenAI's product and engineering post on model-native agent harnesses, native sandbox execution, manifests, memory, and filesystem and shell tools.\n- [Building Effective AI Agents](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fbuilding-effective-agents): Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.\n- [Writing effective tools for AI agents](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fwriting-tools-for-agents): Best practices for tool interface design so agents call tools safely and reliably.\n- [Effective harnesses for long-running agents](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Feffective-harnesses-for-long-running-agents): Practical guide to maintaining state, resumability, and reliability over long agent runs.\n- [Harness design for long-running application development](https:\u002F\u002Fwww.anthropic.com\u002Fengineering\u002Fharness-design-long-running-apps): Follow-up article on improving long-running app generation through harness structure.\n- [Improving Deep Agents with harness engineering](https:\u002F\u002Fblog.langchain.com\u002Fimproving-deep-agents-with-harness-engineering\u002F): Evidence that harness improvements alone can move benchmark performance.\n- [Evaluating Deep Agents: Our Learnings](https:\u002F\u002Fblog.langchain.com\u002Fevaluating-deep-agents-our-learnings\u002F): LangChain's practical lessons on evaluating stateful and long-horizon agents.\n- [Your Agent Needs a Harness, Not a Framework](https:\u002F\u002Fwww.inngest.com\u002Fblog\u002Fyour-agent-needs-a-harness-not-a-framework): Argument for reliability-first infrastructure around agents instead of framework-only thinking.\n\n## Contents\n\n- [Category Overview](#category-overview)\n- [Featured Harness Blogs](#featured-harness-blogs)\n- [Catalog](#catalog)\n  - [Harness Architecture & Orchestration](#harness-architecture-orchestration)\n  - [Context & Working-State Engineering](#context-working-state-engineering)\n  - [Execution Substrates & Sandboxing](#execution-substrates-sandboxing)\n  - [Protocols, Tool Interfaces & Agent Contracts](#protocols-tool-interfaces-agent-contracts)\n  - [Evaluation Harnesses & Benchmarks](#evaluation-harnesses-benchmarks)\n  - [Observability & Reliability Operations](#observability-reliability-operations)\n  - [Guardrails, Security & Governance](#guardrails-security-governance)\n  - [Reference Harness Implementations](#reference-harness-implementations)\n  - [Essential Readings & Ecosystem Maps](#essential-readings-ecosystem-maps)\n- [Maintenance Notes](#maintenance-notes)\n- [Citation](#citation)\n\n## Category Overview\n\n| Category | Entries |\n| --- | ---: |\n| Harness Architecture & Orchestration | 44 |\n| Context & Working-State Engineering | 16 |\n| Execution Substrates & Sandboxing | 25 |\n| Protocols, Tool Interfaces & Agent Contracts | 23 |\n| Evaluation Harnesses & Benchmarks | 27 |\n| Observability & Reliability Operations | 14 |\n| Guardrails, Security & Governance | 19 |\n| Reference Harness Implementations | 68 |\n| Essential Readings & Ecosystem Maps | 32 |\n\n## Catalog\n\nNotes:\n- `Stars` are rendered as badges from snapshot values.\n- Repository update dates are tracked in `data\u002Fprojects.yaml` and validation reports.\n- Entries are sorted by stars (descending) within each category.\n\n\u003Ca id=\"harness-architecture-orchestration\">\u003C\u002Fa>\n### Harness Architecture & Orchestration\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| Superpowers | [GitHub](https:\u002F\u002Fgithub.com\u002Fobra\u002Fsuperpowers) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-218245-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fobra\u002Fsuperpowers) | skills, workflow, cross-agent | Cross-agent software development methodology built from composable skills, mandatory workflows, worktrees, planning, TDD, review, and subagent execution. |\n| ECC | [GitHub](https:\u002F\u002Fgithub.com\u002Faffaan-m\u002FECC) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-207572-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Faffaan-m\u002FECC) | cross-harness, hooks, skills | Cross-harness operator system combining skills, hooks, memory optimization, security scanning, and validation workflows for agentic work. |\n| gstack | [GitHub](https:\u002F\u002Fgithub.com\u002Fgarrytan\u002Fgstack) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-107191-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgarrytan\u002Fgstack) | skills, qa, release | Claude Code and cross-agent skill stack that turns product planning, architecture review, QA, security, release, and retrospectives into repeatable agent workflows. |\n| DeerFlow | [GitHub](https:\u002F\u002Fgithub.com\u002Fbytedance\u002Fdeer-flow) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-70487-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fbytedance\u002Fdeer-flow) | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |\n| oh-my-openagent | [GitHub](https:\u002F\u002Fgithub.com\u002Fcode-yeongyu\u002Foh-my-openagent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-61082-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcode-yeongyu\u002Foh-my-openagent) | multi-harness, team-mode, skills | Multi-harness agent OS for OpenCode, Codex, Claude Code, and other coding agents with team-mode orchestration, background agents, MCPs, and skills. |\n| AutoGen | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fautogen) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-58702-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fautogen) | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |\n| Ruflo | [GitHub](https:\u002F\u002Fgithub.com\u002Fruvnet\u002Fruflo) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-57931-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fruvnet\u002Fruflo) | multi-agent, swarm, mcp | Multi-agent orchestration platform for Claude Code with swarms, persistent memory, federation, plugins, and MCP hooks. |\n| CrewAI | [GitHub](https:\u002F\u002Fgithub.com\u002FcrewAIInc\u002FcrewAI) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-52865-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FcrewAIInc\u002FcrewAI) | multi-agent, workflows, control-plane | Multi-agent automation framework with production Flows, autonomous Crews, event-driven control, tracing, guardrails, memory, and human review hooks. |\n| Addy's Agent Skills | [GitHub](https:\u002F\u002Fgithub.com\u002Faddyosmani\u002Fagent-skills) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-48334-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Faddyosmani\u002Fagent-skills) | skills, quality-gates, coding-agents | Production-grade engineering skills for coding agents that package lifecycle workflows, quality gates, reviews, testing, debugging, security, and release practices. |\n| Agno | [GitHub](https:\u002F\u002Fgithub.com\u002Fagno-agi\u002Fagno) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-40530-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagno-agi\u002Fagno) | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |\n| LangGraph | [GitHub](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flanggraph) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-33927-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flanggraph) | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |\n| Semantic Kernel | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fsemantic-kernel) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-28056-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fsemantic-kernel) | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |\n| OpenAI Agents SDK (Python) | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-agents-python) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-26928-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-agents-python) | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |\n| Symphony | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsymphony) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-25048-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsymphony) | orchestration, control-plane, workflows | Ticket-driven orchestration layer that turns project work into isolated autonomous implementation runs. |\n| deepagents | [GitHub](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Fdeepagents) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-23913-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Fdeepagents) | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |\n| Archon | [GitHub](https:\u002F\u002Fgithub.com\u002Fcoleam00\u002FArchon) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-22182-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcoleam00\u002FArchon) | workflow-engine, worktrees, validation | Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates. |\n| Google ADK (Python) | [GitHub](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fadk-python) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-19991-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fadk-python) | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |\n| elizaOS | [GitHub](https:\u002F\u002Fgithub.com\u002FelizaOS\u002Feliza) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-18511-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FelizaOS\u002Feliza) | agent-os, plugins, benchmarks | Extensible agent runtime and operating system with CLI scaffolding, agent loop, plugins, memory\u002Fstate primitives, dashboards, connectors, and benchmark suites. |\n| PydanticAI | [GitHub](https:\u002F\u002Fgithub.com\u002Fpydantic\u002Fpydantic-ai) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-17533-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fpydantic\u002Fpydantic-ai) | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |\n| Gas Town | [GitHub](https:\u002F\u002Fgithub.com\u002Fgastownhall\u002Fgastown) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-15742-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgastownhall\u002Fgastown) | multi-agent, workspaces, coordination | Multi-agent workspace manager for coordinating coding agents with persistent work tracking, git-backed hooks, handoffs, supervision, and merge queues. |\n| Microsoft Agent Framework | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fagent-framework) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-11050-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fagent-framework) | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |\n| Hive | [GitHub](https:\u002F\u002Fgithub.com\u002Faden-hive\u002Fhive) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-10493-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Faden-hive\u002Fhive) | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |\n| VoltAgent | [GitHub](https:\u002F\u002Fgithub.com\u002FVoltAgent\u002Fvoltagent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-9372-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FVoltAgent\u002Fvoltagent) | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |\n| mcp-agent | [GitHub](https:\u002F\u002Fgithub.com\u002Flastmile-ai\u002Fmcp-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-8353-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flastmile-ai\u002Fmcp-agent) | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |\n| PraisonAI | [GitHub](https:\u002F\u002Fgithub.com\u002FMervinPraison\u002FPraisonAI) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-8062-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FMervinPraison\u002FPraisonAI) | multi-agent, workflow, memory | Multi-agent workforce framework with autonomous planning, execution, memory, RAG, dashboards, and multi-provider model support. |\n| Agent Squad | [GitHub](https:\u002F\u002Fgithub.com\u002F2FastLabs\u002Fagent-squad) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-7646-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002F2FastLabs\u002Fagent-squad) | routing, multi-agent, context | Multi-agent orchestration framework that routes requests, preserves conversation context, supports Python\u002FTypeScript, and coordinates specialist agents. |\n| Yao | [GitHub](https:\u002F\u002Fgithub.com\u002FYaoApp\u002Fyao) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-7541-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FYaoApp\u002Fyao) | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |\n| Open Multi-Agent | [GitHub](https:\u002F\u002Fgithub.com\u002Fopen-multi-agent\u002Fopen-multi-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6325-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopen-multi-agent\u002Fopen-multi-agent) | multi-agent, dag, tracing | TypeScript-native multi-agent orchestrator that turns goals into task DAGs with parallel execution, MCP integration, and live tracing. |\n| Strands Agents | [GitHub](https:\u002F\u002Fgithub.com\u002Fstrands-agents\u002Fsdk-python) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6027-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fstrands-agents\u002Fsdk-python) | sdk, mcp, tools | Model-driven agent SDK and monorepo with Python\u002FTypeScript agent loops, provider adapters, tools, MCP integration, multi-agent systems, and streaming. |\n| Cloudflare Agents | [GitHub](https:\u002F\u002Fgithub.com\u002Fcloudflare\u002Fagents) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-5038-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcloudflare\u002Fagents) | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |\n| Flue | [GitHub](https:\u002F\u002Fgithub.com\u002Fwithastro\u002Fflue) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4411-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fwithastro\u002Fflue) | typescript, headless, sandbox | TypeScript harness framework for building headless agents with sessions, tools, skills, and pluggable sandboxes. |\n| Embabel Agent Framework | [GitHub](https:\u002F\u002Fgithub.com\u002Fembabel\u002Fembabel-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3563-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fembabel\u002Fembabel-agent) | jvm, planning, typed-flows | JVM agent framework for typed agentic flows with goals, actions, conditions, dynamic planning, platform modes, and testability. |\n| OpenAI Agents SDK (JS\u002FTS) | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-agents-js) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3174-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-agents-js) | typescript, workflows, sandbox-agents | JavaScript\u002FTypeScript framework for multi-agent workflows with handoffs, tools, guardrails, sessions, tracing, and sandbox agents. |\n| Docker Agent | [GitHub](https:\u002F\u002Fgithub.com\u002Fdocker\u002Fdocker-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2985-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fdocker\u002Fdocker-agent) | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |\n| NeMo Agent Toolkit | [GitHub](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo-Agent-Toolkit) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2379-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo-Agent-Toolkit) | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |\n| Apache Burr | [GitHub](https:\u002F\u002Fgithub.com\u002Fapache\u002Fburr) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2019-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fapache\u002Fburr) | state-machine, persistence, tracing | State-machine framework for decision-making agents and LLM apps with persistence, telemetry UI, tracing, and framework-agnostic execution. |\n| Scion | [GitHub](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fscion) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1581-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fscion) | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |\n| deepagentsjs | [GitHub](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Fdeepagentsjs) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1300-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Fdeepagentsjs) | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |\n| oh-my-agent | [GitHub](https:\u002F\u002Fgithub.com\u002Ffirst-fluke\u002Foh-my-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1060-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ffirst-fluke\u002Foh-my-agent) | multi-agent, skills, cross-runtime | Portable multi-agent harness that projects shared agents, skills, workflows, and rules into multiple coding-agent runtimes. |\n| Chorus | [GitHub](https:\u002F\u002Fgithub.com\u002FChorus-AIDLC\u002FChorus) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-950-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FChorus-AIDLC\u002FChorus) | ai-dlc, permissions, task-state | AI-human collaboration harness for session lifecycle, task state, sub-agent orchestration, observability, and recovery. |\n| Pydantic AI Harness | [GitHub](https:\u002F\u002Fgithub.com\u002Fpydantic\u002Fpydantic-ai-harness) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-514-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fpydantic\u002Fpydantic-ai-harness) | capabilities, hooks, pydantic | Official Pydantic AI capability library for composing tools, lifecycle hooks, instructions, and model settings into reusable agent harnesses. |\n| Water | [GitHub](https:\u002F\u002Fgithub.com\u002Fmanthanguptaa\u002Fwater) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-290-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmanthanguptaa\u002Fwater) | python, framework, approval-gates | Python agent harness framework for orchestration, resilience, observability, guardrails, approval gates, sandboxing, and deployment. |\n| OmniCoreAgent | [GitHub](https:\u002F\u002Fgithub.com\u002Fomnirexflora-labs\u002Fomnicoreagent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-241-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fomnirexflora-labs\u002Fomnicoreagent) | python, mcp, serving | Python production harness with model loop, tools, MCP, memory, workspace files, guardrails, events, subagents, background tasks, and REST\u002FSSE serving. |\n| hankweave | [GitHub](https:\u002F\u002Fgithub.com\u002FSouthBridgeAI\u002Fhankweave-runtime) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-125-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FSouthBridgeAI\u002Fhankweave-runtime) | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |\n\n\u003Ca id=\"context-working-state-engineering\">\u003C\u002Fa>\n### Context & Working-State Engineering\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| claude-mem | [GitHub](https:\u002F\u002Fgithub.com\u002Fthedotmack\u002Fclaude-mem) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-80724-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fthedotmack\u002Fclaude-mem) | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |\n| Beads | [GitHub](https:\u002F\u002Fgithub.com\u002Fgastownhall\u002Fbeads) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-24362-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgastownhall\u002Fbeads) | memory, issue-tracking, work-state | Agent-optimized distributed issue tracker that stores long-horizon coding work as dependency-aware graph state with memory recall and multi-branch sync. |\n| planning-with-files | [GitHub](https:\u002F\u002Fgithub.com\u002FOthmanAdi\u002Fplanning-with-files) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-22717-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FOthmanAdi\u002Fplanning-with-files) | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |\n| agentmemory | [GitHub](https:\u002F\u002Fgithub.com\u002Frohitg00\u002Fagentmemory) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-21267-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Frohitg00\u002Fagentmemory) | memory, mcp, hooks | Persistent memory server for coding agents using hooks, MCP\u002FREST integration, hybrid search, and shared session recall. |\n| Context Mode | [GitHub](https:\u002F\u002Fgithub.com\u002Fmksglu\u002Fcontext-mode) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-16439-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmksglu\u002Fcontext-mode) | context, mcp, session | MCP context optimization server that sandboxes tool output, indexes session events, and restores continuity across agent compactions. |\n| Agent Skills for Context Engineering | [GitHub](https:\u002F\u002Fgithub.com\u002Fmuratcankoylan\u002FAgent-Skills-for-Context-Engineering) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-16339-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmuratcankoylan\u002FAgent-Skills-for-Context-Engineering) | skills, context, production | Large skill library oriented around context engineering and production agents. |\n| Trellis | [GitHub](https:\u002F\u002Fgithub.com\u002Fmindfold-ai\u002FTrellis) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-9447-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmindfold-ai\u002FTrellis) | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |\n| Context-Engineering Handbook | [GitHub](https:\u002F\u002Fgithub.com\u002Fdavidkimai\u002FContext-Engineering) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-9049-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fdavidkimai\u002FContext-Engineering) | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |\n| CCPM | [GitHub](https:\u002F\u002Fgithub.com\u002Fautomazeio\u002Fccpm) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-8169-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fautomazeio\u002Fccpm) | planning, github-issues, parallel-execution | Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution. |\n| TencentDB Agent Memory | [GitHub](https:\u002F\u002Fgithub.com\u002FTencent\u002FTencentDB-Agent-Memory) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4957-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTencent\u002FTencentDB-Agent-Memory) | memory, context-offloading, openclaw | Local agent memory plugin combining symbolic short-term state, layered long-term memory, traceability, and OpenClaw\u002FHermes integrations. |\n| Acontext | [GitHub](https:\u002F\u002Fgithub.com\u002Fmemodb-io\u002FAcontext) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3510-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmemodb-io\u002FAcontext) | skills, memory, progressive-disclosure | Skill-memory layer that distills agent runs into inspectable skill files and recalls them through agent-controlled tools. |\n| Awesome Context Engineering | [GitHub](https:\u002F\u002Fgithub.com\u002FMeirtz\u002FAwesome-Context-Engineering) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3169-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FMeirtz\u002FAwesome-Context-Engineering) | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |\n| agentic-stack | [GitHub](https:\u002F\u002Fgithub.com\u002Fcodejunkie99\u002Fagentic-stack) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2072-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcodejunkie99\u002Fagentic-stack) | cross-harness, memory, skills | Portable memory, skills, protocols, and dashboard layer that keeps state across multiple coding-agent harnesses. |\n| context-space | [GitHub](https:\u002F\u002Fgithub.com\u002Fcontext-space\u002Fcontext-space) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-810-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcontext-space\u002Fcontext-space) | context, infrastructure, mcp | Infrastructure project focused on context engineering building blocks and MCP-centric integrations. |\n| Memorix | [GitHub](https:\u002F\u002Fgithub.com\u002FAVIDS2\u002Fmemorix) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-499-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FAVIDS2\u002Fmemorix) | memory, mcp, cross-agent | Local-first cross-agent memory control plane with MCP support, workspace sync, sessions, and orchestration state. |\n| sd0x-dev-flow | [GitHub](https:\u002F\u002Fgithub.com\u002Fsd0xdev\u002Fsd0x-dev-flow) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-164-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fsd0xdev\u002Fsd0x-dev-flow) | hooks, state-machine, claude-code | Claude Code harness layer with hook-enforced dual review, durable state-machine gates, context-compaction recovery, and fail-closed safety. |\n\n\u003Ca id=\"execution-substrates-sandboxing\">\u003C\u002Fa>\n### Execution Substrates & Sandboxing\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| Daytona | [GitHub](https:\u002F\u002Fgithub.com\u002Fdaytonaio\u002Fdaytona) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-72504-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fdaytonaio\u002Fdaytona) | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |\n| CUA | [GitHub](https:\u002F\u002Fgithub.com\u002Ftrycua\u002Fcua) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-17660-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ftrycua\u002Fcua) | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |\n| Browser Harness | [GitHub](https:\u002F\u002Fgithub.com\u002Fbrowser-use\u002Fbrowser-harness) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-14362-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fbrowser-use\u002Fbrowser-harness) | browser, cdp, self-healing | Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight. |\n| E2B | [GitHub](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002FE2B) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-12480-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002FE2B) | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |\n| OpenSandbox | [GitHub](https:\u002F\u002Fgithub.com\u002Falibaba\u002FOpenSandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-11289-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Falibaba\u002FOpenSandbox) | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |\n| OpenShell | [GitHub](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FOpenShell) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6786-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FOpenShell) | sandbox, policy, runtime | Safe private runtime for autonomous agents with sandbox lifecycle control and declarative filesystem, network, process, and inference policies. |\n| Microsandbox | [GitHub](https:\u002F\u002Fgithub.com\u002Fsuperradcompany\u002Fmicrosandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6411-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fsuperradcompany\u002Fmicrosandbox) | sandbox, vm, mcp | Rootless local VM sandbox runtime with SDKs, detached long-running sessions, agent skills, and MCP server integration. |\n| CubeSandbox | [GitHub](https:\u002F\u002Fgithub.com\u002FTencentCloud\u002FCubeSandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6134-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTencentCloud\u002FCubeSandbox) | microvm, sandbox, e2b-compatible | MicroVM-based sandbox service for AI agents with sub-60ms startup, E2B-compatible APIs, and hardware-level isolation. |\n| Sandcastle | [GitHub](https:\u002F\u002Fgithub.com\u002Fmattpocock\u002Fsandcastle) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-5746-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmattpocock\u002Fsandcastle) | sandbox, typescript, branch-strategy | TypeScript library for orchestrating coding agents inside isolated sandboxes with configurable branch strategies. |\n| agent-infra sandbox | [GitHub](https:\u002F\u002Fgithub.com\u002Fagent-infra\u002Fsandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4908-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagent-infra\u002Fsandbox) | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |\n| Judge0 | [GitHub](https:\u002F\u002Fgithub.com\u002Fjudge0\u002Fjudge0) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4200-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fjudge0\u002Fjudge0) | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |\n| Agent Sandbox | [GitHub](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fagent-sandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2757-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fagent-sandbox) | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |\n| stakpak\u002Fagent | [GitHub](https:\u002F\u002Fgithub.com\u002Fstakpak\u002Fagent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1585-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fstakpak\u002Fagent) | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |\n| Sandbox Agent | [GitHub](https:\u002F\u002Fgithub.com\u002Frivet-dev\u002Fsandbox-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1421-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Frivet-dev\u002Fsandbox-agent) | sandbox, coding-agents, session-schema | HTTP\u002FSSE control server for running coding agents inside sandboxes with normalized sessions, permissions, event streaming, and replay. |\n| OSS-Fuzz Gen | [GitHub](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Foss-fuzz-gen) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1402-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Foss-fuzz-gen) | fuzzing, security, execution | LLM-powered fuzzing workflows integrated with controlled execution contexts. |\n| E2B Desktop Sandbox | [GitHub](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002Fdesktop) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1401-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002Fdesktop) | desktop, sandbox, computer-use | Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming. |\n| AgentBay SDK | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentbay-ai\u002Fwuying-agentbay-sdk) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1129-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentbay-ai\u002Fwuying-agentbay-sdk) | cloud-sandbox, computer-use, sdk | Cloud sandbox SDK for agents spanning browser, desktop, mobile, and code execution environments. |\n| Tensorlake | [GitHub](https:\u002F\u002Fgithub.com\u002Ftensorlakeai\u002Ftensorlake) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-931-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ftensorlakeai\u002Ftensorlake) | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |\n| AgentScope Runtime | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentscope-ai\u002Fagentscope-runtime) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-807-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentscope-ai\u002Fagentscope-runtime) | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |\n| SWE-ReX | [GitHub](https:\u002F\u002Fgithub.com\u002FSWE-agent\u002FSWE-ReX) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-519-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FSWE-agent\u002FSWE-ReX) | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |\n| sandboxed.sh | [GitHub](https:\u002F\u002Fgithub.com\u002FTh0rgal\u002Fsandboxed.sh) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-448-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTh0rgal\u002Fsandboxed.sh) | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |\n| Capsule | [GitHub](https:\u002F\u002Fgithub.com\u002Fcapsulerun\u002Fcapsule) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-288-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcapsulerun\u002Fcapsule) | wasm, sandbox, task-runtime | Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking. |\n| agentbox | [GitHub](https:\u002F\u002Fgithub.com\u002Fmattolson\u002Fagent-sandbox) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-176-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmattolson\u002Fagent-sandbox) | sandbox, coding-agents, network-policy | Locked-down local sandbox for AI coding agents with scoped filesystem access, egress policy, secret injection, firewalling, and persistent agent state. |\n| HexAgent | [GitHub](https:\u002F\u002Fgithub.com\u002FUnicomAI\u002Fhexagent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-125-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FUnicomAI\u002Fhexagent) | computer-layer, sandbox, runtime | Agent harness that separates the runtime from the computer it operates on through local, VM, and cloud sandbox backends. |\n| terminal-bench-env | [GitHub](https:\u002F\u002Fgithub.com\u002Fucsb-mlsec\u002Fterminal-bench-env) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-83-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fucsb-mlsec\u002Fterminal-bench-env) | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |\n\n\u003Ca id=\"protocols-tool-interfaces-agent-contracts\">\u003C\u002Fa>\n### Protocols, Tool Interfaces & Agent Contracts\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| Anthropic Agent Skills | [GitHub](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fskills) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-146705-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fskills) | skills, spec, claude | Official Agent Skills repository containing the skills specification, templates, and reference skill implementations for Claude. |\n| GitHub Spec Kit | [GitHub](https:\u002F\u002Fgithub.com\u002Fgithub\u002Fspec-kit) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-108749-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgithub\u002Fspec-kit) | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |\n| MCP Servers | [GitHub](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fservers) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-86755-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fservers) | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |\n| Chrome DevTools MCP | [GitHub](https:\u002F\u002Fgithub.com\u002FChromeDevTools\u002Fchrome-devtools-mcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-42861-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FChromeDevTools\u002Fchrome-devtools-mcp) | mcp, browser, devtools | Official MCP server that gives coding agents Chrome DevTools access for reliable browser automation, debugging, and performance analysis. |\n| Playwright MCP | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fplaywright-mcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-33510-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fplaywright-mcp) | mcp, browser, playwright | Official Playwright MCP server giving agents structured accessibility snapshots and deterministic browser automation tools. |\n| Claude Code Plugins Directory | [GitHub](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fclaude-plugins-official) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-29382-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fclaude-plugins-official) | plugins, claude-code, marketplace | Anthropic-managed Claude Code plugin marketplace defining plugin manifests, MCP configuration, commands, agents, skills, and submission quality gates. |\n| FastMCP | [GitHub](https:\u002F\u002Fgithub.com\u002FPrefectHQ\u002Ffastmcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-25496-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FPrefectHQ\u002Ffastmcp) | mcp, python, framework | Python framework for building MCP servers and clients with generated schemas, validation, documentation, production deployment patterns, and governance hooks. |\n| Serena | [GitHub](https:\u002F\u002Fgithub.com\u002Foraios\u002Fserena) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-24949-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Foraios\u002Fserena) | mcp, coding-agents, semantic-tools | MCP toolkit that gives coding agents IDE-like semantic retrieval, editing, refactoring, debugging, and memory tools. |\n| MCP Python SDK | [GitHub](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fpython-sdk) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-23241-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fpython-sdk) | mcp, python, sdk | Official Python implementation of MCP for building clients and servers that expose tools, resources, prompts, protocol lifecycle events, and standard transports. |\n| AGENTS.md | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentsmd\u002Fagents.md) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-21970-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentsmd\u002Fagents.md) | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |\n| Agent Skills Specification | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentskills\u002Fagentskills) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-19959-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentskills\u002Fagentskills) | skills, spec, progressive-disclosure | Open specification and documentation for packaging reusable agent capabilities, workflows, scripts, references, and assets behind progressive disclosure. |\n| MCP TypeScript SDK | [GitHub](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Ftypescript-sdk) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-12611-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Ftypescript-sdk) | mcp, typescript, sdk | Official TypeScript MCP SDK with server and client packages, transports, auth helpers, middleware adapters, and runnable examples. |\n| Model Context Protocol | [GitHub](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fmodelcontextprotocol) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-8321-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fmodelcontextprotocol) | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |\n| directories (rules and MCP indexes) | [GitHub](https:\u002F\u002Fgithub.com\u002Fleerob\u002Fdirectories) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3945-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fleerob\u002Fdirectories) | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |\n| Atmosphere | [GitHub](https:\u002F\u002Fgithub.com\u002FAtmosphere\u002Fatmosphere) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3774-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FAtmosphere\u002Fatmosphere) | jvm, multi-protocol, governance | JVM runtime for streaming governable AI agents across MCP, A2A, AG-UI, and browser-facing transports. |\n| LangChain MCP Adapters | [GitHub](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain-mcp-adapters) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3553-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flangchain-mcp-adapters) | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |\n| SkillHub | [GitHub](https:\u002F\u002Fgithub.com\u002Fiflytek\u002Fskillhub) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3325-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fiflytek\u002Fskillhub) | skills, registry, governance | Self-hosted enterprise agent skill registry with package publishing, versioning, discovery, namespaces, RBAC, reviews, and audit logs. |\n| Microsoft MCP Servers | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3268-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmcp) | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |\n| ACPX | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Facpx) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2819-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Facpx) | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |\n| GitAgentProtocol | [GitHub](https:\u002F\u002Fgithub.com\u002Fopen-gitagent\u002Fgitagent-protocol) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2796-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopen-gitagent\u002Fgitagent-protocol) | standard, git-native, workflows | Git-native, framework-agnostic standard for defining agents, skills, workflows, tools, and runtime memory in repositories. |\n| Microsoft Learn MCP | [GitHub](https:\u002F\u002Fgithub.com\u002FMicrosoftDocs\u002Fmcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1682-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FMicrosoftDocs\u002Fmcp) | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |\n| IBM MCP | [GitHub](https:\u002F\u002Fgithub.com\u002FIBM\u002Fmcp) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-381-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FIBM\u002Fmcp) | mcp, clients, tooling | IBM collection of MCP servers, clients, and developer tooling. |\n| AGENT.md | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentmd\u002Fagent.md) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-82-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentmd\u002Fagent.md) | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |\n\n\u003Ca id=\"evaluation-harnesses-benchmarks\">\u003C\u002Fa>\n### Evaluation Harnesses & Benchmarks\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| Promptfoo | [GitHub](https:\u002F\u002Fgithub.com\u002Fpromptfoo\u002Fpromptfoo) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-21898-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fpromptfoo\u002Fpromptfoo) | eval, red-team, ci | Config-driven prompt\u002Fagent\u002FRAG testing, comparison, and red-team evaluation tool. |\n| DeepEval | [GitHub](https:\u002F\u002Fgithub.com\u002Fconfident-ai\u002Fdeepeval) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-15928-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fconfident-ai\u002Fdeepeval) | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |\n| RAGAS | [GitHub](https:\u002F\u002Fgithub.com\u002Fvibrantlabsai\u002Fragas) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-14243-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fvibrantlabsai\u002Fragas) | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |\n| lm-evaluation-harness | [GitHub](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-12819-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |\n| SWE-bench | [GitHub](https:\u002F\u002Fgithub.com\u002FSWE-bench\u002FSWE-bench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-5091-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FSWE-bench\u002FSWE-bench) | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |\n| verifiers | [GitHub](https:\u002F\u002Fgithub.com\u002FPrimeIntellect-ai\u002Fverifiers) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4163-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FPrimeIntellect-ai\u002Fverifiers) | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |\n| AgentBench | [GitHub](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3469-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench) | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |\n| LangWatch | [GitHub](https:\u002F\u002Fgithub.com\u002Flangwatch\u002Flangwatch) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3288-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangwatch\u002Flangwatch) | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |\n| EvalScope | [GitHub](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fevalscope) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2892-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fevalscope) | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |\n| Terminal-Bench | [GitHub](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fterminal-bench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2316-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fterminal-bench) | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |\n| Harbor | [GitHub](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2304-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor) | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |\n| WebArena | [GitHub](https:\u002F\u002Fgithub.com\u002Fweb-arena-x\u002Fwebarena) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1496-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fweb-arena-x\u002Fwebarena) | web-agent, benchmark, environment | Self-hostable web environment and evaluation harness for autonomous web agents with reproducible end-to-end tasks. |\n| tau2-bench | [GitHub](https:\u002F\u002Fgithub.com\u002Fsierra-research\u002Ftau2-bench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1292-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fsierra-research\u002Ftau2-bench) | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |\n| Meta-Harness | [GitHub](https:\u002F\u002Fgithub.com\u002Fstanford-iris-lab\u002Fmeta-harness) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1025-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fstanford-iris-lab\u002Fmeta-harness) | harness-search, optimization, terminal-bench | Framework for automated search over task-specific model harnesses, with reference experiments for memory systems and terminal-agent scaffolds. |\n| NeMo Gym | [GitHub](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-960-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym) | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM\u002Fagent training and eval. |\n| TheAgentCompany | [GitHub](https:\u002F\u002Fgithub.com\u002FTheAgentCompany\u002FTheAgentCompany) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-719-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTheAgentCompany\u002FTheAgentCompany) | benchmark, workplace, multi-step | Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy. |\n| Claw-Eval | [GitHub](https:\u002F\u002Fgithub.com\u002Fclaw-eval\u002Fclaw-eval) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-642-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fclaw-eval\u002Fclaw-eval) | benchmark, trajectory, safety | Evaluation harness and benchmark for autonomous agents with human-verified tasks, trajectory auditing, and completion, safety, and robustness rubrics. |\n| Inspect Evals | [GitHub](https:\u002F\u002Fgithub.com\u002FUKGovernmentBEIS\u002Finspect_evals) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-525-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FUKGovernmentBEIS\u002Finspect_evals) | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |\n| auto-harness | [GitHub](https:\u002F\u002Fgithub.com\u002Fneosigmaai\u002Fauto-harness) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-510-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fneosigmaai\u002Fauto-harness) | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |\n| WildClawBench | [GitHub](https:\u002F\u002Fgithub.com\u002FInternLM\u002FWildClawBench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-424-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FInternLM\u002FWildClawBench) | benchmark, harness-comparison, multimodal | In-the-wild benchmark that compares multiple agent harnesses on end-to-end multimodal, coding, safety, and productivity tasks inside a live OpenClaw environment. |\n| SWE-Bench Pro | [GitHub](https:\u002F\u002Fgithub.com\u002Fscaleapi\u002FSWE-bench_Pro-os) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-423-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fscaleapi\u002FSWE-bench_Pro-os) | swe, benchmark, long-horizon | Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents. |\n| Agent Evaluation | [GitHub](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fagent-evaluation) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-364-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fagent-evaluation) | evaluation, testing, ci | AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows. |\n| ClawBench | [GitHub](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FClawBench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-363-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FClawBench) | browser-agent, benchmark, recording | Browser-agent benchmark with live-site tasks, isolated containers, five-layer recording, and agentic scoring. |\n| WorkArena | [GitHub](https:\u002F\u002Fgithub.com\u002FServiceNow\u002FWorkArena) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-252-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FServiceNow\u002FWorkArena) | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |\n| OpenHands Benchmarks | [GitHub](https:\u002F\u002Fgithub.com\u002FOpenHands\u002Fbenchmarks) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-87-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FOpenHands\u002Fbenchmarks) | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |\n| WebArena-Verified | [GitHub](https:\u002F\u002Fgithub.com\u002FServiceNow\u002Fwebarena-verified) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-39-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FServiceNow\u002Fwebarena-verified) | web-agent, benchmark, deterministic | Verified web-agent benchmark with deterministic evaluators. |\n| HarnessBench | [GitHub](https:\u002F\u002Fgithub.com\u002Freacher-z\u002FHarnessBench) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-12-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Freacher-z\u002FHarnessBench) | harness-comparison, browser-agent, benchmark | Benchmark for comparing agent harnesses on the same everyday web tasks with fixed models and per-harness containers. |\n\n\u003Ca id=\"observability-reliability-operations\">\u003C\u002Fa>\n### Observability & Reliability Operations\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| Langfuse | [GitHub](https:\u002F\u002Fgithub.com\u002Flangfuse\u002Flangfuse) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-28520-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flangfuse\u002Flangfuse) | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |\n| MLflow | [GitHub](https:\u002F\u002Fgithub.com\u002Fmlflow\u002Fmlflow) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-26306-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmlflow\u002Fmlflow) | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |\n| Opik | [GitHub](https:\u002F\u002Fgithub.com\u002Fcomet-ml\u002Fopik) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-19433-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcomet-ml\u002Fopik) | monitoring, eval, tracing | End-to-end debug\u002Feval\u002Fmonitoring stack for LLM apps and agent workflows. |\n| RagaAI Catalyst | [GitHub](https:\u002F\u002Fgithub.com\u002Fraga-ai-hub\u002FRagaAI-Catalyst) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-16171-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fraga-ai-hub\u002FRagaAI-Catalyst) | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |\n| TensorZero | [GitHub](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-11436-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ftensorzero\u002Ftensorzero) | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |\n| Arize Phoenix | [GitHub](https:\u002F\u002Fgithub.com\u002FArize-ai\u002Fphoenix) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-9988-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FArize-ai\u002Fphoenix) | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |\n| OpenLLMetry | [GitHub](https:\u002F\u002Fgithub.com\u002Ftraceloop\u002Fopenllmetry) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-7177-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ftraceloop\u002Fopenllmetry) | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |\n| Helicone | [GitHub](https:\u002F\u002Fgithub.com\u002FHelicone\u002Fhelicone) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-5778-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FHelicone\u002Fhelicone) | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |\n| AgentOps SDK | [GitHub](https:\u002F\u002Fgithub.com\u002FAgentOps-AI\u002Fagentops) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-5602-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FAgentOps-AI\u002Fagentops) | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |\n| Latitude | [GitHub](https:\u002F\u002Fgithub.com\u002Flatitude-dev\u002Flatitude-llm) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-4053-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flatitude-dev\u002Flatitude-llm) | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |\n| Laminar | [GitHub](https:\u002F\u002Fgithub.com\u002Flmnr-ai\u002Flmnr) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2978-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flmnr-ai\u002Flmnr) | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |\n| claude-code-reverse | [GitHub](https:\u002F\u002Fgithub.com\u002FYuyz0112\u002Fclaude-code-reverse) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-2377-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FYuyz0112\u002Fclaude-code-reverse) | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |\n| Future AGI | [GitHub](https:\u002F\u002Fgithub.com\u002Ffuture-agi\u002Ffuture-agi) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1101-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ffuture-agi\u002Ffuture-agi) | observability, evaluation, guardrails | Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations. |\n| OpenInference | [GitHub](https:\u002F\u002Fgithub.com\u002FArize-ai\u002Fopeninference) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1007-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FArize-ai\u002Fopeninference) | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |\n\n\u003Ca id=\"guardrails-security-governance\">\u003C\u002Fa>\n### Guardrails, Security & Governance\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| LiteLLM | [GitHub](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-49353-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FBerriAI\u002Flitellm) | gateway, proxy, guardrails | Unified LLM gateway\u002Fproxy with cost tracking, load balancing, and guardrails. |\n| Kong | [GitHub](https:\u002F\u002Fgithub.com\u002FKong\u002Fkong) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-43520-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FKong\u002Fkong) | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |\n| Parlant | [GitHub](https:\u002F\u002Fgithub.com\u002Femcie-co\u002Fparlant) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-18100-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Femcie-co\u002Fparlant) | interaction-control, guardrails, customer-agents | Interaction-control harness for customer-facing agents focused on consistent, predictable, and governed LLM behavior. |\n| Portkey Gateway | [GitHub](https:\u002F\u002Fgithub.com\u002FPortkey-AI\u002Fgateway) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-11973-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FPortkey-AI\u002Fgateway) | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |\n| CAI (Cybersecurity AI) | [GitHub](https:\u002F\u002Fgithub.com\u002Faliasrobotics\u002Fcai) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-8852-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Faliasrobotics\u002Fcai) | security, governance, framework | Security-focused agent framework for offensive\u002Fdefensive AI workflows. |\n| OpenAI Realtime Agents | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-realtime-agents) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6888-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-realtime-agents) | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |\n| Plano | [GitHub](https:\u002F\u002Fgithub.com\u002Fkatanemo\u002Fplano) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6571-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fkatanemo\u002Fplano) | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |\n| OpenAI CS Agents Demo | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-cs-agents-demo) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-6386-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-cs-agents-demo) | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |\n| Agent Governance Toolkit | [GitHub](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fagent-governance-toolkit) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3995-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fagent-governance-toolkit) | governance, policy, sandboxing | Runtime governance toolkit that deterministically enforces agent policy, identity, sandboxing, and audit controls before actions execute. |\n| ContextForge | [GitHub](https:\u002F\u002Fgithub.com\u002FIBM\u002Fmcp-context-forge) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3823-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FIBM\u002Fmcp-context-forge) | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST\u002FgRPC endpoints with centralized governance and observability. |\n| Archestra | [GitHub](https:\u002F\u002Fgithub.com\u002Farchestra-ai\u002Farchestra) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3800-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Farchestra-ai\u002Farchestra) | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |\n| Tracecat | [GitHub](https:\u002F\u002Fgithub.com\u002FTracecatHQ\u002Ftracecat) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3629-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FTracecatHQ\u002Ftracecat) | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |\n| AgentGateway | [GitHub](https:\u002F\u002Fgithub.com\u002Fagentgateway\u002Fagentgateway) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-3052-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fagentgateway\u002Fagentgateway) | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |\n| ClawManager | [GitHub](https:\u002F\u002Fgithub.com\u002FYuan-lab-LLM\u002FClawManager) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1693-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FYuan-lab-LLM\u002FClawManager) | control-plane, governance, runtimes | Kubernetes-native control plane for governing agent runtimes, AI gateway access, and reusable skills across multiple agent backends. |\n| Agent Vault | [GitHub](https:\u002F\u002Fgithub.com\u002FInfisical\u002Fagent-vault) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1589-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FInfisical\u002Fagent-vault) | credentials, egress-policy, proxy | Credential proxy and vault that brokers agent API access without exposing real secrets, with egress filtering and request logging. |\n| Haft | [GitHub](https:\u002F\u002Fgithub.com\u002Fm0n0x41d\u002Fhaft) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-1337-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fm0n0x41d\u002Fhaft) | governance, decisions, mcp | Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute. |\n| Sponsio | [GitHub](https:\u002F\u002Fgithub.com\u002FSponsioLabs\u002FSponsio) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-472-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FSponsioLabs\u002FSponsio) | contracts, runtime-safety, guardrails | Runtime enforcement layer that checks every agent action against deterministic contracts before execution. |\n| DashClaw | [GitHub](https:\u002F\u002Fgithub.com\u002Fucsandman\u002FDashClaw) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-273-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fucsandman\u002FDashClaw) | approvals, policy, audit | Governance layer that intercepts risky agent actions, enforces policy, routes approvals, and records audit-ready decision trails. |\n| Tandem | [GitHub](https:\u002F\u002Fgithub.com\u002Ffrumu-ai\u002Ftandem) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-106-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Ffrumu-ai\u002Ftandem) | runtime-authority, approvals, audit | Governed runtime authority layer for agents with scoped execution, tool visibility, permissioned memory, approval gates, and audit trails. |\n\n\u003Ca id=\"reference-harness-implementations\">\u003C\u002Fa>\n### Reference Harness Implementations\n\n| Project | Link | Stars | Tags | Summary |\n| --- | --- | --- | --- | --- |\n| OpenClaw | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Fopenclaw) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-376933-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenclaw\u002Fopenclaw) | gateway, channels, sandboxing | Local-first personal assistant harness with a gateway control plane for sessions, channels, tools, events, skills, and sandboxed non-main agents. |\n| Claw Code | [GitHub](https:\u002F\u002Fgithub.com\u002Fultraworkers\u002Fclaw-code) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-193292-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fultraworkers\u002Fclaw-code) | rust, cli, sessions | Public Rust implementation of the claw CLI agent harness with auth, sessions, parity checks, container workflows, and terminal execution guidance. |\n| Hermes Agent | [GitHub](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fhermes-agent) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-181436-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fhermes-agent) | memory, skills, subagents | Self-improving agent runtime with memory, skill creation, subagents, scheduled automations, and pluggable terminal backends. |\n| OpenCode | [GitHub](https:\u002F\u002Fgithub.com\u002Fanomalyco\u002Fopencode) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-170120-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fanomalyco\u002Fopencode) | terminal, coding-agent, subagents | Open-source coding agent with built-in plan\u002Fbuild roles, subagents, LSP support, and a client-server runtime. |\n| Claude Code | [GitHub](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fclaude-code) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-130255-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fclaude-code) | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |\n| Gemini CLI | [GitHub](https:\u002F\u002Fgithub.com\u002Fgoogle-gemini\u002Fgemini-cli) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-104960-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fgoogle-gemini\u002Fgemini-cli) | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |\n| Browser Use | [GitHub](https:\u002F\u002Fgithub.com\u002Fbrowser-use\u002Fbrowser-use) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-97250-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fbrowser-use\u002Fbrowser-use) | browser-agent, automation, benchmarks | Browser-agent framework that exposes websites to LLMs through browser state, tools, cloud browsers, and benchmarked task runs. |\n| Codex CLI | [GitHub](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcodex) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-88779-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcodex) | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |\n| LobeHub | [GitHub](https:\u002F\u002Fgithub.com\u002Flobehub\u002Flobehub) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-78211-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Flobehub\u002Flobehub) | operator, multi-agent, scheduling | Chief-agent-operator platform for scheduling, running, and reporting on multi-agent workstreams. |\n| OpenHands | [GitHub](https:\u002F\u002Fgithub.com\u002FOpenHands\u002FOpenHands) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-75874-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FOpenHands\u002FOpenHands) | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |\n| Paperclip | [GitHub](https:\u002F\u002Fgithub.com\u002Fpaperclipai\u002Fpaperclip) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-69141-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fpaperclipai\u002Fpaperclip) | managed-agents, control-plane, governance | Managed-agent control plane with org charts, ticketing, budgets, heartbeats, and audit trails for coordinating agent teams. |\n| learn-claude-code | [GitHub](https:\u002F\u002Fgithub.com\u002FshareAI-lab\u002Flearn-claude-code) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-64764-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002FshareAI-lab\u002Flearn-claude-code) | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |\n| Cline | [GitHub](https:\u002F\u002Fgithub.com\u002Fcline\u002Fcline) | [![star](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstar-62772-f4b400?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fcline\u002Fcline) | coding-agent, mcp, checkpoints | Open-source coding agent spanning IDE, terminal, SDK, and kanban surfaces with shared approvals, MCP, checkpoints, and agent teams. |\n| pi | [GitHub](https:\u002F\u002Fgithub.com\u002Fearendil-wo",2,"2026-06-11 04:11:24","high_star"]