[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84089":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":16,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":10,"trendingCount":15,"starSnapshotCount":15,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},84089,"argus","quarqlabs\u002Fargus","quarqlabs","A recursive evidence-gated cognitive runtime for memory-native AI agents, combining hybrid retrieval, temporal reasoning, async learning, and plug-and-play tools.","",null,"Python",250,25,248,0,1,4.24,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:37","# Argus Agent\n\n**Local memory. Hybrid retrieval. Self-correcting reasoning. Benchmark-grade recall.**\n\nArgus Agent is a memory-first AI agent for long-context personal intelligence, grounded recall, temporal reasoning, quantitative reasoning, and tool use.\n\nIt is designed as an open, inspectable alternative to memory agents such as Hermes or OpenClaw, with a stronger emphasis on durable local memory, strict attribution, self-correcting retrieval, and benchmark-grade long-term recall.\n\nThe current local implementation keeps normal semantic, episodic, and procedural learning in `agent.py`. Deterministic structured-artifact extractor code exists in the repo, but it is disabled in the active learning path while benchmark memory quality is being tuned.\n\nLocal LongMemEval-S reports are checkpoints while learning and generation behavior is being validated. 
Treat checked-in report files as local progress snapshots, not final published benchmark numbers.\n\nBenchmark cost warning: a full 500-question LongMemEval-S run with the current model mix has cost about `$2,500` in practice, or about `$5` per question on average. Run a 1-question or small-sample benchmark before starting the full dataset.\n\n## Contents\n\n- [Why Argus Exists](#why-argus-exists)\n- [What's New In v0.5.0](#whats-new-in-v050)\n- [What Makes It Different](#what-makes-it-different)\n- [Highlights](#highlights)\n- [Architecture](#architecture)\n- [Memory System](#memory-system)\n- [Structured Artifact Extractors](#structured-artifact-extractors)\n- [Local Storage Layout](#local-storage-layout)\n- [Retrieval Pipeline](#retrieval-pipeline)\n- [Temporal Reasoning](#temporal-reasoning)\n- [Quantitative Reasoning](#quantitative-reasoning)\n- [Self-Correcting Retrieval](#self-correcting-retrieval)\n- [Learning Pipeline](#learning-pipeline)\n- [Tool System](#tool-system)\n- [Coding Agent Delegation](#coding-agent-delegation)\n- [Benchmarks](#benchmarks)\n- [Benchmark Cost Planning](#benchmark-cost-planning)\n- [Current Local Metrics](#current-local-metrics)\n- [Requirements](#requirements)\n- [Quick Start](#quick-start)\n- [Control Console](#control-console)\n- [Agent Identity Config](#agent-identity-config)\n- [Channel Integrations](#channel-integrations)\n- [API Job Queue](#api-job-queue)\n- [Environment Variables](#environment-variables)\n- [Repository Map](#repository-map)\n- [Design Principles](#design-principles)\n- [Status](#status)\n- [License](#license)\n\n## Why Argus Exists\n\nMost agents can chat. Fewer can remember. Almost none can remember carefully.\n\nArgus Agent is built around a simple idea: memory is not just vector search. 
A serious memory agent needs to know what a memory means, when it happened, what each number belongs to, which entity a fact is attached to, when evidence is incomplete, and when it must search again instead of guessing.\n\nArgus combines:\n\n- local FAISS vector memory\n- semantic, episodic, and procedural memory separation\n- hybrid vector plus keyword retrieval\n- HyDE-style query expansion\n- dynamic recall depth\n- strict temporal grounding\n- numeric attribution and exact aggregation rules\n- structured artifact extractor code for table rows, lists, blocks, quotes, budgets, timelines, metrics, ratios, and other evidence-shaped outputs, currently disabled in the active learning path\n- self-correcting fallback retrieval\n- background memory consolidation\n- LangGraph orchestration\n- progressive tool routing\n\nThe result is an agent that behaves less like a stateless chatbot and more like a disciplined cognitive system.\n\n## What's New In v0.5.0\n\nThis release turns Argus into a much more complete local agent runtime:\n\n- **Triage-first normal runtime:** normal chat now starts with a fast LLM triage layer that chooses direct response, tool routing, memory retrieval, or retrieval-before-tools. 
Information-only updates and questions already answerable from recent chat can skip foreground retrieval entirely.\n- **Benchmark path preserved:** benchmark traffic still uses the original retrieval, tool-routing, generation, and learning prompts\u002Fflow so memory benchmark behavior remains comparable.\n- **Learning-safe optional retrieval:** when foreground retrieval is skipped, background learning still performs a learning-related retrieval pass before memory editing so `UPDATE` and `DELETE` actions have prior memory IDs\u002Fcontext.\n- **Clearer metrics:** triage, direct response, foreground retrieval, tool routing, generation, and learning-related retrieval are timed separately, and terminal background metrics avoid overwriting the active input prompt.\n- **Argus control console:** a Codex-style CLI with a fixed bottom input row, scrollable transcript, Markdown rendering, command palette, multiline compose, live status header, global `argus` launcher, and one-command setup scripts for macOS\u002FLinux and Windows.\n- **API job queue:** chat requests now create jobs, emit status events while work is happening, and return final responses when the job completes. 
The CLI polls events instead of blocking silently.\n- **On-demand channels:** Telegram connects only when requested with `\u002Fconnect telegram`, supports startup defaults, shows typing indicators, retries registration cleanly, and can receive coding-task progress after connecting mid-run.\n- **Durable channel context:** CLI and Telegram chat history are stored locally, command responses are saved into history, and only the latest four user\u002Fassistant pairs are passed into each agent request.\n- **Multimodal input storage:** incoming Telegram\u002FAPI files are stored under local channel state, indexed, and passed into the agent with best-effort text extraction or AI-assisted image\u002Faudio\u002FPDF understanding.\n- **Local identity management:** agent name, personality, use cases, and custom prompt can be updated through a local identity config file instead of Supabase-backed identity tools.\n- **Cloud-tool expansion:** external SaaS actions are routed through a single cloud-tool skill with configurable toolkits, user-facing `\u002Ftools`, `\u002Fwhich-tool`, `\u002Fcloud-tools`, `\u002Fadd-tool`, and `\u002Fremove-tool` commands.\n- **Coding-agent delegation:** Argus can delegate software work to Codex, persist tasks and logs, continue or start fresh sessions, delete task history, configure workspace\u002Fprovider\u002Fnetwork mode, and stream progress into CLI and subscribed Telegram chats.\n- **Coding safety and UX:** coding tasks use shallow retrieval, portable workspace defaults, task-id suggestions, network-on defaults for package registries, safe restart when Codex sandbox settings change, and progress-file polling for long training\u002Fbuild tasks.\n- **Cloudflared setup fixes:** setup can install `cloudflared` with Homebrew on macOS\u002FLinux, `winget` on Windows, or a direct PowerShell download fallback to `C:\\cloudflared\\cloudflared.exe`.\n\n## What Makes It Different\n\nArgus is not a wrapper around a vector database. 
It is a full memory reasoning loop.\n\nStandard RAG systems usually fail long-memory tasks for one of four reasons:\n\n1. They retrieve the wrong memory.\n2. They retrieve the right memory but attach it to the wrong entity.\n3. They confuse storage time with event time.\n4. They calculate with nearby numbers that do not belong to the question.\n\nArgus directly attacks those failure modes with retrieval decomposition, evidence attribution, temporal guardrails, numeric scope checks, and a second-pass recovery path when the first context is incomplete.\n\n## Highlights\n\n- Local-first memory: no Supabase pgvector dependency. Memories and rules are saved under `local_memory\u002F\u003CAGENT_ID>\u002F`.\n- Three memory types: semantic facts, episodic events, and procedural behavioral rules.\n- FAISS-backed retrieval: normalized OpenAI embeddings with `IndexFlatIP` cosine-style similarity.\n- Hybrid search: every retrieval pass combines vector search and direct keyword matching.\n- HyDE query optimizer: rewrites the user prompt into multiple retrieval probes before search.\n- Dynamic thresholds: wide-net `deep` mode for aggregation, timelines, broad categories, and recommendations; stricter `standard` mode for point facts.\n- Required-data fallback: the model can request a targeted second retrieval pass when evidence is missing.\n- Temporal truth protocol: separates database storage time from narrative event time.\n- Quantitative fidelity: numbers are stored and used with owner, property, item, and exactness.\n- Benchmark ingestion learning: history chunks are split into individual user\u002Fassistant pairs, learned sequentially, staged in RAM between pairs, and committed after the chunk is complete.\n- Duplicate protection: batch writes skip exact duplicate content before embedding, then use the normal vector duplicate check to avoid repeated memories.\n- Triage-first normal chat: information-only updates, casual chat, and questions answerable from current chat can 
skip foreground retrieval.\n- Retrieval-before-tools: when an action genuinely needs stored memory before a tool can run, triage routes retrieval first and then tool selection.\n- Background learning: normal user responses return immediately while memory extraction runs asynchronously.\n- Learning-related retrieval: optional foreground retrieval does not weaken memory editing, because background learning can retrieve prior context before issuing `ADD`, `UPDATE`, or `DELETE`.\n- Benchmark ingestion synchronization: benchmark memory-ingestion turns learn synchronously before returning, guarded by an ingestion lock.\n- Progressive tool loading: tool docs are only injected when a skill is selected.\n- Benchmark mode: disables tool routing, synchronously learns memory-ingestion chunks, and waits for any pending learning before final evaluation.\n- Local control console: starts the FastAPI worker, shows structured request\u002Fjob\u002Fchannel events, supports multiline input, command completion, and a scrollable transcript.\n- On-demand channel connections: channels are connected only when requested, starting with Telegram through a temporary Cloudflare tunnel and automatic webhook registration.\n- Local identity config: agent name, personality, use cases, and custom directives can be updated by tool call into a local JSON file instead of Supabase.\n- Coding-agent delegation: Argus can start durable Codex tasks, stream coding progress into the CLI, let the user reply, and persist task logs locally.\n\n## Architecture\n\n```text\nUser \u002F API\n    |\n    v\nLangGraph StateGraph\n    |\n    +-- triage_request  (normal channels)\n    |     |\n    |     +-- direct_response\n    |     +-- route_tools\n    |     +-- retrieve_memories\n    |     +-- retrieve_memories -> route_tools\n    |\n    +-- retrieve_memories  (benchmark channel always starts here)\n    |     |\n    |     +-- HyDE query generation\n    |     +-- semantic FAISS search\n    |     +-- episodic FAISS 
search\n    |     +-- keyword search\n    |     +-- procedural rule routing\n    |\n    +-- route_tools\n    |     |\n    |     +-- skill catalog router\n    |     +-- progressive skill markdown loading\n    |\n    +-- generate_response\n          |\n          +-- grounded answer synthesis\n          +-- optional ReAct tool loop\n          +-- REQUIRED_DATA fallback retrieval\n          +-- memory learning\n                +-- normal chat: background async\n                +-- benchmark ingestion: inline sync\n```\n\nThe normal interactive graph is triage-first:\n\n```text\nSTART -> triage_request -> direct_response -> END\nSTART -> triage_request -> route_tools -> generate_response -> END\nSTART -> triage_request -> retrieve_memories -> generate_response -> END\nSTART -> triage_request -> retrieve_memories -> route_tools -> generate_response -> END\n```\n\nBenchmark traffic intentionally keeps the original high-recall route:\n\n```text\nSTART -> retrieve_memories -> route_tools -> generate_response -> END\n```\n\nLearning is launched from `generate_response` or `direct_response`. Normal chat keeps the interactive path fast by learning in the background. If normal chat skipped foreground retrieval, background learning performs a learning-related retrieval pass before memory editing so updates and deletes still see prior memory context. 
Benchmark memory-ingestion prompts learn inline before the response returns so the next history chunk retrieves against the latest committed memories.\n\n## Memory System\n\nArgus uses three memory layers.\n\n### Semantic Memory\n\nSemantic memory stores durable user facts:\n\n- identity\n- preferences\n- relationships\n- routines\n- long-term projects\n- possessions\n- stable traits\n- active statuses and inventories\n\nExample:\n\n```text\nUser owns a crystal chandelier that originally belonged to their great-grandmother and was given to them by their aunt.\n```\n\n### Episodic Memory\n\nEpisodic memory stores events and interaction history:\n\n- what happened\n- when it happened\n- who was involved\n- what was decided\n- what the user asked for\n- what changed\n\nExample:\n\n```text\nOn March 4, 2023, user received a crystal chandelier from their aunt that originally belonged to their great-grandmother.\n```\n\n### Procedural Memory\n\nProcedural memory stores behavioral rules:\n\n- tone preferences\n- formatting preferences\n- project-specific instructions\n- forbidden wording\n- content generation constraints\n\nProcedural rules are tagged and routed, so the model sees only the relevant rules for the current prompt instead of carrying every rule forever.\n\n## Structured Artifact Extractors\n\nThe local agent keeps the original normal-text learning path intact. Full user and assistant turns are still passed to the learning model, so ordinary narrative details, decisions, preferences, and summaries can become semantic, episodic, or procedural memories.\n\nStructured extractor code is present for artifact-shaped content that summarization can otherwise compress too aggressively. In the current active runtime, these extractors are disabled and their outputs are not injected into the learning prompt or appended to episodic memory. 
This keeps benchmark learning focused on the model-generated memory actions from the actual user\u002Fassistant pair.\n\nWhen enabled experimentally, the extractor layer can create deterministic episodic units for high-signal data such as:\n\n- markdown table rows\n- explicit artifact blocks such as `::title:: == description`\n- numbered sections for objectives, parameters, methods, options, steps, recommendations, and similar headings\n- recommendation, remedy, dish, shop, restaurant, and product list items\n- ingredient and material items\n- budget, cost, allocation, and campaign plan rows\n- timeline and dated event clauses\n- implementation or \"uses algorithm\u002Ftool\" relationships\n- attributed quotations and exact source claims\n- metric, percentage, improvement, and score relationships\n- ratios, dilutions, and mixture instructions\n- music sections, chord\u002Fnote style rows, and chess move notation\n- counted entity headings such as encounter counts, party sizes, item totals, or named grouped entities\n\nThe extractor layer is intentionally capped and high-signal. It is not meant to memorize every sentence. Its job is to preserve compact data-bearing rows and items that future recall questions often target verbatim.\n\nIn the current runtime, `STRUCTURED ARTIFACT UNITS` are not passed to the learning prompt. 
Vector writes still pass through the normal local action execution path, including exact duplicate blocking, batch embedding, and vector duplicate checks.\n\n## Local Storage Layout\n\nThe current runtime stores memory locally:\n\n```text\nlocal_memory\u002F\n  \u003CAGENT_ID>\u002F\n    semantic_memory\u002F\n      index.faiss\n      memories.json\n    episodic_memory\u002F\n      index.faiss\n      memories.json\n    procedural_memory\u002F\n      rules.json\n    channel_state\u002F\n      chat_history.json\n      attachments_index.json\n      attachments\u002F\n    coding_agents\u002F\n      tasks.json\n      logs\u002F\n        \u003Ctask_id>.jsonl\n```\n\n`AGENT_ID` determines which memory folder is used. Reusing the same `AGENT_ID` reuses the same memory. Changing `AGENT_ID` gives you a clean isolated agent profile.\n\nChannel state is also durable. CLI, Telegram, and future channels store full\nconversation history in `channel_state\u002Fchat_history.json`, while each agent\nrequest receives only the last eight messages (four user\u002Fassistant pairs) as\nshort-term chat context. Command responses such as `\u002Fcloud-tools` are saved\nthere too, so follow-ups can refer to them.\n\nIncoming files are saved under `channel_state\u002Fattachments\u002F` and indexed in\n`attachments_index.json`. Text, Markdown, JSON, CSV, PDFs, DOCX files, images,\nand audio get best-effort local extraction or AI-assisted description\u002Ftranscript\nwhen supported. The stored file remains available even when readable text cannot\nyet be extracted.\n\nPDF extraction uses `pypdf` for embedded text. If a PDF has no embedded text or\nbehaves like a scanned\u002Fimage document, Argus can render the first few pages with\n`PyMuPDF` and send them through the configured multimodal image model for\nvision OCR. Make sure `pip install -r requirements.txt` has been run in the same\nvirtual environment that starts `main.py` or `agent_cli.py`.\n\nCoding-agent task state is durable too. 
Each delegated coding task is stored in\n`coding_agents\u002Ftasks.json`, and every progress\u002Fupdate\u002Fapproval\u002Fcompletion event\nis appended to that task's JSONL log under `coding_agents\u002Flogs\u002F`.\n\nEach semantic and episodic memory record includes:\n\n- UUID\n- agent ID\n- memory type\n- content\n- embedding\n- created timestamp\n- updated timestamp\n\nThe formatted retrieval output intentionally preserves this shape:\n\n```text\n[STORED_AT: 2026-05-25 14:00:00] [ID: \u003Cuuid>] \u003Cmemory content>\n```\n\nDownstream deduplication, recency sorting, contradiction handling, and memory update logic all depend on that stable format.\n\n## Retrieval Pipeline\n\nArgus does not simply embed the latest user prompt and hope for the best.\n\nFor normal chat, retrieval is now optional on the foreground path. A fast triage LLM first decides whether the latest turn is:\n\n- `direct`: answer or acknowledge from the current message and recent chat history\n- `tool`: route to the tool system without memory retrieval\n- `retrieval`: retrieve memory before generation\n- `retrieval + tool`: retrieve memory first, then route tools with that memory context\n\nThe triage layer is semantic rather than regex-based. It treats information-only updates as direct responses, because newly supplied facts do not need old memory to be acknowledged. It chooses retrieval when the user is asking for an answer that depends on stored personal memory not already present in recent chat. It chooses tools when the user asks Argus to perform an external action or delegate work. 
For action requests, tool-only is the default unless a concrete required action input is missing and stored memory is the likely source.\n\nBenchmark traffic bypasses triage and still runs the original retrieval-first path.\n\nThe retrieval node first asks a lightweight model to produce a structured search plan:\n\n```json\n{\n  \"vector_queries\": [\n    \"User total driving duration and travel history\",\n    \"User vehicle, road trip, transit records\",\n    \"User travel milestones, driving time calculation\",\n    \"hours, road trip, destinations\"\n  ],\n  \"keywords\": \"driving, hours, total\",\n  \"search_mode\": \"deep\"\n}\n```\n\nThen it performs:\n\n1. Semantic vector search\n2. Episodic vector search\n3. Semantic keyword search\n4. Episodic keyword search\n5. ID-based deduplication\n6. Recency sorting\n7. Procedural rule routing\n\nSearch modes:\n\n- `standard`: strict retrieval for point facts, threshold `0.38`\n- `deep`: wide recall for totals, timelines, histories, recommendations, and broad categories, threshold `0.28`\n\nThis is why Argus can answer questions that require multiple memories rather than only nearest-neighbor recall.\n\n## Temporal Reasoning\n\nArgus treats time as evidence, not decoration.\n\nThe agent distinguishes:\n\n- storage timestamp: when the memory was saved\n- narrative date: when the event actually happened\n- benchmark current date: the simulated date for an evaluation question\n- relative dates: phrases like \"yesterday\", \"today\", \"last month\"\n\nThe Temporal Truth Protocol prevents common long-memory errors:\n\n- using database timestamps as event dates\n- borrowing dates from nearby but unrelated memories\n- assuming a discussion date is the same as an event date\n- calculating date gaps from guessed anchors\n\nFor \"how long ago\" questions, Argus searches for the named event first and only uses the current date as the calculation anchor after retrieval.\n\n## Quantitative Reasoning\n\nLong-memory benchmarks 
often punish sloppy number handling. Argus's numeric protocol is built to avoid that.\n\nFor totals, counts, durations, prices, quantities, or money questions, the model must identify:\n\n- actor or entity\n- measured action or property\n- event or item\n- exactness\n\nIt excludes numbers that are merely nearby or topically related.\n\nExample:\n\n```text\nUser helped organize a concert, which raised over $5,000.\n```\n\nThe amount belongs to the concert, not automatically to the user. It is also a lower-bound value, not an exact addend.\n\nFor exact totals, Argus sums only exact unqualified values unless the user explicitly asks for a minimum, estimate, or range.\n\n## Self-Correcting Retrieval\n\nIf the first retrieval pass does not contain enough evidence, Argus can emit:\n\n```json\n{\n  \"agent_response\": \"\",\n  \"flags\": [\"REQUIRED_DATA\"],\n  \"hyde_queries\": [\"aunt meetup\", \"received chandelier\", \"chandelier handoff\"]\n}\n```\n\nThe runtime then performs a targeted fallback search and regenerates the answer with expanded context.\n\nThe fallback pass has a strict final verification rule: if the exact target is still missing, the agent must say the information is not available instead of guessing.\n\nThis makes the agent aggressive about recall but conservative about truth.\n\n## Learning Pipeline\n\nAfter every normal response, Argus starts background learning.\n\nBecause foreground retrieval can now be skipped, background learning has its own safety step. If the user-facing path did not retrieve memory, the learning worker runs a retrieval pass for the actual user prompt before calling the memory editor. That retrieved context is used only for learning, so memory updates and deletes still have the prior memory lines and IDs they need without forcing every user-facing response through retrieval.\n\nBenchmark memory-ingestion prompts are the exception. 
They are learned synchronously before the ingestion response returns, so `run_dataset_evals.py` does not feed the next history chunk until the previous chunk has been learned and committed.\n\nThe learning model extracts:\n\n- semantic memories\n- episodic memories\n- procedural rules\n\nIt can issue:\n\n```json\n{\n  \"actions\": [\n    {\"action\": \"ADD\", \"content\": \"New memory\"},\n    {\"action\": \"UPDATE\", \"id\": \"uuid\", \"content\": \"Updated memory\"},\n    {\"action\": \"DELETE\", \"id\": \"uuid\"}\n  ]\n}\n```\n\nImportant learning behaviors:\n\n- preserves specific names and proper nouns\n- resolves relative dates using the current date\n- anchors transfer and acquisition events\n- preserves every number and qualifier\n- avoids duplicate memories across semantic and episodic layers\n- prefers exact values over approximate or bounded restatements\n- updates existing records instead of creating conflicting duplicates\n- keeps normal prose learning active\n- splits benchmark ingestion chunks into individual user\u002Fassistant pairs\n- stages semantic and episodic `ADD`, `UPDATE`, and `DELETE` actions in RAM between pairs\n- commits staged semantic and episodic vector actions after all pairs in the chunk have been processed\n- reloads procedural context between ingestion pairs when procedural rules change\n\nBackground learning is protected by:\n\n- learning-related retrieval before memory editing when foreground retrieval was skipped\n- persistent retry loop\n- exponential backoff\n- concurrency limit of 4 learning tasks for normal background learning\n- benchmark ingestion lock for synchronous memory-ingestion learning\n- benchmark synchronization before final questions\n\n## Tool System\n\nArgus includes a progressive skill router.\n\nInstead of injecting all tool instructions into every prompt, the router sees a compact catalog and selects only the relevant skills. 
The generation model then receives the full markdown and bound tools for those selected skills.\n\nCurrent included skills:\n\n- agent identity management\n- cloud app actions\n- coding-agent delegation\n\nTool execution uses a ReAct loop with a maximum of 5 iterations. If the loop reaches the limit, the model is forced to stop calling tools and produce a final text response.\n\nCloud tools are the external-action layer for app and SaaS integrations such as GitHub, Gmail, Google Calendar, Slack, Notion, and Linear. Argus keeps its own local identity tool native, while external app auth, tool search, and execution flow through the cloud-tool session.\n\nUsers can inspect and expand the enabled cloud toolkit at runtime:\n\n```text\n\u002Ftools\n\u002Fwhich-tool check my unread emails\n\u002Fcloud-tools\n\u002Fadd-tool gmail\n\u002Fremove-tool slack\n```\n\nEnabled cloud tools are stored in `local_memory\u002F\u003CAGENT_ID>\u002Fagent_tools.json`, with `.env` values used as startup defaults.\n`\u002Fcloud-tools` fetches the available cloud-tool catalog when credentials are\nconfigured, then falls back to the small local catalog if the remote catalog is\nunavailable.\n\nNative coding-agent commands:\n\n```text\n\u002Fcoding implement the failing auth test\n\u002Fcoding-new start a fresh implementation pass for the dashboard\n\u002Fcoding-continue now add tests for the same change\n\u002Fcoding-continue \u003Ctask_id> use this exact older session\n\u002Fcoding or \u002Fcoding-tasks\n\u002Fcoding-agents\n\u002Fcoding-use codex\n\u002Fcoding-workspace \u002Fabsolute\u002Fpath\u002Fto\u002Frepo\n\u002Fcoding-network on\n\u002Fcoding-network off\n\u002Fcoding-network \u003Ctask_id> on\n\u002Fcoding-allow-network \u003Ctask_id>\n\u002Fcoding-status \u003Ctask_id>\n\u002Fcoding-log \u003Ctask_id>\n\u002Fcoding-reply \u003Ctask_id> approved, continue\n\u002Fcoding-cancel \u003Ctask_id>\n\u002Fcoding-delete \u003Ctask_id>\n\u002Fcoding-clear\n```\n\nThe coding-agent skill is 
provider-neutral. V1 uses Codex; future providers can\nplug into the same task store, event feed, and API command surface.\nThe current default provider, workspace, and network mode are visible in the CLI\nheader.\n\n## Coding Agent Delegation\n\nArgus delegates coding work instead of editing files directly in the chat flow.\nThe first provider is Codex, launched through Codex CLI's MCP server with the\ndefault command:\n\n```bash\ncodex mcp-server\n```\n\nDelegated tasks are durable:\n\n- `tasks.json` stores task status, provider, workspace, network mode, prompt, changed files, errors, and summary.\n- `logs\u002F\u003Ctask_id>.jsonl` stores every progress event.\n- `.argus\u002Fcoding_progress\u002F\u003Ctask_id>.jsonl` inside the coding workspace is a provider-side progress file Argus asks Codex to update during long tasks. Argus polls it and streams those progress rows to the CLI and subscribed channels.\n- `config.json` stores local overrides for default provider, workspace, and coding-network default when changed from CLI\u002FAPI\u002Ftool calls.\n- The CLI subscribes to `\u002Fapi\u002Fevents` and renders coding events in the transcript.\n- The user can reply to a waiting task with `\u002Fcoding-reply \u003Ctask_id> \u003Cmessage>`.\n- The user can continue the most recent completed Codex session with `\u002Fcoding-continue \u003Cmessage>`.\n- The user can start a deliberately fresh Codex session with `\u002Fcoding-new \u003Ctask>`.\n- The user can delete a completed\u002Ffailed\u002Fcancelled task with `\u002Fcoding-delete \u003Ctask_id>`, or clear all non-active task history with `\u002Fcoding-clear`.\n\nProvider\u002Fworkspace control:\n\n- `.env` provides startup defaults such as `CODING_AGENT_DEFAULT_PROVIDER` and `CODEX_WORKSPACE_ROOT`.\n- `CODEX_WORKSPACE_ROOT=.` is portable and resolves to the directory where the user launched the Argus CLI\u002FAPI.\n- `\u002Fcoding-agents` lists supported providers and shows the current default.\n- 
`\u002Fcoding-use \u003Cprovider>` selects the default provider. V1 supports `codex`; `claude_code` and `cursor` are reserved provider slots for later.\n- `\u002Fcoding-workspace \u003Cpath>` sets the default workspace without editing `.env`.\n- `\u002Fcoding-network on|off` sets whether new coding tasks can use network access. It defaults to `on` so package registries such as PyPI\u002Fnpm work during coding tasks.\n- `\u002Fcoding-network \u003Ctask_id> on|off` changes the stored network mode for a specific task. `\u002Fcoding-allow-network \u003Ctask_id>` is a shortcut for turning it on.\n- If an older Codex thread was created before network was enabled, Argus restarts Codex from the saved task context on the next continuation so the new network mode actually reaches the provider shell.\n- If a Telegram chat lists or inspects a running coding task, that chat is subscribed to future progress\u002Fcompletion updates for the task.\n- When `\u002Fconnect telegram` succeeds, Argus also subscribes known Telegram chats from `TELEGRAM_ALLOWED_USERS` or previous Telegram history to currently active coding tasks.\n- Asking the agent to change the coding agent, workspace, or network mode can use the same native coding-agent config tool.\n\nCommon Codex session flow:\n\n```text\n\u002Fcoding build an ml project scaffold\n\u002Fcoding-continue add a README and run the tests\n\u002Fcoding-new investigate a separate bug in the API\n\u002Fcoding-continue \u003Ctask_id> continue an older task explicitly\n```\n\nEach Codex task stores the provider session id when Codex returns one, so\n`\u002Fcoding-continue` can resume the same Codex thread instead of starting from a\nblank session.\n\nCodex MCP returns the final tool result at the end of a provider call rather\nthan streaming every shell stdout line to Argus. 
For long work, Argus injects a\nprogress-reporting instruction and polls the workspace progress file above; if\nCodex cannot write it, Argus still sends periodic heartbeat\u002Fstatus events.\n\nFor commands that need a task id, the CLI provides task-id selection. Type the\ncommand followed by a space, choose a recent task with `Ctrl+N` \u002F `Ctrl+P`, then\npress `Tab` to insert the selected id:\n\n```text\n\u002Fcoding-status \u003Cspace>\n\u002Fcoding-log \u003Cspace>\n\u002Fcoding-reply \u003Cspace>\n\u002Fcoding-cancel \u003Cspace>\n\u002Fcoding-delete \u003Cspace>\n\u002Fcoding-continue \u003Cspace>\n\u002Fcoding-network \u003Cspace>\n\u002Fcoding-allow-network \u003Cspace>\n```\n\nTask statuses:\n\n```text\nqueued\nrunning\nwaiting_user\ncompleted\nfailed\ncancelled\n```\n\nDefault safety policy:\n\n- Auto-approve low-risk read-only operations, normal edits inside the configured workspace, and routine test\u002Fbuild commands.\n- Coding-task network access defaults to `on` inside the workspace-write sandbox. Turn it off globally with `\u002Fcoding-network off`, or for a task with `\u002Fcoding-network \u003Ctask_id> off`.\n- Ask for user confirmation before destructive commands, writes outside the workspace, secret\u002Fenv edits, git commits, git pushes, credential access, or anything the provider explicitly marks as confirmation-required.\n\nCoding requests use a shallow retrieval profile in normal channels. Argus still\nretrieves recent local context, but it skips the HyDE LLM call and forces\nstandard mode with small `top_k` instead of broad\u002Fdeep memory search. Benchmark\nmode is unchanged.\n\n### Adding A New Tool\n\nTool expansion is deliberately simple. 
To add a new capability, drop a new folder inside `tools\u002F` and follow the existing skill convention.\n\n```text\ntools\u002F\n  your_tool\u002F\n    skill.md\n    __init__.py\n    tools.py\n```\n\n`skill.md` defines the router-facing metadata:\n\n```markdown\n---\nname: your_tool\ndescription: One-line description of what this skill can do.\ntriggers: keyword one, keyword two, natural language trigger\n---\n\n# Your Tool Skill\n\nDescribe when to use it, when not to use it, available tools, and operating rules.\n```\n\n`tools.py` defines LangChain tool callables, and `__init__.py` exports them using the folder-name convention:\n\n```python\nfrom .tools import your_first_tool, your_second_tool\n\nYOUR_TOOL_TOOLS = [your_first_tool, your_second_tool]\n```\n\nThat is it. `tools\u002F__init__.py` automatically scans every subdirectory with a `skill.md`, imports the package, reads the frontmatter, and registers the exported `\u003CFOLDER_NAME>_TOOLS` list. No central registry edit is required.\n\nFor user-scoped dynamic tools, a skill package can also export `\u003CFOLDER_NAME>_TOOLS_FACTORY(runtime_config)`. This is how the cloud-tools skill creates session tools for the active `user_id` without hard-coding every external app schema into the prompt.\n\nThis gives Argus a clean capability expansion path: add a folder, describe the skill, export the tools, restart the process, and the agent can route to the new capability.\n\n## Benchmarks\n\nArgus includes a LongMemEval runner:\n\n```bash\npython run_dataset_evals.py\n```\n\nThe benchmark pipeline:\n\n1. Loads `eval_datasets\u002Flongmemeval_s_cleaned.json`\n2. Splits haystack sessions into chunks\n3. Feeds each chunk through the agent with learning enabled\n4. Learns benchmark memory-ingestion chunks synchronously before returning the ingestion ACK\n5. Asks the benchmark question with learning disabled\n6. Judges with a binary evaluator\n7. 
Stores `question_type` with each result row for reporting, while benchmark agent calls use the same runtime interface as normal agent calls\n8. Writes results to `reports\u002Flongmemeval_results.json`\n\nFor each benchmark history chunk, the agent splits the chunk into individual user\u002Fassistant pairs. Semantic and episodic actions are staged in RAM after each pair, so the next pair sees the updated working memory. After all pairs in the chunk are processed, staged semantic and episodic actions are committed to the vector stores. Procedural learning also runs per pair and refreshes procedural context when rules change.\n\n### Parallel Evaluation\n\nFor faster local benchmark runs, Argus also includes a process-based parallel evaluator:\n\n```bash\nEVAL_WORKERS=5 python run_dataset_evals_parallel.py\n```\n\nThe parallel runner is designed for long benchmark runs where you do not want to lose completed progress. It first loads all completed question IDs from:\n\n```text\nreports\u002Flongmemeval_results.json\nreports\u002Flongmemeval_results.worker*.json\n```\n\nThen it calculates the remaining questions, splits only those questions across the requested number of workers, and launches one isolated process per worker.\n\nEach worker receives its own `AGENT_ID`:\n\n```text\n\u003CAGENT_ID>_eval_worker_0\n\u003CAGENT_ID>_eval_worker_1\n\u003CAGENT_ID>_eval_worker_2\n...\n```\n\nThat means each worker gets an isolated FAISS memory folder under `local_memory\u002F`, so multiple questions can be learned and evaluated at the same time without memory collision.\n\nWorker outputs are written independently:\n\n```text\nreports\u002Flongmemeval_results.worker0.json\nreports\u002Flongmemeval_results.worker1.json\nreports\u002Flongmemeval_results.worker2.json\n```\n\nWhen all workers finish, the runner merges worker outputs back into:\n\n```text\nreports\u002Flongmemeval_results.json\n```\n\nThe default is `5` workers. 
Choose `EVAL_WORKERS` based on the machine and API limits, then increase until OpenAI rate limits or local CPU pressure become the bottleneck.\n\n### Prompt Regression Samples\n\nFor faster iteration on prompt and retrieval changes, use the sample runner:\n\n```bash\npython run_dataset_evals_sample.py\n```\n\nIt reuses the parallel evaluator, selects a deterministic sample, writes a manifest of sampled questions, keeps per-worker checkpoints, and merges results into a sample-specific report. Useful environment variables:\n\n```text\nEVAL_SAMPLE_NAME=prompt_regression\nEVAL_SAMPLE_SIZE=60\nEVAL_SAMPLE_SOURCE_LIMIT=500\nEVAL_SAMPLE_SEED=test6\nEVAL_SAMPLE_QUESTION_IDS=\u003Ccomma-separated ids>\nEVAL_SAMPLE_FRESH=1\nEVAL_WORKERS=20\n```\n\nTo monitor live results across both the main and worker result files:\n\n```bash\npython monitor_results.py\n```\n\nCurrent local report files:\n\n```text\nreports\u002Flongmemeval_results.json\nreports\u002Flongmemeval_results.worker*.json\n```\n\n### Benchmark Cost Planning\n\nLongMemEval-S is an expensive benchmark for this agent because each question\nfeeds many conversation-history chunks, and each chunk can trigger retrieval\nplanning, memory learning, embeddings, and final answer generation.\n\nObserved cost with the current model mix:\n\n| Run size | Approximate cost |\n| --- | ---: |\n| 1 average question | about `$5` |\n| 10 questions | about `$50` |\n| 100 questions | about `$500` |\n| Full 500-question LongMemEval-S run | about `$2,500` |\n\nThe current benchmark model mix is:\n\n| Component | Model | Input \u002F 1M | Output \u002F 1M | Cached input \u002F 1M |\n| --- | --- | ---: | ---: | ---: |\n| Retrieval planning | `gpt-4o-mini` | `$0.15` | `$0.60` | `$0.075` |\n| Generation | `gpt-4.1` | `$2.00` | `$8.00` | `$0.50` |\n| Memory learning | `gpt-4.1` | `$2.00` | `$8.00` | `$0.50` |\n| Embeddings | `text-embedding-3-large` | `$0.13` | n\u002Fa | n\u002Fa |\n| Benchmark judge | `gpt-5` | `$1.25` | `$10.00` | `$0.125` 
|\n\nFor the current LongMemEval-S dataset, the benchmark runner sees about 41,813\nchunks total, or about 83.6 chunks per question. The direct chunk-ingestion\ntraffic alone is about 62.3M estimated input tokens, but that is only a lower\nbound. The full cost is higher because the agent re-reads chunk content during\nlearning and creates embeddings for retrieval and memory writes.\n\nRun a 1-question or small-sample benchmark first and inspect recorded usage\nmetrics before running all 500 questions.\n\n### Current Local Metrics\n\nCurrent local LongMemEval-S metrics, computed from `reports\u002Flongmemeval_results.json` and joined with `eval_datasets\u002Flongmemeval_s_cleaned.json` by `question_id`:\n\n| Question type | Correct | Incorrect | Total | Accuracy |\n| --- | ---: | ---: | ---: | ---: |\n| Overall | 491 | 9 | 500 | 98.20% |\n| knowledge-update | 77 | 1 | 78 | 98.72% |\n| multi-session | 129 | 4 | 133 | 96.99% |\n| single-session-assistant | 56 | 0 | 56 | 100.00% |\n| single-session-preference | 30 | 0 | 30 | 100.00% |\n| single-session-user | 70 | 0 | 70 | 100.00% |\n| temporal-reasoning | 129 | 4 | 133 | 96.99% |\n\nThese metrics represent the current LongMemEval-S progress while Argus Agent is actively being improved. Some answers may change as failing or uncertain questions are rerun and fixes are added.\n\n## Requirements\n\n- Python 3.11 or higher\n- An [OpenAI API key](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys)\n- Optional for Telegram: a Telegram bot token from `@BotFather`\n- Optional for Telegram without a domain: `cloudflared`. 
Setup can install it with Homebrew, Windows `winget`, or a direct Windows PowerShell download fallback.\n- Optional for coding delegation: OpenAI Codex CLI on your `PATH`\n\n## Quick Start\n\nClone the repo, then run the setup script from the repo root.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fquarqlabs\u002Fargus.git\ncd argus\n```\n\nmacOS\u002FLinux:\n\n```bash\npython3 scripts\u002Fsetup_argus.py\n```\n\nWindows PowerShell:\n\n```powershell\npy scripts\\setup_argus.py\n```\n\nThe setup script:\n\n- creates `.venv`\n- installs `requirements.txt`\n- creates `.env` from `.env.example` if missing\n- installs `cloudflared` when possible\n- installs the global `argus` launcher for your user\n- updates the user `PATH` for future terminals when possible\n\n`cloudflared` setup behavior:\n\n- macOS\u002FLinux with Homebrew: `brew install cloudflare\u002Fcloudflare\u002Fcloudflared`\n- Windows with `winget`: `winget install --id Cloudflare.cloudflared --source winget --accept-package-agreements --accept-source-agreements`\n- Windows without `winget`: download the latest Cloudflare binary to `C:\\cloudflared\\cloudflared.exe`, temporarily add `C:\\cloudflared` to the setup process `PATH`, and verify with `cloudflared.exe --version`\n\nThe CLI also checks `C:\\cloudflared\\cloudflared.exe` directly on Windows, so `\u002Fconnect telegram` can work even before a new terminal picks up a permanent PATH update.\n\nAfter setup, edit `.env` and fill at least:\n\n```bash\nOPENAI_API_KEY=your_api_key\nUSER_ID=local_user\nAGENT_ID=local_agent\nLOCAL_MEMORY_ROOT=local_memory\n```\n\nOpen a new terminal, or run the PATH refresh command printed by the setup\nscript. 
Then start Argus from any directory:\n\n```bash\nargus\n```\n\nIf you only want to install or repair the global launcher without reinstalling\ndependencies, run the lighter launcher installer.\n\nmacOS\u002FLinux:\n\n```bash\npython3 scripts\u002Finstall_argus.py --force\n```\n\nWindows PowerShell:\n\n```powershell\npy scripts\\install_argus.py --force\n```\n\nThe global launcher points back to the cloned repo. If you move or delete that\nrepo folder, rerun the launcher installer from the new location.\n\nOn Windows, the launcher sets Python UTF-8 mode so CLI\u002FAPI status symbols do\nnot crash under legacy `cmd.exe` code pages. If you see a `charmap` error such\nas `can't encode character '\\u274c'`, pull the latest code and rerun:\n\n```powershell\npy scripts\\install_argus.py --force\n```\n\nThe control console starts `main:app` for you, connects the CLI to the API job queue, and shows structured events as requests move through triage, retrieval, tool routing, generation, tool use, and final response.\n\nFor coding-agent delegation, the default Codex provider launches:\n\n```bash\ncodex mcp-server\n```\n\nThe Python side uses the OpenAI Agents SDK dependency from `requirements.txt`.\nIf the Codex CLI or the Agents SDK is missing, Argus records a failed coding\ntask with a setup message instead of crashing. On macOS, if `codex` is not on\nthe API worker's `PATH`, Argus also checks the standard Codex.app binary at\n`\u002FApplications\u002FCodex.app\u002FContents\u002FResources\u002Fcodex`.\n\nThe API worker keeps process-lifetime chat history per channel and passes the\nlast four user\u002Fassistant pairs into each agent request. 
This preserves short\nreferences such as \"done\" after an auth link or \"now check calendar\" without\nstuffing the full conversation into every prompt.\n\nYou can still run the raw terminal agent directly:\n\n```bash\npython agent.py\n```\n\nOr run only the API server:\n\n```bash\nuvicorn main:app --reload\n```\n\nCall the API:\n\n```bash\ncurl -X POST http:\u002F\u002F127.0.0.1:8000\u002Fapi\u002Fchat \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"prompt\": \"What do you remember about me?\", \"channel_type\": \"web\"}'\n```\n\n## Control Console\n\n`agent_cli.py` is the recommended local entrypoint. It provides a Codex-style terminal surface around the local FastAPI worker:\n\n- starts `main:app` on `127.0.0.1:8000`\n- hides noisy HTTP client logs\n- shows a scrollable transcript of structured events\n- shows the current model label, working directory, API URL, connected channels, startup channels, default coding agent, coding workspace, and coding network mode\n- reads the agent name from the live local identity config, so a rename through `agent_identity_manager` updates the header without restarting\n- supports formatted Markdown in agent responses\n- supports multiline input: `Enter` sends, `Shift+Enter` inserts a newline\n- supports command suggestions when you type `\u002F`, with `Tab` completing the first suggestion\n\nConsole commands:\n\n```text\n\u002Fhelp\n\u002Fstatus\n\u002Ftools\n\u002Fwhich-tool \u003Ctask>\n\u002Fcloud-tools\n\u002Fadd-tool \u003Ctool>\n\u002Fremove-tool \u003Ctool>\n\u002Fcoding \u003Ctask>\n\u002Fcoding-new \u003Ctask>\n\u002Fcoding-continue \u003Cmessage>\n\u002Fcoding-continue \u003Ctask_id> \u003Cmessage>\n\u002Fcoding-tasks\n\u002Fcoding-agents\n\u002Fcoding-use \u003Cprovider>\n\u002Fcoding-workspace \u003Cpath>\n\u002Fcoding-network [on|off]\n\u002Fcoding-network \u003Ctask_id> on|off\n\u002Fcoding-allow-network \u003Ctask_id>\n\u002Fcoding-status \u003Ctask_id>\n\u002Fcoding-log 
\u003Ctask_id>\n\u002Fcoding-reply \u003Ctask_id> \u003Cmessage>\n\u002Fcoding-cancel \u003Ctask_id>\n\u002Fcoding-delete \u003Ctask_id>\n\u002Fcoding-clear\n\u002Fconnect telegram\nset-default start-channel telegram\nset-default start-channel none\n\u002Fwipe\n\u002Fquit\n```\n\n`\u002Fconnect telegram` starts the Telegram connection pipeline only when you ask for it. `set-default start-channel telegram` stores a local startup preference in `local_memory\u002F\u003CAGENT_ID>\u002Fagent_cli.json`, so future CLI launches can connect that channel automatically. `set-default start-channel none` clears that preference.\n\n## Agent Identity Config\n\nArgus no longer needs Supabase for local identity updates. Runtime identity uses a local JSON config file, with `.env` values as defaults.\n\nDefault location:\n\n```text\nlocal_memory\u002F\u003CAGENT_ID>\u002Fagent_identity.json\n```\n\nOverride location:\n\n```bash\nAGENT_IDENTITY_CONFIG_PATH=local_memory\u002Flocal_agent\u002Fagent_identity.json\n```\n\nSupported identity fields:\n\n```json\n{\n  \"agent_name\": \"Argus\",\n  \"agent_personality\": \"professional and helpful\",\n  \"agent_use_cases\": [\"general assistance\"],\n  \"agent_custom_prompt\": \"\"\n}\n```\n\nInitial values can come from `.env`:\n\n```bash\nAGENT_NAME=Argus\nAGENT_PERSONALITY=\"friendly, precise, high-energy\"\nAGENT_USE_CASES=[\"coding\",\"research\",\"life-long memory\"]\nAGENT_CUSTOM_PROMPT=\"Be concise, grounded, and useful.\"\n```\n\nWhen the user asks the agent to rename itself, change its personality, update its main use cases, or change global instructions, the `agent_identity_manager` tool writes the update to the JSON config file. The env values remain fallback defaults for a new agent profile or a missing config file.\n\n## Channel Integrations\n\nChannels are API-facing integrations. 
The first supported channel is Telegram; the design leaves room for WhatsApp and other channels later.\n\nTelegram supports text, captions, photos, documents, PDFs, DOCX files, audio,\nvoice notes, videos, stickers, and other Telegram file objects. Every received\nfile is downloaded into local channel storage, indexed with source metadata, and\npassed to the current agent job as attachment context. Future channel adapters\ncan use the same generic `POST \u002Fapi\u002Ffiles` endpoint, then include the returned\nattachment IDs in `\u002Fapi\u002Fjobs` or `\u002Fapi\u002Fchat`.\n\nTelegram may let a user send a larger file into the chat, but bots can download\nfiles only up to about 20 MB through the Bot API. Argus therefore defaults `CHANNEL_FILE_MAX_BYTES` to\n`20000000` bytes (about 20 MB) for channel attachments. If a file is too large\nor cannot be downloaded\u002Fread, the Telegram bot replies with a clear attachment\nfailure message instead of silently answering from the caption alone.\n\n### Telegram\n\n1. Create a Telegram bot with `@BotFather`.\n2. Put the bot token in `.env`.\n3. Put your Telegram numeric user ID in `TELEGRAM_ALLOWED_USERS`.\n4. Set a random `TELEGRAM_WEBHOOK_SECRET`.\n5. Install `cloudflared` if you do not have a public domain. `scripts\u002Fsetup_argus.py` can install it automatically, including the direct Windows fallback to `C:\\cloudflared\\cloudflared.exe`.\n6. Run `python agent_cli.py`.\n7. In the console, run `\u002Fconnect telegram`.\n\nExample `.env` values:\n\n```bash\nTELEGRAM_BOT_TOKEN=123456789:replace_with_botfather_token\nTELEGRAM_ALLOWED_USERS=123456789\nTELEGRAM_WEBHOOK_SECRET=replace_with_random_secret\n```\n\nWhat `\u002Fconnect telegram` does:\n\n1. Starts a temporary Cloudflare tunnel to the local API.\n2. Builds the public webhook URL as `\u003Ctunnel-url>\u002Fapi\u002Ftelegram\u002Fwebhook`.\n3. Calls Telegram `setWebhook` with the webhook secret.\n4. 
Shows channel registration progress in the CLI.\n\nTelegram messages are processed through the same API job queue as CLI messages. While a response is generating, the API sends Telegram `typing` chat actions so the chat feels alive instead of silent.\n\nChannel commands also work from Telegram:\n\n```text\n\u002Fhelp\n\u002Fstatus\n\u002Ftools\n\u002Fwhich-tool \u003Ctask>\n\u002Fcloud-tools\n\u002Fadd-tool \u003Ctool>\n\u002Fremove-tool \u003Ctool>\n\u002Fcoding \u003Ctask>\n\u002Fcoding-new \u003Ctask>\n\u002Fcoding-continue \u003Cmessage>\n\u002Fcoding-continue \u003Ctask_id> \u003Cmessage>\n\u002Fcoding-tasks\n\u002Fcoding-agents\n\u002Fcoding-use \u003Cprovider>\n\u002Fcoding-workspace \u003Cpath>\n\u002Fcoding-network [on|off]\n\u002Fcoding-network \u003Ctask_id> on|off\n\u002Fcoding-allow-network \u003Ctask_id>\n\u002Fcoding-status \u003Ctask_id>\n\u002Fcoding-log \u003Ctask_id>\n\u002Fcoding-reply \u003Ctask_id> \u003Cmessage>\n\u002Fcoding-cancel \u003Ctask_id>\n\u002Fcoding-delete \u003Ctask_id>\n\u002Fcoding-clear\n\u002Fwipe\n\u002Fquit\n```\n\n`\u002Fquit` only stops the local CLI when typed in the console. 
From Telegram it returns a safety message because remote channels should not stop the local process.\n\n## API Job Queue\n\nThe FastAPI worker exposes both synchronous and job-based paths.\n\nSynchronous compatibility route:\n\n```text\nPOST \u002Fapi\u002Fchat\n```\n\nJob queue routes:\n\n```text\nPOST \u002Fapi\u002Fjobs\nGET  \u002Fapi\u002Fjobs\u002F{job_id}\nGET  \u002Fapi\u002Fevents?after=\u003Cevent_id>\n```\n\nCoding-task routes:\n\n```text\nGET  \u002Fapi\u002Fcoding-agents\nPOST \u002Fapi\u002Fcoding-agents\u002Fdefault\nPOST \u002Fapi\u002Fcoding-agents\u002Fworkspace\nPOST \u002Fapi\u002Fcoding-agents\u002Fnetwork\nGET  \u002Fapi\u002Fcoding-tasks\nDELETE \u002Fapi\u002Fcoding-tasks\nPOST \u002Fapi\u002Fcoding-tasks\u002Fsubscribe-channel\nPOST \u002Fapi\u002Fcoding-tasks\nPOST \u002Fapi\u002Fcoding-tasks\u002Flatest\u002Freply\nGET  \u002Fapi\u002Fcoding-tasks\u002F{task_id}\nDELETE \u002Fapi\u002Fcoding-tasks\u002F{task_id}\nGET  \u002Fapi\u002Fcoding-tasks\u002F{task_id}\u002Flogs\nPOST \u002Fapi\u002Fcoding-tasks\u002F{task_id}\u002Freply\nPOST \u002Fapi\u002Fcoding-tasks\u002F{task_id}\u002Fcancel\nPOST \u002Fapi\u002Fcoding-tasks\u002F{task_id}\u002Fnetwork\n```\n\nThe CLI uses the job routes. A request is enqueued, the single worker processes jobs one by one, and status events are emitted for:\n\n- triage\n- retrieval\n- tool routing\n- generation\n- tool running\u002Fcompleted\u002Ffailed\n- coding-agent progress\n- final response\n\nThis is what lets the console show useful loader text such as memory retrieval, response generation, and active tool usage instead of blocking silently until the final answer arrives.\n\n## Environment Variables\n\n| Variable | Required | Description |\n|---|---:|---|\n| `OPENAI_API_KEY` | yes | Used for generation, retrieval planning, learning, and embeddings. |\n| `AGENT_ID` | no | Selects the local memory namespace. Defaults to `local_agent`. 
|\n| `USER_ID` | API only | Required by `main.py` for the FastAPI worker. |\n| `LOCAL_MEMORY_ROOT` | no | Root folder for local memory. Defaults to `local_memory`. |\n| `LOCAL_CHANNEL_STORAGE_ROOT` | no | Optional override for durable channel chat history and attachment storage. Defaults to `local_memory\u002F\u003CAGENT_ID>\u002Fchannel_state`. |\n| `CHANNEL_FILE_MAX_BYTES` | no | Max accepted channel attachment size in bytes. Defaults to `20000000` (about 20 MB, matching the practical Telegram bot download ceiling). |\n| `ATTACHMENT_EXTRACT_MAX_CHARS` | no | Max extracted text saved from an attachment. Defaults to `24000`. |\n| `MULTIMODAL_IMAGE_MODEL` | no | Optional OpenAI model for image descriptions. Defaults to `gpt-4o-mini`. |\n| `MULTIMODAL_AUDIO_MODEL` | no | Optional OpenAI model for audio transcription. Defaults to `gpt-4o-mini-transcribe`. |\n| `PDF_VISION_MAX_PAGES` | no | Max PDF pages rendered for vision OCR fallback when embedded text extraction fails. Defaults to `3`. |\n| `AGENT_IDENTITY_CONFIG_PATH` | no | Optional override for the local identity config file. Defaults to `local_memory\u002F\u003CAGENT_ID>\u002Fagent_identity.json`. |\n| `AGENT_NAME` | no | Default persona name when no local identity config exists. |\n| `AGENT_PERSONALITY` | no | Default tone\u002Fpersonality when no local identity config exists. |\n| `AGENT_USE_CASES` | no | Default use-case description. Accepts a JSON array or comma-separated string. |\n| `AGENT_CUSTOM_PROMPT` | no | Default custom behavior instructions when no local identity config exists. |\n| `ARGUS_AGENT_VERSION` | no | Display-only version label for the control console. Defaults to `v0.5.0`. |\n| `ARGUS_MODEL_LABEL` | no | Display-only model label for the control console. Falls back to generation model labels. |\n| `ARGUS_REASONING_EFFORT` | no | Optional display suffix for the console model label. 
|\n| `AGENT_DEBUG` | no | Set to `true`\u002F`1` to show verbose debug logs from `agent.py`; metrics still print without debug. |\n| `CLOUD_TOOLS_API_KEY` | cloud tools only | Required to use external app tools through the cloud-tool session. |\n| `CLOUD_TOOLKITS` | no | Comma-separated cloud-tool slugs. Defaults to `github,gmail,googlecalendar,slack,notion,linear`. |\n| `CLOUD_TOOLS_CONFIG_PATH` | no | Optional override for enabled cloud-tool config. Defaults to `local_memory\u002F\u003CAGENT_ID>\u002Fagent_tools.json`. |\n| `CLOUD_TOOLS_CACHE_DIR` | no | Writable cloud-tool SDK cache directory. Defaults to `local_memory\u002Fcloud_tools_cache`. |\n| `CODING_AGENTS_ENABLED` | no | Enables coding-agent delegation. Defaults to `true`. |\n| `CODING_AGENT_DEFAULT_PROVIDER` | no | Default coding provider. V1 supports `codex`. |\n| `CODEX_MCP_COMMAND` | no | Command used to launch Codex MCP. Defaults to `codex`. |\n| `CODEX_MCP_ARGS` | no | Comma-separated args for Codex MCP. Defaults to `mcp-server`. |\n| `CODEX_WORKSPACE_ROOT` | no | Workspace path for delegated coding work. `.env.example` uses `.`, resolved from the user's launch directory. |\n| `CODEX_APPROVAL_POLICY` | no | Approval policy label. Defaults to `argus-safe-auto`. |\n| `CODEX_NETWORK_ACCESS` | no | Allows network access for new Codex coding tasks inside the workspace-write sandbox. Defaults to `true`; use `\u002Fcoding-network off` to disable locally. |\n| `CODEX_TASK_TIMEOUT_SECONDS` | no | Max runtime for a delegated coding task. Defaults to `1800`. |\n| `TELEGRAM_BOT_TOKEN` | Telegram only | Bot token from `@BotFather`. Required for `\u002Fconnect telegram`. |\n| `TELEGRAM_ALLOWED_USERS` | Telegram recommended | Comma-separated numeric Telegram user IDs allowed to use the local agent. |\n| `TELEGRAM_WEBHOOK_SECRET` | Telegram recommended | Secret token sent to Telegram `setWebhook` and verified by `\u002Fapi\u002Ftelegram\u002Fwebhook`. |\n\nAgent identity updates are local-first. 
The `agent_identity_manager` tool writes\nto the JSON config file above, while env values remain startup defaults\u002Ffallbacks.\n\n## Repository Map\n\n```text\nagent.py                  Core LangGraph agent, memory, retrieval, generation, learning\nagent_cli.py              Local control console for API, jobs, events, and channels\nagent_config.py           Local agent identity config loader\u002Fsaver\nagent_connector.py        Public async integration gateway\nmain.py                   FastAPI single-tenant worker\nlocal_channel_store.py    Durable channel history and attachment storage\ncoding_agents\u002F            Provider-neutral coding task store, policy, manager, Codex runner\nrun_dataset_evals.py      LongMemEval evaluation runner\nrun_dataset_evals_parallel.py\n                          Parallel LongMemEval evaluation runner\nmonitor_results.py        Benchmark monitoring helper\ntools\u002F                    Skill registry and tool implementations\ntools\u002Fcoding_agent\u002F       Native skill for coding-agent delegation\neval_datasets\u002F            Cleaned LongMemEval dataset\nreports\u002F                  Evaluation outputs and checkpoints\nlocal_memory\u002F             Local FAISS and JSON memory stores\n```\n\n## Design Principles\n\nArgus is built around a few hard rules:\n\n- Retrieve broadly, reason narrowly.\n- Store memories with ownership, dates, and qualifiers intact.\n- Prefer saying \"missing data\" over inventing an answer.\n- Treat temporal and numeric claims as evidence-bound operations.\n- Keep normal user-facing latency low by learning in the background, while keeping benchmark ingestion deterministic by learning synchronously.\n- Keep the context window clean with routing and progressive disclosure.\n- Make the memory system portable, local, and easy to inspect.\n\n## Status\n\nArgus Agent v0.5.0 is an active OSS release candidate.\n\nThe current version is optimized for long-memory evaluation and single-user local memory. 
The next natural steps are:\n\n- package cleanup\n- dependency trimming\n- unit tests for memory storage and retrieval\n- reproducible benchmark scripts\n- Docker packaging\n- memory compaction and archival policies\n- multi-user serving with isolated local stores\n\n## License\n\nApache License 2.0. See [LICENSE](LICENSE) for details.\n",2,"2026-06-11 04:12:16","CREATED_QUERY"]