[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1461":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},1461,"hypatia","MarchLiu\u002Fhypatia","MarchLiu","\"We can wander through the stacks of the Library of Alexandria, imagining the scrolls and the knowledge they contain. Its destruction is a warning: all we have is transient.\"--Alberto Manguel",null,"Rust",212,13,1,0,43.44,"MIT License",false,"main",true,[],"2026-06-12 04:00:09","# Hypatia\n\n\"We can wander through the stacks of the Library of Alexandria, imagining the scrolls and the knowledge they contain. Its destruction is a warning: all we have is transient.\"--Alberto Manguel\n\nAI-oriented memory management system. Stores structured knowledge as a graph of **Knowledge** entries (nodes) and **Statement** triples (edges), queried via a custom JSON Search Expression (JSE) language. 
Built on SQLite FTS5 + DuckDB, with configurable embedding models (local ONNX or remote API) for semantic vector search.\n\n## Features\n\n- **Knowledge Graph** -- Knowledge entries (named info points with tags) and Statement triples (subject-predicate-object with temporal ranges)\n- **JSE Query Engine** -- JSON-based query language compiling to parameterized SQL + FTS5, supporting `$and`, `$or`, `$not`, `$eq`, `$ne`, `$gt`, `$lt`, `$gte`, `$lte`, `$contains`, `$like`, `$content`, `$search`, `$quote`, `$triple`, `$k-hop`\n- **Dual-Database Storage** -- DuckDB for structured queries + vector search, SQLite FTS5 (Porter stemmer + multi-column BM25) for full-text search, auto-synchronized\n- **Configurable Vector Search** -- Local ONNX models (BGE-M3 default) or remote API (OpenAI-compatible) for semantic similarity search via DuckDB cosine distance\n- **Synonyms** -- Per-entry synonym lists for knowledge, per-position (subject\u002Fpredicate\u002Fobject) synonyms for statements, indexed in FTS\n- **Shelf System** -- Named, connectable, exportable data directories for isolation\n- **CLI + REPL** -- Full command-line interface with interactive mode (rustyline)\n- **Agent Integration** -- Claude Code skill for natural-language-to-CLI translation\n- **Cross-Platform** -- Builds for 18 targets (Linux, macOS, Windows, FreeBSD, NetBSD, illumos, Android)\n\n## Quick Start\n\n```bash\n# Build\ncargo build --release\n\n# Download embedding model (BGE-M3, recommended)\nmkdir -p ~\u002F.hypatia\u002Fdefault\nhf download BAAI\u002Fbge-m3 --local-dir \u002Ftmp\u002Fbge-m3\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Fmodel.onnx ~\u002F.hypatia\u002Fdefault\u002Fembedding_model.onnx\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Fmodel.onnx_data ~\u002F.hypatia\u002Fdefault\u002Fmodel.onnx_data\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Ftokenizer.json ~\u002F.hypatia\u002Fdefault\u002Ftokenizer.json\n\n# Create a knowledge entry\nhypatia knowledge-create \"Rust\" -d \"systems programming language\" -t 
\"language,compiled\"\n\n# Create a relationship\nhypatia statement-create \"Rust\" \"is_a\" \"systems language\"\n\n# Full-text search\nhypatia search \"programming language\"\n\n# Vector similarity search (requires embedding model)\nhypatia backfill    # generate embeddings for existing entries\nhypatia similar \"programming language\"  # semantic search\n\n# Structured query (JSE)\nhypatia query '[\"$knowledge\", [\"$eq\", \"name\", \"Rust\"]]'\nhypatia query '[\"$statement\", [\"$triple\", \"Rust\", \"$*\", \"$*\"]]'\nhypatia query '[\"$knowledge\", [\"$search\", \"database migration\"]]'\n\n# Interactive REPL\nhypatia repl\n```\n\n## Embedding Models\n\nHypatia supports multiple embedding backends, configured via `shelf.toml` in the shelf directory (e.g., `~\u002F.hypatia\u002Fdefault\u002Fshelf.toml`).\n\n### Default: BAAI\u002Fbge-m3 (Local ONNX)\n\nNo configuration is needed: place the model files in the shelf directory and Hypatia auto-detects them.\n\n```bash\n# Download from HuggingFace\nhf download BAAI\u002Fbge-m3 --local-dir \u002Ftmp\u002Fbge-m3\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Fmodel.onnx ~\u002F.hypatia\u002Fdefault\u002Fembedding_model.onnx\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Fmodel.onnx_data ~\u002F.hypatia\u002Fdefault\u002Fmodel.onnx_data\ncp \u002Ftmp\u002Fbge-m3\u002Fonnx\u002Ftokenizer.json ~\u002F.hypatia\u002Fdefault\u002Ftokenizer.json\n```\n\n- Dimensions: 1024\n- Max sequence length: 8192\n- Multilingual (100+ languages including Chinese and English)\n\n### Switching to Other Local Models\n\nCreate or edit `~\u002F.hypatia\u002Fdefault\u002Fshelf.toml` and set `dimensions` (plus `max_seq_length` and `pooling` where they differ from the defaults) to match the model. For example, for gte-Qwen2-1.5B-instruct:\n\n```toml\n[embedding]\nprovider = \"local\"\ndimensions = 1536\nmax_seq_length = 8192\n```\n\n#### Alibaba-NLP\u002Fgte-Qwen2-1.5B-instruct\n\n```bash\nhf download Alibaba-NLP\u002Fgte-Qwen2-1.5B-instruct --local-dir \u002Ftmp\u002Fgte-qwen2\n# Export ONNX first (requires optimum-cli or transformers)\n# Then place the ONNX model and tokenizer files in the shelf directory\n```\n\n- Dimensions: 1536\n- Max sequence 
length: 8192\n- Strong multilingual performance\n\n#### google\u002Fembeddinggemma-300m (EmbeddingGemma)\n\n```bash\nhf download google\u002Fembeddinggemma-300m --local-dir \u002Ftmp\u002Fembedding-gemma\n# Export to ONNX, then place the model and tokenizer files in the shelf directory\n```\n\n- Dimensions: 768\n- Max sequence length: 2048\n- Lightweight, fast inference\n\n#### jinaai\u002Fjina-embeddings-v5-text-small\n\n```bash\nhf download jinaai\u002Fjina-embeddings-v5-text-small-text-matching \\\n  onnx\u002Fmodel.onnx onnx\u002Fmodel.onnx_data tokenizer.json \\\n  --local-dir \u002Ftmp\u002Fjina-v5-small\ncp \u002Ftmp\u002Fjina-v5-small\u002Fonnx\u002Fmodel.onnx ~\u002F.hypatia\u002Fdefault\u002Fembedding_model.onnx\ncp \u002Ftmp\u002Fjina-v5-small\u002Fonnx\u002Fmodel.onnx_data ~\u002F.hypatia\u002Fdefault\u002Fmodel.onnx_data\ncp \u002Ftmp\u002Fjina-v5-small\u002Ftokenizer.json ~\u002F.hypatia\u002Fdefault\u002Ftokenizer.json\n```\n\nshelf.toml:\n```toml\n[embedding]\nprovider = \"local\"\ndimensions = 1024\nmax_seq_length = 32768\npooling = \"last_token\"\n```\n\n- Dimensions: 1024\n- Max sequence length: 32768\n- 677M params, last-token pooling\n\n#### jinaai\u002Fjina-embeddings-v5-text-nano\n\n```bash\nhf download jinaai\u002Fjina-embeddings-v5-text-nano-text-matching \\\n  onnx\u002Fmodel.onnx onnx\u002Fmodel.onnx_data tokenizer.json \\\n  --local-dir \u002Ftmp\u002Fjina-v5-nano\ncp \u002Ftmp\u002Fjina-v5-nano\u002Fonnx\u002Fmodel.onnx ~\u002F.hypatia\u002Fdefault\u002Fembedding_model.onnx\ncp \u002Ftmp\u002Fjina-v5-nano\u002Fonnx\u002Fmodel.onnx_data ~\u002F.hypatia\u002Fdefault\u002Fmodel.onnx_data\ncp \u002Ftmp\u002Fjina-v5-nano\u002Ftokenizer.json ~\u002F.hypatia\u002Fdefault\u002Ftokenizer.json\n```\n\nshelf.toml:\n```toml\n[embedding]\nprovider = \"local\"\ndimensions = 768\nmax_seq_length = 8192\npooling = \"last_token\"\n```\n\n- Dimensions: 768\n- Max sequence length: 8192\n- 239M params, last-token pooling, fastest inference\n\n### Remote API (OpenAI-Compatible)\n\nFor cloud-based embeddings 
without local model files:\n\n```toml\n[embedding]\nprovider = \"remote\"\napi_url = \"https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fembeddings\"\napi_key_env = \"OPENAI_API_KEY\"\napi_model = \"text-embedding-3-small\"\ndimensions = 1536\n```\n\nThen export the API key in the environment variable named by `api_key_env`:\n```bash\nexport OPENAI_API_KEY=\"sk-...\"\n```\n\nWorks with any OpenAI-compatible API, including:\n- OpenAI (`text-embedding-3-small`, `text-embedding-3-large`)\n- Azure OpenAI\n- Local servers (Ollama, LM Studio, vLLM)\n- Other providers (Voyage AI, Cohere, Jina, Gitee AI)\n\n#### Ollama Example\n\n```toml\n[embedding]\nprovider = \"remote\"\napi_url = \"http:\u002F\u002Flocalhost:11434\u002Fv1\u002Fembeddings\"\napi_key_env = \"OLLAMA_API_KEY\"\napi_model = \"qwen3-embedding:8b\"\ndimensions = 1024\n```\n\n```bash\n# Set a dummy key (Ollama doesn't require auth)\nexport OLLAMA_API_KEY=\"ollama\"\n```\n\n### Configuration Reference\n\n| Field | Default | Description |\n|-------|---------|-------------|\n| `provider` | `\"local\"` | `\"local\"` for ONNX, `\"remote\"` for HTTP API |\n| `dimensions` | `1024` | Embedding vector dimensions |\n| `max_seq_length` | `8192` | Max tokenizer sequence length (local only) |\n| `pooling` | `\"mean\"` | Pooling strategy: `\"mean\"`, `\"cls\"`, or `\"last_token\"` (local only) |\n| `model_path` | `embedding_model.onnx` | ONNX model path, relative to shelf dir (local only) |\n| `tokenizer_path` | `tokenizer.json` | Tokenizer path, relative to shelf dir (local only) |\n| `api_url` | OpenAI URL | API endpoint URL (remote only) |\n| `api_key_env` | `OPENAI_API_KEY` | Name of the environment variable holding the API key (remote only) |\n| `api_model` | `text-embedding-3-small` | Model name sent to API (remote only) |\n\n## CLI Reference\n\n| Command | Description |\n|---------|-------------|\n| `hypatia connect \u003Cpath> [-n \u003Cname>]` | Connect to a shelf directory |\n| `hypatia disconnect \u003Cname>` | Disconnect from a shelf |\n| `hypatia list` | List connected 
shelves |\n| `hypatia knowledge-create \u003Cname> [-d \u003Cdata>] [-t \u003Ctags>] [--synonyms \u003Ccsv>] [--figures \u003Crefs>]` | Create a knowledge entry |\n| `hypatia knowledge-get \u003Cname>` | Get a knowledge entry |\n| `hypatia knowledge-delete \u003Cname>` | Delete a knowledge entry |\n| `hypatia statement-create \u003Csubj> \u003Cpred> \u003Cobj> [-d \u003Cdata>] [--synonyms \u003Cjson>]` | Create a triple |\n| `hypatia statement-delete \u003Csubj> \u003Cpred> \u003Cobj>` | Delete a triple |\n| `hypatia search \u003Cquery> [-c \u003Ccatalog>] [--limit N]` | Full-text search |\n| `hypatia similar \u003Cquery> [--limit N]` | Vector similarity search |\n| `hypatia backfill` | Generate embeddings for all entries |\n| `hypatia archive-store \u003Cfile> [-n \u003Cname>] [-s \u003Cshelf>]` | Store a file in the shelf's archives and auto-create its metadata |\n| `hypatia archive-get \u003Cname> [-o \u003Coutput>] [-s \u003Cshelf>]` | Show an archive file's path, or copy it to `\u003Coutput>` |\n| `hypatia archive-list [-s \u003Cshelf>]` | List all archive files |\n| `hypatia query '\u003Cjse-json>'` | Execute a JSE query |\n| `hypatia export \u003Cname> \u003Cdest>` | Export a shelf |\n| `hypatia repl` | Interactive REPL |\n\n## JSE Query Language\n\nA JSE (JSON Search Expression) query targets either the knowledge table (`$knowledge`) or the statement table (`$statement`).\n\n### Syntax\n\n```json\n[\"$knowledge\", condition1, condition2, ...]\n[\"$statement\", condition1, condition2, ...]\n```\n\n### Operators\n\n| Operator | Purpose | Example |\n|----------|---------|---------|\n| `$eq` | Equals | `[\"$eq\", \"name\", \"Rust\"]` |\n| `$ne` | Not equals | `[\"$ne\", \"name\", \"Rust\"]` |\n| `$gt` \u002F `$lt` \u002F `$gte` \u002F `$lte` | Comparison | `[\"$gt\", \"created_at\", \"2025-01-01\"]` |\n| `$contains` | Substring in JSON field | `[\"$contains\", \"tags\", \"backend\"]` |\n| `$like` | SQL LIKE pattern match | `[\"$like\", \"name\", \"Rust%\"]` |\n| `$content` | Match content JSON key-values | `[\"$content\", {\"format\": 
\"markdown\"}]` |\n| `$search` | Full-text search | `[\"$search\", \"database migration\"]` |\n| `$and` | Logical AND | `[\"$and\", cond1, cond2]` |\n| `$or` | Logical OR | `[\"$or\", cond1, cond2]` |\n| `$not` | Logical NOT | `[\"$not\", cond]` |\n| `$quote` | Treat the argument as a literal (prevent evaluation) | `[\"$quote\", [\"$eq\", \"x\", \"y\"]]` |\n| `$triple` | Match by subject\u002Fpredicate\u002Fobject position (`$*` = wildcard) | `[\"$triple\", \"Alice\", \"$*\", \"Bob\"]` |\n| `$k-hop` | K-hop graph traversal | `[\"$k-hop\", \"Alice\", \"$*\", 2]` |\n\n### Examples\n\n```bash\n# All knowledge entries\nhypatia query '[\"$knowledge\"]'\n\n# Knowledge named \"Rust\" with tag \"systems\"\nhypatia query '[\"$knowledge\", [\"$and\", [\"$eq\", \"name\", \"Rust\"], [\"$contains\", \"tags\", \"systems\"]]]'\n\n# Statements whose triple contains \"Alice\"\nhypatia query '[\"$statement\", [\"$contains\", \"triple\", \"Alice\"]]'\n\n# Triple matching: all relationships where Alice is the subject\nhypatia query '[\"$statement\", [\"$triple\", \"Alice\", \"$*\", \"$*\"]]'\n\n# Triple matching: all \"manages\" relationships\nhypatia query '[\"$statement\", [\"$triple\", \"$*\", \"manages\", \"$*\"]]'\n\n# Triple matching: exact triple (uses the primary-key index)\nhypatia query '[\"$statement\", [\"$triple\", \"Alice\", \"knows\", \"Bob\"]]'\n\n# Pattern matching: names starting with \"Al\"\nhypatia query '[\"$knowledge\", [\"$like\", \"name\", \"Al%\"]]'\n\n# Content filtering: all markdown entries\nhypatia query '[\"$knowledge\", [\"$content\", {\"format\": \"markdown\"}]]'\n\n# Full-text search within knowledge\nhypatia query '[\"$knowledge\", [\"$search\", \"query optimization\"]]'\n\n# Statements whose triple contains Alice or Bob\nhypatia query '[\"$statement\", [\"$or\", [\"$contains\", \"triple\", \"Alice\"], [\"$contains\", \"triple\", \"Bob\"]]]'\n\n# K-hop: all entities reachable from Alice in 2 hops\nhypatia query '[\"$statement\", [\"$k-hop\", \"Alice\", \"$*\", 2]]'\n\n# K-hop: follow \"knows\" edges from Alice, 3 hops deep\nhypatia query 
'[\"$statement\", [\"$k-hop\", \"Alice\", \"knows\", 3]]'\n```\n\n## Architecture\n\n```\nsrc\u002F\n├── cli\u002F            # CLI commands + REPL (clap + rustyline)\n├── embedding\u002F      # Embedding providers (ONNX local + remote API)\n├── engine\u002F         # JSE parser, AST, evaluator, SQL builder\n├── model\u002F          # Knowledge, Statement, Content, Query types\n├── service\u002F        # Business logic (dual-write to DuckDB + SQLite)\n├── storage\u002F        # DuckDB store, SQLite FTS5 store, shelf manager\n├── lab.rs          # Top-level API facade\n├── error.rs        # Error types\n├── lib.rs          # Module declarations\n└── main.rs         # Entry point\n```\n\nEach **shelf** is a directory containing `data.duckdb` (structured data), `index.sqlite` (FTS5 index), optional `shelf.toml` (embedding config), embedding model files, and an `archives\u002F` directory for attachments. The service layer keeps both databases in sync via dual-write.\n\n## Archive Files\n\nHypatia supports storing archive files (images, PDFs, data, etc.) alongside knowledge entries. 
Files are stored in the shelf's `archives\u002F` directory and referenced via the `archive:\u002F\u002F` convention in knowledge content.\n\n```\n~\u002F.hypatia\u002Fdefault\u002F\n├── data.duckdb\n├── index.sqlite\n├── shelf.toml\n└── archives\u002F\n    └── euclid\u002F\n        ├── fig_1_1.png\n        └── fig_1_2.svg\n```\n\n### Commands\n\n```bash\n# Store a file (auto-creates knowledge with metadata)\nhypatia archive-store figure.png -n euclid\u002Ffig_1_1.png\n\n# Get the path to a stored archive file\nhypatia archive-get euclid\u002Ffig_1_1.png\n\n# Copy an archive file to a local path\nhypatia archive-get euclid\u002Ffig_1_1.png -o \u002Ftmp\u002Ffig.png\n\n# List all archive files\nhypatia archive-list\n\n# Reference an archive when creating knowledge\nhypatia knowledge-create \"Euclid Prop 1\" \\\n  -d \"Construction of equilateral triangle\" \\\n  --figures \"archive:\u002F\u002Feuclid\u002Ffig_1_1.png\"\n```\n\n### Auto-Created Metadata\n\n`archive-store` automatically creates:\n\n1. **Knowledge entry** with name = archive path, containing filename, size, and MIME type\n2. 
**Statement**: `\u003Cpath> is_a archive` (links the archive into the graph)\n\nThis enables JSE queries on archive metadata:\n\n```bash\nhypatia query '[\"$knowledge\", [\"$contains\", \"tags\", \"archive\"]]'\nhypatia query '[\"$knowledge\", [\"$content\", {\"mime_type\": \"image\u002Fpng\"}]]'\nhypatia query '[\"$statement\", [\"$triple\", \"$*\", \"is_a\", \"archive\"]]'\n```\n\n### Convention\n\n- **Storage**: Files go in `\u003Cshelf>\u002Farchives\u002F\u003Cpath>` on the filesystem\n- **Reference**: Use `archive:\u002F\u002F\u003Cpath>` in knowledge content's `figures` field\n- **Resolution**: `archive:\u002F\u002Feuclid\u002Ffig.png` resolves to `\u003Cshelf>\u002Farchives\u002Feuclid\u002Ffig.png`\n- **Export**: `hypatia export` copies the `archives\u002F` directory along with the databases\n- **No cascade delete**: Deleting a knowledge entry does not remove the archive file\n\n## Benchmark\n\n### LoCoMo Academic Benchmark\n\nTested on the LoCoMo long-term conversational memory benchmark (ACL 2024, 10 conversations, 1,540 non-adversarial QA pairs, 6,426 entries). 
Recall@K (R@K) measures whether the correct evidence passage appears in the top-K results; the latency columns report median (p50) search time per query.\n\n#### Embedding Model Comparison\n\n| Model | Params | Dims | Pooling | R@1 | R@5 | R@10 | Vec p50 |\n|-------|--------|------|---------|-----|-----|------|---------|\n| FTS (BM25, baseline) | — | — | — | 0.2% | 0.2% | 0.2% | 815 ms |\n| Jina v5 text-nano | 239M | 768 | last_token | 8.7% | 19.6% | 25.1% | 25 ms |\n| gte-multilingual-base | 305M | 768 | mean | 13.6% | 31.9% | 42.5% | 25 ms |\n| gte-Qwen2-1.5B-instruct | 1.5B | 1536 | mean | 17.7% | 43.6% | 54.9% | 105 ms |\n| Jina v5 text-small | 677M | 1024 | last_token | 24.9% | 50.7% | 60.4% | 54 ms |\n| GLM Embedding-3 (API) | — | 1024 | — | 22.4% | 53.3% | 64.9% | 146 ms |\n| Qwen3-Embedding-8B (Ollama) | 8B | 1024 | — | 32.9% | 59.5% | 70.7% | 152 ms |\n| Qwen3-Embedding-8B (Ollama) | 8B | 4096 | — | 33.6% | 60.0% | 70.6% | 310 ms |\n| **BAAI\u002Fbge-m3** | **568M** | **1024** | **mean** | **38.6%** | **65.7%** | **75.2%** | **43 ms** |\n\n#### BGE-M3 by Category (default model)\n\n| Category | N | FTS R@10 | Vector R@10 | Improvement |\n|----------|---|----------|-------------|-------------|\n| Single-hop | 841 | 0.1% | **76.1%** | +76.0pp |\n| Multi-hop | 282 | 0.4% | **75.5%** | +75.2pp |\n| Temporal | 321 | 0.0% | **80.4%** | +80.4pp |\n| Open-domain | 96 | 1.0% | **49.0%** | +47.9pp |\n| **Overall** | **1540** | **0.2%** | **75.2%** | **+75.0pp** |\n\n#### Comparison with MemPalace (ChromaDB)\n\n| Metric | Hypatia (BGE-M3) | MemPalace ChromaDB | MemPalace Hybrid v5 |\n|--------|-------------------|--------------------|---------------------|\n| R@10 | 75.2% | 60.3% | 88.9% |\n| Vec latency p50 | 43 ms | ~2-50 ms | ~2-50 ms |\n| Ingest time | ~9 min | — | — |\n\n**Note**: LoCoMo is designed as a semantic challenge, which is why BM25 FTS recall is near zero. Vector search closes this gap effectively. 
BGE-M3 (568M, 1024d) achieves the highest recall (R@10=75.2%) with low latency (43 ms) and broad multilingual coverage (100+ languages), which makes it the best default choice. Qwen3-Embedding-8B (8B, via Ollama) was tested at both 1024d and its native 4096d: the larger vectors change recall only negligibly (4096d vs 1024d: R@1 33.6% vs 32.9%, R@10 70.6% vs 70.7%) while roughly doubling latency (310 ms vs 152 ms), so 1024d is the better trade-off for this model. Even at 8B params, it does not surpass the much smaller BGE-M3. GLM Embedding-3 (Zhipu AI, cloud API, 1024d) reaches R@10=64.9%, ahead of the Jina and gte models but behind Qwen3-Embedding-8B and BGE-M3; at 146 ms p50, more than three times BGE-M3's latency, the remote API offers no recall advantage over the local default. Jina v5 text-small (677M) offers mid-range quality (R@10=60.4%) at 54 ms. Notably, larger models don't always win: gte-Qwen2-1.5B-instruct (1.5B), Jina v5 text-small (677M), and Qwen3-Embedding-8B (8B) all trail the 568M BGE-M3 on this benchmark.\n\n#### Run the Benchmark\n\n```bash\n# Download test data\ncurl -sL https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPercena\u002Flocomo-mc10\u002Fresolve\u002Fmain\u002Fraw\u002Flocomo10.json -o locomo10.json\n\n# Run (requires embedding model in ~\u002F.hypatia\u002Fdefault\u002F)\nLOCOMO_DATA=locomo10.json LOCOMO_RESULTS=results.jsonl \\\n  cargo test --test locomo --release -- --nocapture\n```\n\n### LongMemEval Benchmark\n\nTested on [LongMemEval](https:\u002F\u002Fgithub.com\u002Fxiaowu0162\u002Flongmemeval) (500 questions spanning 7 question types and 5 core memory abilities), a comprehensive benchmark for long-term conversational memory.\n\n> **Note**: The test data file (`longmemeval_m.json`, ~2.6 GB) is **not** included in this repository because it is a large third-party public dataset. 
You need to download it separately before running the benchmark.\n\n#### Download Data\n\n```bash\n# Option 1: Use the download script (recommended)\npip install httpx\npython3 scripts\u002Flongmemeval_download.py              # M variant (default, ~2.6 GB)\npython3 scripts\u002Flongmemeval_download.py --variant s   # S variant (smaller)\n\n# Option 2: Direct download from HuggingFace\ncurl -L -o longmemeval_m.json \\\n  https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fxiaowu0162\u002Flongmemeval-cleaned\u002Fresolve\u002Fmain\u002Flongmemeval_m_cleaned.json\n```\n\nSource: [xiaowu0162\u002Flongmemeval-cleaned on HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fxiaowu0162\u002Flongmemeval-cleaned)\n\n#### Run the Benchmark\n\n```bash\n# Full run (requires embedding model in ~\u002F.hypatia\u002Fdefault\u002F)\nLONGMEMEVAL_DATA=longmemeval_m.json LONGMEMEVAL_RESULTS=longmemeval_m_results.jsonl \\\n  cargo test --test longmemeval --release -- --nocapture\n\n# Quick test on a 50-question subset\nLONGMEMEVAL_DATA=longmemeval_m.json LONGMEMEVAL_MAX_QUESTIONS=50 LONGMEMEVAL_RESULTS=longmemeval_m_results.jsonl \\\n  cargo test --test longmemeval --release -- --nocapture\n\n# Evaluate results\npython3 scripts\u002Flongmemeval_eval.py --results longmemeval_m_results.jsonl --retrieval-only\n```\n\n### Synthetic Benchmark\n\nThis benchmark uses synthetic data with planted needles (known-answer entries) to measure retrieval quality, following MemPalace's methodology.\n\n#### Run\n\n```bash\n# Small scale (1K knowledge entries, 2K statements, ~12s)\ncargo test --test bench\n\n# With JSON report\nBENCH_REPORT=report.json cargo test --test bench\n\n# Larger scales\nBENCH_SCALE=medium cargo test --test bench --release\nBENCH_SCALE=large cargo test --test bench --release\n```\n\n#### Results (small scale, debug build, Apple Silicon)\n\n1K knowledge entries, 2K statements, 20 needles, 20 JSE query types (3 runs each).\n\n| Metric | Result |\n|--------|--------|\n| **Recall@1** | 100.0% 
(20\u002F20 needles) |\n| **Recall@5** | 100.0% |\n| **Recall@10** | 100.0% |\n| **FTS search p50** | 474 µs |\n| **FTS search p99** | 700 µs |\n| **JSE query p50** | 3.39 ms |\n| **JSE query types** | 20, covering eq, ne, gt, lt, contains, like, content, search, and, or, not, triple |\n| **Ingest throughput** | 384 knowledge entries\u002Fs, 280 statements\u002Fs |\n\nFull report: [docs\u002Fbenchmark-report.md](docs\u002Fbenchmark-report.md)\n\n**Honest assessment**: These results come from synthetic data with known-answer queries, not from academic benchmarks. See [docs\u002Fbenchmark-honest-assessment.md](docs\u002Fbenchmark-honest-assessment.md) for a frank analysis.\n\n## Cross-Compilation\n\n```bash\n# Prerequisites\ncargo install cargo-zigbuild\npip install ziglang\n\n# Build for Linux (musl, static binary)\n.\u002Fscripts\u002Fbuild.sh x86_64-unknown-linux-musl\n\n# Build for all 18 targets\n.\u002Fscripts\u002Fbuild.sh all\n\n# List supported targets\n.\u002Fscripts\u002Fbuild.sh list\n\n# Docker-based cross-compilation (alternative)\ncargo install cross --git https:\u002F\u002Fgithub.com\u002Fcross-rs\u002Fcross\n.\u002Fscripts\u002Fbuild.sh --backend cross x86_64-unknown-linux-musl\n```\n\nSupported targets: x86_64\u002Faarch64\u002Farmv7 Linux (glibc + musl), riscv64, s390x, powerpc64le, macOS, Windows, FreeBSD, NetBSD, illumos, Android.\n\n## License\n\nMIT\n","Hypatia is an AI-oriented memory management system that stores structured knowledge as a graph to enable efficient information retrieval and management. Built in Rust, its core features include knowledge-graph-based data organization, a custom JSON Search Expression (JSE) query engine, and dual-database storage (DuckDB and SQLite FTS5), with support for locally or remotely configured embedding models for semantic vector search. Hypatia also provides synonym handling and a shelf system, and it runs cross-platform. The tool suits applications that need structured storage of large volumes of text and complex queries over it, such as knowledge base management and document retrieval.",2,"2026-06-11 02:43:53","CREATED_QUERY"]