[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81807":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":14,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":44,"readmeContent":45,"aiSummary":46,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":47,"discoverSource":48},81807,"needle-rs","Geekgineer\u002Fneedle-rs","Geekgineer","258 KB WASM runtime for Needle a 26M-parameter tool-calling transformer. Runs in browser, Cloudflare Workers, and Node.js. No backend required.","https:\u002F\u002Fneedle-rs.pages.dev",null,"Rust",37,5,1,0,2,3,2.33,"Other",false,"main",true,[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43],"agent","ai","browser-ai","cloudflare-workers","edge-ai","embedded-ai","function-calling","inference-engine","int4","llm","no-std","on-device-ai","quantization","rust","rust-lang","safetensors","tool-calling","transformer","wasm","webassembly","2026-06-12 02:04:19","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fbanner.svg\" alt=\"needle-rs\" width=\"100%\"\u002F>\n\n  \u003Cbr\u002F>\u003Cbr\u002F>\n\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fneedle-rs.pages.dev\">\u003Cb>→ Live demo\u003C\u002Fb>\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main\" alt=\"CI\"\u002F>\u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Frelease.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Frelease.yml\u002Fbadge.svg\" alt=\"Release\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Fwasm-demo.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FGeekgineer\u002Fneedle-rs\u002Factions\u002Fworkflows\u002Fwasm-demo.yml\u002Fbadge.svg?branch=main\" alt=\"Demo\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"#parity\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fparity-560%2F560%20token--exact-brightgreen?style=flat-square\" alt=\"Parity 560\u002F560\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fneedle-infer\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fneedle-infer?style=flat-square&color=CE422B\" alt=\"crates.io\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fneedle-rs\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002Fneedle-rs?style=flat-square&color=CE422B\" alt=\"npm\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fneedle-rs\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fneedle-rs?style=flat-square&color=CE422B\" alt=\"PyPI\"\u002F>\u003C\u002Fa>\n    \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue?style=flat-square\" alt=\"MIT\"\u002F>\u003C\u002Fa>\n  \u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"#quick-start\">Quick start\u003C\u002Fa> &nbsp;·&nbsp;\n    \u003Ca href=\"#how-it-works\">How it works\u003C\u002Fa> &nbsp;·&nbsp;\n    \u003Ca href=\"#parity\">Parity\u003C\u002Fa> &nbsp;·&nbsp;\n    \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fneedle\">Upstream model\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cbr\u002F>\n\n\u003Cimg src=\"assets\u002Fdemo.gif\" alt=\"needle-rs demo\" width=\"100%\"\u002F>\n\n\u003Cbr\u002F>\n\nA pure-Rust + WebAssembly runtime for [Needle](https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fneedle) by [Cactus Compute](https:\u002F\u002Fgithub.com\u002Fcactus-compute) — a 26M-parameter transformer that maps `(query, tool list) → JSON call` in one forward pass. Deploys to browsers, edge workers, CLIs, Python, and `no_std` embedded targets. No server, no API key.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"why-this-matters\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Why_this_matters-CE422B?style=flat-square\" height=\"22\" alt=\"Why this matters\"\u002F>\n\u003C\u002Fh2>\n\nTool calling usually means either a paid API call or hundreds of megabytes on disk. 
`needle-rs` ships the whole agent in 23 MB.\n\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth align=\"left\">Stack\u003C\u002Fth>\n\u003Cth align=\"right\">Deploy size\u003C\u002Fth>\n\u003Cth align=\"right\">Latency\u003C\u002Fth>\n\u003Cth align=\"right\">Cost\u003C\u002Fth>\n\u003Cth align=\"center\">Privacy\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\n\u003Ctr>\n\u003Ctd>OpenAI function calling\u003C\u002Ftd>\n\u003Ctd align=\"right\">SDK + hosted API\u003C\u002Ftd>\n\u003Ctd align=\"right\">300–800 ms\u003C\u002Ftd>\n\u003Ctd align=\"right\">$ per token\u003C\u002Ftd>\n\u003Ctd align=\"center\">leaves device\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>llama.cpp + 1B local model\u003C\u002Ftd>\n\u003Ctd align=\"right\">700 MB+\u003C\u002Ftd>\n\u003Ctd align=\"right\">varies\u003C\u002Ftd>\n\u003Ctd align=\"right\">free\u003C\u002Ftd>\n\u003Ctd align=\"center\">local\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>ONNX Runtime Web + a model\u003C\u002Ftd>\n\u003Ctd align=\"right\">8 MB + your model\u003C\u002Ftd>\n\u003Ctd align=\"right\">varies\u003C\u002Ftd>\n\u003Ctd align=\"right\">free\u003C\u002Ftd>\n\u003Ctd align=\"center\">local\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cb>\u003Ccode>needle-rs\u003C\u002Fcode> + Needle\u003C\u002Fb>\u003C\u002Ftd>\n\u003Ctd align=\"right\">\u003Cb>258 KB + 22 MB\u003C\u002Fb>\u003C\u002Ftd>\n\u003Ctd align=\"right\">\u003Cb>~280 ms\u003C\u002Fb>\u003C\u002Ftd>\n\u003Ctd align=\"right\">\u003Cb>free\u003C\u002Fb>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Cb>local\u003C\u002Fb>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nThe same routing accuracy — at a fraction of the footprint.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"quick-start\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Quick_start-CE422B?style=flat-square\" height=\"22\" alt=\"Quick 
start\"\u002F>\n\u003C\u002Fh2>\n\n> **`needle-rs` is a community Rust + WASM runtime for [Needle](https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fneedle) by [Cactus Compute](https:\u002F\u002Fgithub.com\u002Fcactus-compute).** The model architecture, training procedure, dataset, and weights are entirely their work, released under MIT. This project provides a deployment layer for contexts the official Python implementation cannot reach: browsers, edge workers, embedded systems, and binary-distribution use cases. If you build with this, you are building on Cactus's Needle — please credit them.\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Browser \u002F Node.js\u003C\u002Fb> &nbsp;—&nbsp; \u003Ccode>npm install needle-rs\u003C\u002Fcode>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```js\nimport init, { NeedleWasm } from \"needle-rs\";\n\nawait init();\nconst engine = NeedleWasm.load(weights, vocab);\nengine.run(\"Book a flight from London to JFK tomorrow\", toolsJson);\n\u002F\u002F → {\"name\":\"book_flight\",\"arguments\":{...}}\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Rust\u003C\u002Fb> &nbsp;—&nbsp; \u003Ccode>cargo add needle-infer\u003C\u002Fcode>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```rust\nuse needle_infer::NeedleEngine;\n\nlet engine = NeedleEngine::load(\"needle.safetensors\", \"vocab.txt\")?;\nlet result = engine.run(query, tools_json);\nprintln!(\"{}\", result.text);\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Python\u003C\u002Fb> &nbsp;—&nbsp; \u003Ccode>pip install needle-rs\u003C\u002Fcode>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```python\nfrom needle_rs import NeedleEngine\n\nengine = NeedleEngine.load(\"needle.safetensors\", \"vocab.txt\")\nresult = engine.run(query, tools_json)\n# → [{\"name\":\"book_flight\",\"arguments\":{...}}]\n```\n\u003C\u002Fdetails>\n\n**Get the weights**\n\n```bash\nhuggingface-cli download Abdalrahman\u002Fneedle-rs-safetensors \\\n  needle.safetensors 
vocab.txt --local-dir weights\u002F\n```\n\nOr load directly from a URL in the browser — no install step.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"where-it-runs\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Where_it_runs-CE422B?style=flat-square\" height=\"22\" alt=\"Where it runs\"\u002F>\n\u003C\u002Fh2>\n\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth align=\"left\">Target\u003C\u002Fth>\n\u003Cth align=\"center\">Status\u003C\u002Fth>\n\u003Cth align=\"right\">Binary\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\n\u003Ctr>\u003Ctd>Browser \u003Csub>(WASM)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>258 KB\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Node.js \u003Csub>(WASM)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>258 KB\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Cloudflare Workers\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>258 KB\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Linux \u002F macOS \u002F Windows CLI\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>533 KB\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Python \u003Csub>(native wheel)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>pip install needle-rs\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>C \u002F C++ \u002F Go \u002F Swift \u003Csub>(via FFI)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Ccode>557 KB\u003C\u002Fcode>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>\u003Ccode>no_std\u003C\u002Fcode> embedded (Rust)\u003C\u002Ftd>\u003Ctd 
align=\"center\">✓\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Csub>size varies\u003C\u002Fsub>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>iOS \u002F Android \u003Csub>(use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus\">Cactus\u003C\u002Fa>)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">\u003Csub>—\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Csub>—\u003C\u002Fsub>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Apple NPU \u002F Snapdragon NPU \u003Csub>(use \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus\">Cactus\u003C\u002Fa>)\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"center\">\u003Csub>—\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Csub>—\u003C\u002Fsub>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nCactus's official engine targets mobile and NPUs with hand-tuned ARM SIMD. `needle-rs` targets everywhere else. The two stacks are complementary.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"how-it-works\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-How_it_works-CE422B?style=flat-square\" height=\"22\" alt=\"How it works\"\u002F>\n\u003C\u002Fh2>\n\nNeedle is a 26M-parameter encoder-decoder transformer with a small twist: it's trained to do exactly one thing — emit a function-call JSON object from a query and a tool list. That focus is why a model this small works at all.\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd width=\"32\" valign=\"top\" align=\"center\">\u003Csub>1\u003C\u002Fsub>\u003C\u002Ftd>\n\u003Ctd valign=\"top\">\u003Cb>Encoder–decoder SAN.\u003C\u002Fb> The encoder reads the query and tool definitions once. The decoder generates output JSON token by token, attending to the encoder's cached KV. 
One encoder pass per call; one decoder step per output token.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd valign=\"top\" align=\"center\">\u003Csub>2\u003C\u002Fsub>\u003C\u002Ftd>\n\u003Ctd valign=\"top\">\u003Cb>INT4 quantization.\u003C\u002Fb> All attention and FFN weights are stored as packed 4-bit nibbles with per-32-row scales. Matvec dequantizes on the fly — the full f32 weight matrix is never materialized. AVX2 on x86_64, NEON on aarch64, scalar fallback for WASM.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd valign=\"top\" align=\"center\">\u003Csub>3\u003C\u002Fsub>\u003C\u002Ftd>\n\u003Ctd valign=\"top\">\u003Cb>Constrained decoding.\u003C\u002Fb> A character-level trie over valid tool names and argument keys, plus a three-state JSON machine, masks logits at every step. Output is always syntactically valid JSON pointing at a real tool — never a hallucinated function name, never broken syntax.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd valign=\"top\" align=\"center\">\u003Csub>4\u003C\u002Fsub>\u003C\u002Ftd>\n\u003Ctd valign=\"top\">\u003Cb>Two schema formats.\u003C\u002Fb> Accepts both the flat \u003Ccode>{\"location\": {\"type\": \"string\"}}\u003C\u002Fcode> style and OpenAI's \u003Ccode>{\"type\":\"object\",\"properties\":{...}}\u003C\u002Fcode> style. The Python reference handles only the flat form.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd valign=\"top\" align=\"center\">\u003Csub>5\u003C\u002Fsub>\u003C\u002Ftd>\n\u003Ctd valign=\"top\">\u003Cb>Greedy by design.\u003C\u002Fb> Tool calling is a routing task, not a generation task — temperature would only introduce errors. 
\u003Ccode>needle-rs\u003C\u002Fcode> is argmax-only and intentionally does not support stochastic sampling.\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nArchitecture deep-dive: [ARCHITECTURE.md](ARCHITECTURE.md).\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"parity\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Parity-7EE787?style=flat-square\" height=\"22\" alt=\"Parity\"\u002F>\n  &nbsp;\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F560%2F560-token--exact-7EE787?style=flat-square\" height=\"22\" alt=\"560\u002F560 token-exact\"\u002F>\n\u003C\u002Fh2>\n\nA common failure mode for from-scratch model reimplementations is silent drift: outputs that look right but diverge at the third decimal place, then surface as rare, hard-to-trace downstream bugs. `needle-rs` rules this out by contract. The Rust engine is required to produce the **exact same token ID sequence** as the Python\u002FJAX reference on every input, at every decode step.\n\nThe test suite generates 560 inference examples by running the Python model on a diverse input set: five tool-name conventions \u003Csub>(snake_case, camelCase, PascalCase, UPPER_SNAKE_CASE, kebab-case)\u003C\u002Fsub>, parameter counts from 0 to 8, tool arrays from 1 to 20 entries, and a spread of natural-language query phrasings. For each example we capture the Python model's complete output token sequence. The Rust engine is then run on every example and required to produce the identical sequence.\n\n> **560 \u002F 560 pass.** Not approximately — same `argmax` decision at every step.\n\nToken-exact parity is checked on every CI run. Any change that drifts gets caught before merge. 
The reference vectors are committed to the repo, so the parity contract is version-pinned and reproducible without re-running Python:\n\n- Reference generator: [`tools\u002Fgen_e2e_vectors.py`](tools\u002Fgen_e2e_vectors.py)\n- Reference data: [`tests\u002Fe2e_vectors.json`](tests\u002Fe2e_vectors.json)\n- Rust parity test: [`crates\u002Fneedle-infer\u002Ftests\u002Fe2e_parity.rs`](crates\u002Fneedle-infer\u002Ftests\u002Fe2e_parity.rs)\n\nPlus 55 unit tests on the constrained decoder covering edge cases the parity suite doesn't reach \u003Csub>(empty tool arrays, parameter-less tools, name-collision under snake_case normalization, max-length queries)\u003C\u002Fsub>.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"api\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-API-CE422B?style=flat-square\" height=\"22\" alt=\"API\"\u002F>\n\u003C\u002Fh2>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>JavaScript \u002F TypeScript\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```js\nengine.run(query, tools)                              \u002F\u002F → string\nengine.run_stream(query, tools, (id, piece) => {})    \u002F\u002F per-token callback → final string\nengine.run_batch([{ query, tools }, ...])             \u002F\u002F → string[]\nengine.encode_contrastive(text)                       \u002F\u002F → Float32Array | null\nengine.retrieve_tools(query, descriptionsJson, topK)  \u002F\u002F semantic tool routing\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Rust\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```rust\nengine.run(query, tools_json);\nengine.run_stream(query, tools_json, |_id, piece| print!(\"{piece}\"));\nengine.run_batch(&[(q1, t1), (q2, t2)]);\nengine.encode_contrastive(text);            \u002F\u002F → Option\u003CVec\u003Cf32>>\nengine.retrieve_tools(query, descs, k);     \u002F\u002F → Vec\u003C(usize, 
f32)>\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>C (and anything with FFI)\u003C\u002Fb>\u003C\u002Fsummary>\n\u003Cbr\u002F>\n\n```c\n#include \u003Cstdio.h>\n#include \"needle.h\"\n\n\u002F* Illustrative wrapper; query and tools_json are caller-supplied C strings. *\u002F\nint route(const char *query, const char *tools_json) {\n    NeedleHandle h  = needle_load(\"needle.safetensors\", \"vocab.txt\");\n    const char *out = needle_run(h, query, tools_json);\n    int ok = (out != NULL);\n    if (ok) printf(\"%s\\n\", out);\n    needle_free_str((char *)out);  \u002F* release the returned string *\u002F\n    needle_free(h);\n    return ok ? 0 : 1;\n}\n```\n\nFull header: [`crates\u002Fneedle-c\u002Finclude\u002Fneedle.h`](crates\u002Fneedle-c\u002Finclude\u002Fneedle.h). Null-safe throughout; errors via thread-local `needle_last_error()`.\n\u003C\u002Fdetails>\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"benchmarks\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Benchmarks-CE422B?style=flat-square\" height=\"22\" alt=\"Benchmarks\"\u002F>\n\u003C\u002Fh2>\n\nIntel i7-1185G7 (Tiger Lake, 4-core), Linux, release build, median of 5 runs.\n\n\u003Ctable>\n\u003Ctr>\u003Ctd>End-to-end (load + infer)\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>283 ms\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Warm inference only\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>~80 ms\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>INT4 matvec 512×512 (AVX2)\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>83 µs · 3.2 Gelem\u002Fs\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>INT4 matvec 2048×512 (AVX2)\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>311 µs · 3.1 Gelem\u002Fs\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nApple Silicon NEON path is implemented but unbenchmarked — M-series numbers welcome via PR.\n\n**Footprint, stripped release:**\n\n\u003Ctable>\n\u003Ctr>\u003Ctd>WASM module\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>258 
KB\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>CLI binary\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>533 KB\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>C shared library\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>557 KB\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Weights (INT4 SafeTensors)\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>22 MB\u003C\u002Fcode>\u003C\u002Fb>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003Ctr>\u003Ctd>Runtime dependencies\u003C\u002Ftd>\u003Ctd align=\"right\">\u003Cb>\u003Ccode>1\u003C\u002Fcode>\u003C\u002Fb> \u003Csub>(libm; WASM adds wasm-bindgen)\u003C\u002Fsub>\u003C\u002Ftd>\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nFull methodology and raw numbers: [BENCHMARKS.md](BENCHMARKS.md).\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"use-cases\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-What_it's_good_for-CE422B?style=flat-square\" height=\"22\" alt=\"What it's good for\"\u002F>\n\u003C\u002Fh2>\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd valign=\"top\" width=\"50%\">\n\n**✓ Browser-side intent routing**\nDecide which API to call before making the network request. Sub-second, zero servers.\n\n**✓ Edge function dispatch**\nTool calling inside Cloudflare Workers, Vercel Edge, Deno Deploy — anywhere with a WASM runtime.\n\n\u003C\u002Ftd>\n\u003Ctd valign=\"top\" width=\"50%\">\n\n**✓ On-device privacy**\nUser queries never leave the browser tab. Useful for healthcare, legal, and any context where sending text to OpenAI is a non-starter.\n\n**✓ Embedded agents**\n`no_std` core means the kernels run on microcontrollers with enough RAM for the weights.\n\n\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\nWhat it's *not* good for: open-ended chat, long-context reasoning, anything where you'd reach for a >1B-parameter model. 
Needle is a router, not a generalist.\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"acknowledgements\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Acknowledgements-7D8590?style=flat-square\" height=\"22\" alt=\"Acknowledgements\"\u002F>\n\u003C\u002Fh2>\n\nNeedle is designed and trained by [Henry Ndubuaku](https:\u002F\u002Fgithub.com\u002Fhndubuaku) and the [Cactus Compute](https:\u002F\u002Fgithub.com\u002Fcactus-compute) team. The model architecture, training code, dataset, and weights are entirely their work, released under MIT. `needle-rs` is an independent Rust runtime — no upstream code is copied, only the published architecture is implemented.\n\n**If you find this useful, please star the [upstream Needle repo](https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fneedle) as well.**\n\n\u003Cbr\u002F>\n\n\u003C!-- ────────────────────────────────────────────────── -->\n\u003Ch2 id=\"citation\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Citation-7D8590?style=flat-square\" height=\"22\" alt=\"Citation\"\u002F>\n\u003C\u002Fh2>\n\n```bibtex\n@software{needle2026,\n  author  = {Ndubuaku, Henry and {Cactus Compute}},\n  title   = {Needle: A 26M-Parameter Tool-Calling Transformer},\n  year    = {2026},\n  url     = {https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fneedle},\n  license = {MIT}\n}\n\n@software{needlers2026,\n  author  = {Ibrahim, Abdalrahman},\n  title   = {needle-rs: Pure-Rust WASM Runtime for Needle},\n  year    = {2026},\n  url     = {https:\u002F\u002Fgithub.com\u002Fgeekgineer\u002Fneedle-rs},\n  license = {MIT}\n}\n```\n\n\u003Cbr\u002F>\n\n\u003Cdiv align=\"center\">\n  \u003Csub>MIT — see \u003Ca href=\"LICENSE\">LICENSE\u003C\u002Fa>. 
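Model and weights by \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcactus-compute\">Cactus Compute\u003C\u002Fa>, also MIT.\u003C\u002Fsub>\n\u003C\u002Fdiv>","needle-rs is a Rust and WebAssembly runtime for running Needle, a 26M-parameter tool-calling transformer, in browsers, Cloudflare Workers, and Node.js. Its core function is mapping a query and a tool list to a JSON function call, and it can be deployed in many environments, including edge workers, command-line tools, and Python, with no backend or API key. Thanks to INT4 quantization, the whole stack (runtime plus weights) takes about 23 MB, which suits applications with strict privacy requirements or a need to cut network latency and cost, such as AI on embedded devices or front-end services that need fast responses.","2026-06-11 04:06:48","CREATED_QUERY"]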