[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-6598":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":47,"readmeContent":48,"aiSummary":49,"trendingCount":16,"starSnapshotCount":16,"syncStatus":50,"lastSyncTime":51,"discoverSource":52},6598,"cactus","cactus-compute\u002Fcactus","cactus-compute","Low-latency AI engine for mobile devices & wearables","https:\u002F\u002Fcactuscompute.com",null,"C++",5331,428,42,24,0,33,52,610,99,38.9,"Other",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46],"ai","android","arm","edge","edge-ai","framework","ios","llamacpp","llm","llm-inference","llms","mobile","mobile-inference","on-device-ai","quantiz","rag","smartphone","speech","transformer","whisper","2026-06-12 02:01:27","# Cactus\n\n\u003Cimg src=\"assets\u002Fbanner.jpg\" alt=\"Logo\" style=\"border-radius: 30px; width: 100%;\">\n\n[![Docs][docs-shield]][docs-url]\n[![Website][website-shield]][website-url]\n[![GitHub][github-shield]][github-url]\n[![HuggingFace][hf-shield]][hf-url]\n[![Reddit][reddit-shield]][reddit-url]\n[![Blog][blog-shield]][blog-url]\n\nA low-latency AI engine for mobile devices & wearables. 
Main features:\n\n- **Fast:** fastest inference on ARM CPU\n- **Low RAM:** zero-copy memory mapping ensures 10x lower RAM use than other engines\n- **Multimodal:** one SDK for speech, vision, and language models\n- **Cloud fallback:** automatically route requests to cloud models if needed\n- **Energy-efficient:** NPU-accelerated prefill\n\n```\n┌─────────────────┐\n│  Cactus Engine  │ ←── OpenAI-compatible APIs for all major languages\n└─────────────────┘     Chat, vision, STT, RAG, tool call, cloud handoff\n         │\n┌─────────────────┐\n│  Cactus Graph   │ ←── Zero-copy computation graph (PyTorch for mobile)\n└─────────────────┘     Custom models, optimised for RAM & quantisation\n         │\n┌─────────────────┐\n│ Cactus Kernels  │ ←── ARM SIMD kernels (Apple, Snapdragon, Exynos, etc)\n└─────────────────┘     Custom attention, KV-cache quant, chunked prefill\n```\n\n## Quick Demo (Mac)\n\n- Step 1: `brew install cactus-compute\u002Fcactus\u002Fcactus`\n- Step 2: `cactus transcribe` or `cactus run` \n\n## Cactus Engine\n\n```cpp\n#include \"cactus.h\"\n\ncactus_model_t model = cactus_init(\n    \"path\u002Fto\u002Fweight\u002Ffolder\",\n    \"path to txt or dir of txts for auto-rag\",\n    false\n);\n\nconst char* messages = R\"([\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": \"My name is Henry Ndubuaku\"}\n])\";\n\nconst char* options = R\"({\n    \"max_tokens\": 50,\n    \"stop_sequences\": [\"\u003C|im_end|>\"]\n})\";\n\nchar response[4096];\nint result = cactus_complete(\n    model,            \u002F\u002F model handle\n    messages,         \u002F\u002F JSON chat messages\n    response,         \u002F\u002F response buffer\n    sizeof(response), \u002F\u002F buffer size\n    options,          \u002F\u002F generation options\n    nullptr,          \u002F\u002F tools JSON\n    nullptr,          \u002F\u002F streaming callback\n    nullptr,          \u002F\u002F user data\n    nullptr,       
   \u002F\u002F pcm audio buffer\n    0                 \u002F\u002F pcm buffer size\n);\n```\nExample response from Gemma3-270m\n```json\n{\n    \"success\": true,        \u002F\u002F generation succeeded\n    \"error\": null,          \u002F\u002F error details if failed\n    \"cloud_handoff\": false, \u002F\u002F true if cloud model used\n    \"response\": \"Hi there!\",\n    \"function_calls\": [],   \u002F\u002F parsed tool calls\n    \"confidence\": 0.8193,   \u002F\u002F model confidence\n    \"time_to_first_token_ms\": 45.23,\n    \"total_time_ms\": 163.67,\n    \"prefill_tps\": 1621.89,\n    \"decode_tps\": 168.42,\n    \"ram_usage_mb\": 245.67,\n    \"prefill_tokens\": 28,\n    \"decode_tokens\": 50,\n    \"total_tokens\": 78\n}\n```\n\n## Cactus Graph\n\n```cpp\n#include \"cactus.h\"\n\nCactusGraph graph;\nauto a = graph.input({2, 3}, Precision::FP16);\nauto b = graph.input({3, 4}, Precision::INT8);\n\nauto x1 = graph.matmul(a, b, false);\nauto x2 = graph.transpose(x1);\nauto result = graph.matmul(b, x2, true);\n\nfloat a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};\nfloat b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};\n\ngraph.set_input(a, a_data, Precision::FP16);\ngraph.set_input(b, b_data, Precision::INT8);\n\ngraph.execute();\nvoid* output_data = graph.get_output(result);\n\ngraph.hard_reset(); \n```\n\n## API & SDK References\n\n| Reference | Language | Description |\n|-----------|----------|-------------|\n| [Engine API](docs\u002Fcactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |\n| [Graph API](docs\u002Fcactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |\n| [Python SDK](\u002Fpython\u002F) | Python | Mac, Linux |\n| [Swift SDK](\u002Fapple\u002F) | Swift | iOS, macOS, tvOS, watchOS, Android |\n| [Kotlin SDK](\u002Fandroid\u002F) | Kotlin | Android, iOS (via KMP) |\n| [Flutter 
SDK](\u002Fflutter\u002F) | Dart | iOS, macOS, Android |\n| [Rust SDK](\u002Frust\u002F) | Rust | Mac, Linux |\n| [React Native](https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus-react-native) | JavaScript | iOS, Android |\n\n> **Model weights:** Pre-converted weights for all supported models at [huggingface.co\u002FCactus-Compute](https:\u002F\u002Fhuggingface.co\u002FCactus-Compute).\n\n## Benchmarks (CPU-only, no GPU)\n\n- All weights INT4 quantised\n- LFM: 1k-prefill \u002F 100-decode, values are prefill tps \u002F decode tps\n- LFM-VL: 256px input, values are latency \u002F decode tps\n- Parakeet: 20s audio input, values are latency \u002F decode tps\n- Missing latency = no NPU support yet\n\n| Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | RAM |\n|--------|----------|------------|---------------|-----|\n| Mac M4 Pro | 582\u002F100 | 0.2s\u002F98 | 0.1s\u002F900k+ | 76MB |\n| iPad\u002FMac M3 | 350\u002F60 | 0.3s\u002F69 | 0.3s\u002F800k+ | 70MB |\n| iPhone 17 Pro | 327\u002F48 | 0.3s\u002F48 | 0.3s\u002F300k+ | 108MB |\n| iPhone 13 Mini | 148\u002F34 | 0.3s\u002F35 | 0.7s\u002F90k+ | 1GB |\n| Galaxy S25 Ultra | 255\u002F37 | -\u002F34 | -\u002F250k+ | 1.5GB |\n| Pixel 6a | 70\u002F15 | -\u002F15 | -\u002F17k+ | 1GB |\n| Galaxy A17 5G | 32\u002F10 | -\u002F11 | -\u002F40k+ | 727MB |\n| CMF Phone 2 Pro | - | - | - | - |\n| Raspberry Pi 5 | 69\u002F11 | 13.3s\u002F11 | 4.5s\u002F180k+ | 869MB |\n\n## Supported Transcription Model\n\n- STT: 20s audio input on Macbook Air M3 chip\n- Benchmark dataset: internal evals with production users\n\n| Model | Params | End2End ms | Latency ms | Decode toks\u002Fsec | NPU | RTF | WER |\n|-------|--------|------------|------------|------------|-----|-----|-----|\n| UsefulSensors\u002Fmoonshine-base | 61M | 361.35 | 182 | 262 | yes | 0.0180 | 0.1395 |\n| openai\u002Fwhisper-tiny | 39M | 232.03 | 137.38 | 581 | yes | 0.0116 | 0.1860 |\n| openai\u002Fwhisper-base | 74M | 329.37 | 178.65 | 358 | yes | 0.0164 | 0.1628 
|\n| openai\u002Fwhisper-small | 244M | 856.79 | 332.63 | 108 | yes | 0.0428 | 0.0930 |\n| openai\u002Fwhisper-medium | 769M | 2085.87 | 923.33 | 49 | yes | 0.1041 | 0.0930 |\n| openai\u002Fwhisper-large-v3 | 1.55B | 5994 | 2050 | 15.72 | no | 0.2992 | - |\n| nvidia\u002Fparakeet-ctc-0.6b | 600M | 201.77 | 201.44 | 5214285 | yes | 0.0101 | 0.0930 |\n| nvidia\u002Fparakeet-tdt-0.6b-v3 | 600M | 718.91 | 718.82 | 3583333 | yes | 0.0359 | 0.0465 |\n| nvidia\u002Fparakeet-ctc-1.1b | 1.1B | 279.03 | 278.92 | 4562500 | yes | 0.0139 | 0.1628 |\n| snakers4\u002Fsilero-vad | - | - | - | - | - | - | - |\n| pyannote\u002Fsegmentation-3.0 | - | - | - | - | - | - | - |\n| pyannote\u002Fwespeaker-voxceleb-resnet34-LM | - | - | - | - | - | - | - |\n\n## Supported LLMs\n\n- Gemma weights are often **gated** on HuggingFace and require an access token\n- Run `huggingface-cli login` and enter your HuggingFace token\n\n| Model | Features |\n|-------|----------|\n| google\u002Fgemma-3-270m-it | completion |\n| google\u002Ffunctiongemma-270m-it | tools |\n| google\u002Fgemma-3-1b-it | completion, gated |\n| google\u002Fgemma-4-E2B-it | completion, tools, embed, vision, speech |\n| google\u002Fgemma-3n-E2B-it | completion, tools |\n| google\u002Fgemma-4-E4B-it | completion, tools, embed, vision, speech |\n| google\u002Fgemma-3n-E4B-it | completion, tools |\n| google\u002Fgemma-4-E2B-it | vision, audio, completion, tools, Apple NPU |\n| google\u002Fgemma-4-E4B-it | vision, audio, completion, tools, Apple NPU |\n| Qwen\u002FQwen3-0.6B | completion, tools, embed |\n| Qwen\u002FQwen3-Embedding-0.6B | embed |\n| Qwen\u002FQwen3.5-0.8B | vision, completion, tools, embed |\n| Qwen\u002FQwen3-1.7B | completion, tools, embed |\n| Qwen\u002FQwen3.5-2B | vision, completion, tools, embed |\n| LiquidAI\u002FLFM2.5-350M | completion, tools, embed |\n| LiquidAI\u002FLFM2-700M | completion, tools, embed |\n| LiquidAI\u002FLFM2-8B-A1B | completion, tools, embed 
|\n| LiquidAI\u002FLFM2.5-1.2B-Thinking | completion, tools, embed |\n| LiquidAI\u002FLFM2.5-1.2B-Instruct | completion, tools, embed |\n| LiquidAI\u002FLFM2-2.6B | completion, tools, embed |\n| LiquidAI\u002FLFM2-VL-450M | vision, txt & img embed, Apple NPU |\n| LiquidAI\u002FLFM2.5-VL-450M | vision, txt & img embed, Apple NPU |\n| LiquidAI\u002FLFM2.5-VL-1.6B | vision, txt & img embed, Apple NPU |\n| tencent\u002FYoutu-LLM-2B | completion, tools, embed |\n| nomic-ai\u002Fnomic-embed-text-v2-moe | embed |\n\n## Roadmap\n\n| Date | Status | Milestone |\n|------|--------|-----------|\n| Sep 2025 | Done | Released v1 |\n| Oct 2025 | Done | Chunked prefill, KVCache Quant (2x prefill) |\n| Nov 2025 | Done | Cactus Attention (10 & 1k prefill = same decode) |\n| Dec 2025 | Done | Team grows to +6 Research Engineers |\n| Jan 2026 | Done | Apple NPU\u002FRAM, 5-11x faster iOS\u002FMac |\n| Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |\n| Mar 2026 | Coming | Qualcomm\u002FGoogle NPUs, 5-11x faster Android |\n| Apr 2026 | Coming | Mediatek\u002FExynos NPUs, Cactus@ICLR |\n| May 2026 | Coming | Kernel→C++, Graph\u002FEngine→Rust, Mac GPU & VR |\n| Jun 2026 | Coming | Torch\u002FJAX model transpilers |\n| Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |\n| Aug 2026 | Coming | Orchestration |\n| Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |\n\n## Using this repo\n\n```\n┌──────────────────────────────────────────────────────────────────────────────┐\n│                                                                              │\n│ Step 0: if on Linux (Ubuntu\u002FDebian)                                          │\n│ sudo apt-get install python3 python3-venv python3-pip cmake                  │\n│   build-essential libcurl4-openssl-dev                                       │\n│                                                                              │\n│ Step 1: clone and setup                                              
        │\n│ git clone https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus && cd cactus              │\n│ source .\u002Fsetup                                                               │\n│                                                                              │\n│ Step 2: use the commands                                                     │\n│──────────────────────────────────────────────────────────────────────────────│\n│                                                                              │\n│  cactus auth                         manage Cloud API key                    │\n│    --status                          show key status                         │\n│    --clear                           remove saved key                        │\n│                                                                              │\n│  cactus run \u003Cmodel>                  opens playground (auto downloads)       │\n│    --precision INT4|INT8|FP16        quantization (default: INT4)            │\n│    --token \u003Ctoken>                   HF token (gated models)                 │\n│    --reconvert                       force reconversion from source          │\n│                                                                              │\n│  cactus transcribe [model]           live mic transcription (parakeet-tdt-0.6b-v3) │\n│    --file \u003Caudio.wav>                transcribe file instead of mic          │\n│    --precision INT4|INT8|FP16        quantization (default: INT4)            │\n│    --token \u003Ctoken>                   HF token (gated models)                 │\n│    --reconvert                       force reconversion from source          │\n│                                                                              │\n│  cactus download \u003Cmodel>             downloads model to .\u002Fweights            │\n│    --precision INT4|INT8|FP16        quantization (default: INT4)            │\n│    --token \u003Ctoken>                 
  HuggingFace API token                   │\n│    --reconvert                       force reconversion from source          │\n│                                                                              │\n│  cactus convert \u003Cmodel> [dir]        convert model, supports LoRA merge      │\n│    --precision INT4|INT8|FP16        quantization (default: INT4)            │\n│    --lora \u003Cpath>                     LoRA adapter to merge                   │\n│    --token \u003Ctoken>                   HuggingFace API token                   │\n│                                                                              │\n│  cactus build                        build for ARM → build\u002Flibcactus.a       │\n│    --apple                           Apple (iOS\u002FmacOS)                       │\n│    --android                         Android                                 │\n│    --flutter                         Flutter (all platforms)                 │\n│    --python                          shared lib for Python FFI               │\n│                                                                              │\n│  cactus test                         run unit tests and benchmarks           │\n│    --model \u003Cmodel>                   default: LFM2-VL-450M                   │\n│    --transcribe_model \u003Cmodel>        default: moonshine-base                 │\n│    --benchmark                       use larger models                       │\n│    --precision INT4|INT8|FP16        regenerate weights with precision       │\n│    --reconvert                       force reconversion from source          │\n│    --no-rebuild                      skip building library                   │\n│    --llm \u002F --stt \u002F --performance     run specific test suite                 │\n│    --ios                             run on connected iPhone                 │\n│    --android                         run on connected Android                │\n│                       
                                                       │\n│  cactus clean                        remove all build artifacts              │\n│  cactus --help                       show all commands and flags             │\n│                                                                              │\n└──────────────────────────────────────────────────────────────────────────────┘\n```\n\n## Maintaining Organisations\n\n1. [Cactus Compute, Inc. (YC S25)](https:\u002F\u002Fcactuscompute.com\u002F)\n2. [UCLA's BruinAI](https:\u002F\u002Fbruinai.org\u002F)\n3. [Char (YC S25)](https:\u002F\u002Fchar.com\u002F)\n4. [Yale's AI Society](https:\u002F\u002Fwww.yale-ai.org\u002Fteam)\n5. [National University of Singapore's AI Society](https:\u002F\u002Fwww.nusaisociety.org\u002F)\n6. [UC Irvine's AI@UCI](https:\u002F\u002Faiclub.ics.uci.edu\u002F)\n7. [Imperial College's AI Society](https:\u002F\u002Fwww.imperialcollegeunion.org\u002Fcsp\u002F1391)\n8. [University of Pennsylvania's AI@Penn](https:\u002F\u002Fai-at-penn-main-105.vercel.app\u002F)\n9. [University of Michigan Ann-Arbor MSAIL](https:\u002F\u002Fmsail.github.io\u002F)\n10. 
[University of Colorado Boulder's AI Club](https:\u002F\u002Fwww.cuaiclub.org\u002F)\n\n## Citation \n\nIf you use Cactus in your research, please cite it as follows:\n\n```bibtex\n@software{cactus,\n  title        = {Cactus: AI Inference Engine for Phones & Wearables},\n  author       = {Ndubuaku, Henry and Cactus Team},\n  url          = {https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus},\n  year         = {2025}\n}\n```\n\n**N.B.** Scroll to the top and click the shield links for resources!\n\n[docs-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocs-555?style=for-the-badge&logo=readthedocs&logoColor=white\n[docs-url]: https:\u002F\u002Fcactus-compute.github.io\u002Fcactus\u002F\n\n[website-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-555?style=for-the-badge&logo=safari&logoColor=white\n[website-url]: https:\u002F\u002Fcactuscompute.com\u002F\n\n[github-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-555?style=for-the-badge&logo=github&logoColor=white\n[github-url]: https:\u002F\u002Fgithub.com\u002Fcactus-compute\u002Fcactus\n\n[hf-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-555?style=for-the-badge&logo=huggingface&logoColor=white\n[hf-url]: https:\u002F\u002Fhuggingface.co\u002FCactus-Compute\n\n[reddit-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReddit-555?style=for-the-badge&logo=reddit&logoColor=white\n[reddit-url]: https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fcactuscompute\u002F\n\n[blog-shield]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-555?style=for-the-badge&logo=hashnode&logoColor=white\n[blog-url]: https:\u002F\u002Fcactuscompute.com\u002Fblog\n","Cactus is a low-latency AI engine designed for mobile devices and wearables. Its core features include the fastest inference on ARM CPUs, zero-copy memory mapping that cuts RAM use to one-tenth that of other engines, multimodal support for speech, vision, and language models, and automatic fallback to cloud models when needed. It also uses NPU-accelerated prefill to improve energy efficiency. Cactus is especially suited to latency-sensitive, resource-constrained applications such as speech recognition, image analysis, or natural language processing on smartphones.",2,"2026-06-11 03:07:51","top_language"]