[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73440":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":15,"starSnapshotCount":15,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},73440,"zml","zml\u002Fzml","Any model. Any hardware. Zero compromise. Built with @ziglang \u002F @openxla \u002F MLIR \u002F @bazelbuild","https:\u002F\u002Fdocs.zml.ai",null,"Zig",3623,147,30,20,0,34,53,122,102,28.51,"Apache License 2.0",false,"master",[25,26,27,28,29,30],"ai","bazel","hpc","inference","xla","zig","2026-06-12 02:03:13","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fzml\u002Fzml.github.io\u002Frefs\u002Fheads\u002Fmain\u002Fdocs-assets\u002Fzml-banner.png\" style=\"width:100%; height:120px;\">\n  \u003Ca href=\"https:\u002F\u002Fzml.ai\">Website\u003C\u002Fa>\n  | \u003Ca href=\"#getting-started\">Getting Started\u003C\u002Fa>\n  | \u003Ca href=\".\u002Fdocs\u002FREADME.md\">Documentation\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002F6y72SN2E7H\">Discord\u003C\u002Fa>\n  | \u003Ca href=\".\u002FCONTRIBUTING.md\">Contributing\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n[ZML]: https:\u002F\u002Fzml.ai\u002F\n[Getting Started]: #getting-started\n[Documentation]: .\u002Fdocs\u002FREADME.md\n[Contributing]: .\u002FCONTRIBUTING.md\n[Discord]: https:\u002F\u002Fdiscord.gg\u002F6y72SN2E7H\n\n# About\n\nZML is a production inference stack, purpose-built to decouple AI workloads 
from proprietary hardware.\n\nAny model, many hardware targets, one codebase, peak performance.\n\nCompiled directly to NVIDIA, AMD, TPU, Trainium for peak hardware performance on any accelerator. No rewriting.\n\nIt is built using the\n[Zig](https:\u002F\u002Fziglang.org) language, [MLIR](https:\u002F\u002Fmlir.llvm.org), and [Bazel](https:\u002F\u002Fbazel.build).\n\n# Getting Started\n\n## Prerequisites\n\nWe use `bazel` to build ZML and its dependencies. The only prerequisite is\n`bazel`, which we recommend installing through `bazelisk`.\n\n### macOS\n\n```bash\nbrew install bazelisk\n```\n\n### Linux\n\n```bash\ncurl -L -o \u002Fusr\u002Flocal\u002Fbin\u002Fbazel 'https:\u002F\u002Fgithub.com\u002Fbazelbuild\u002Fbazelisk\u002Freleases\u002Fdownload\u002Fv1.28.0\u002Fbazelisk-linux-amd64'\nchmod +x \u002Fusr\u002Flocal\u002Fbin\u002Fbazel\n```\n\n## 30-Second Smoke Test\n\nRun the MNIST example:\n\n```bash\nbazel run \u002F\u002Fexamples\u002Fmnist\n```\n\nThis downloads a small pretrained MNIST model, compiles it, loads the weights, and\nclassifies a random handwritten digit.\n\n## LLM Quickstart\n\nThe main LLM example is [`\u002F\u002Fexamples\u002Fllm`](.\u002Fexamples\u002Fllm). 
It currently supports:\n\n- Llama 3.1 \u002F 3.2\n- Qwen 3.5\n- LFM 2.5\n\nAuthenticate with Hugging Face if you want to load gated repos such as Meta\nLlama:\n\n```bash\nbazel run \u002F\u002Ftools\u002Fhf -- auth login\n```\n\nAlternatively, set the `HF_TOKEN` environment variable.\n\nThen run a prompt directly:\n\n```bash\nbazel run \u002F\u002Fexamples\u002Fllm -- --model=hf:\u002F\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct --prompt=\"What is the capital of France?\"\n```\n\nOpen the interactive chat loop by omitting `--prompt`:\n\n```bash\nbazel run \u002F\u002Fexamples\u002Fllm -- --model=hf:\u002F\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct\n```\n\nYou can also load from:\n\n- a local directory: `--model=\u002Fvar\u002Fmodels\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct`\n- S3: `--model=s3:\u002F\u002Fbucket\u002Fpath\u002Fto\u002Fmodel`\n\n## Running Models on GPU \u002F TPU\n\nAppend one or more platform flags when compiling or running:\n\n- NVIDIA CUDA: `--@zml\u002F\u002Fplatforms:cuda=true`\n- AMD ROCm: `--@zml\u002F\u002Fplatforms:rocm=true`\n- Google TPU: `--@zml\u002F\u002Fplatforms:tpu=true`\n- AWS Trainium \u002F Inferentia 2: `--@zml\u002F\u002Fplatforms:neuron=true`\n- Disable CPU compilation: `--@zml\u002F\u002Fplatforms:cpu=false`\n\nExample on CUDA:\n\n```bash\nbazel run \u002F\u002Fexamples\u002Fllm --@zml\u002F\u002Fplatforms:cuda=true -- --model=hf:\u002F\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct --prompt=\"Write a haiku about Zig\"\n```\n\nExample on ROCm:\n\n```bash\nbazel run \u002F\u002Fexamples\u002Fllm --@zml\u002F\u002Fplatforms:rocm=true -- --model=hf:\u002F\u002Fmeta-llama\u002FLlama-3.2-1B-Instruct --prompt=\"Write a haiku about Zig\"\n```\n\n## Run Tests\n\n```bash\nbazel test \u002F\u002Fzml:test\n```\n\n# Examples\n\n- [`examples\u002Fllm`](.\u002Fexamples\u002Fllm): unified LLM CLI for Llama, Qwen, and LFM\n- [`examples\u002Fmnist`](.\u002Fexamples\u002Fmnist): smallest end-to-end model run\n- 
[`examples\u002Fsharding`](.\u002Fexamples\u002Fsharding): logical mesh, partitioners, shard-local execution, profiler output\n- [`examples\u002Fio`](.\u002Fexamples\u002Fio): inspect and load local, `hf:\u002F\u002F`, `https:\u002F\u002F`, and `s3:\u002F\u002F` repositories through the VFS layer\n- [`examples\u002Fbenchmark`](.\u002Fexamples\u002Fbenchmark): measure loading and execution performance\n\n# A Taste Of ZML\n\n```zig\nconst Mnist = struct {\n    fc1: Layer,\n    fc2: Layer,\n\n    const Layer = struct {\n        weight: zml.Tensor,\n        bias: zml.Tensor,\n\n        pub fn init(store: zml.io.TensorStore.View) Layer {\n            return .{\n                .weight = store.createTensor(\"weight\", .{ .d_out, .d }, null),\n                .bias = store.createTensor(\"bias\", .{.d_out}, null),\n            };\n        }\n\n        pub fn forward(self: Layer, input: zml.Tensor) zml.Tensor {\n            return self.weight.dot(input, .d).add(self.bias).relu().withTags(.{.d});\n        }\n    };\n\n    pub fn init(store: zml.io.TensorStore.View) Mnist {\n        return .{\n            .fc1 = .init(store.withPrefix(\"fc1\")),\n            .fc2 = .init(store.withPrefix(\"fc2\")),\n        };\n    }\n\n    pub fn load(\n        self: *const Mnist,\n        allocator: std.mem.Allocator,\n        io: std.Io,\n        platform: *const zml.Platform,\n        store: *const zml.io.TensorStore,\n        shardings: []const zml.Sharding,\n    ) !zml.Bufferized(Mnist) {\n        return zml.io.load(Mnist, self, allocator, io, platform, store, .{\n            .shardings = shardings,\n            .parallelism = 1,\n            .dma_chunks = 1,\n            .dma_chunk_size = 16 * 1024 * 1024,\n        });\n    }\n\n    pub fn unloadBuffers(self: *zml.Bufferized(Mnist)) void {\n        self.fc1.weight.deinit();\n        self.fc1.bias.deinit();\n        self.fc2.weight.deinit();\n        self.fc2.bias.deinit();\n    }\n\n    \u002F\u002F\u002F just two linear layers + relu 
activation\n    pub fn forward(self: Mnist, input: zml.Tensor) zml.Tensor {\n        var x = input.flatten().convert(.f32).withTags(.{.d});\n        const layers: []const Layer = &.{ self.fc1, self.fc2 };\n        for (layers) |layer| {\n            x = layer.forward(x);\n        }\n        return x.argMax(0).indices.convert(.u8);\n    }\n};\n```\n\nFor a full walkthrough, see:\n\n- [Getting Started](.\u002Fdocs\u002Ftutorials\u002Fgetting_started.md)\n- [Writing your first model](.\u002Fdocs\u002Ftutorials\u002Fwrite_first_model.md)\n- [ZML Concepts](.\u002Fdocs\u002Flearn\u002Fconcepts.md)\n- [Deploying on a server](.\u002Fdocs\u002Fhowtos\u002Fdeploy_on_server.md)\n\n# Where To Go Next\n\n- Run more examples in [`.\u002Fexamples`](.\u002Fexamples)\n- Read the example-specific notes in [`examples\u002Fllm\u002FREADME.md`](.\u002Fexamples\u002Fllm\u002FREADME.md)\n- Learn tagged dimensions in [`working_with_tensors.md`](.\u002Fdocs\u002Ftutorials\u002Fworking_with_tensors.md)\n- Start building a model with [`write_first_model.md`](.\u002Fdocs\u002Ftutorials\u002Fwrite_first_model.md)\n- Explore deployment in [`deploy_on_server.md`](.\u002Fdocs\u002Fhowtos\u002Fdeploy_on_server.md)\n\n# Contributing\n\nSee [here][Contributing].\n\n# License\n\nZML is licensed under the [Apache 2.0 license](.\u002FLICENSE).\n\n# Thanks To Our Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fzml\u002Fzml\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=zml\u002Fzml\" \u002F>\n\u003C\u002Fa>\n","ZML is a production-grade inference stack designed to decouple AI workloads from proprietary hardware. Its core capability is running any model on a range of hardware (such as NVIDIA, AMD, TPU, and Trainium) at peak performance without rewriting code. ZML is built with the Zig language, MLIR, and Bazel, which gives it cross-platform performance and flexibility. It suits scenarios that need to use different hardware accelerators efficiently for AI inference, especially enterprises and research institutions that have demanding performance requirements and want to avoid vendor lock-in.",2,"2026-06-11 03:45:32","high_star"]