[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-161":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":13,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":13,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":15,"fork":15,"defaultBranch":16,"hasWiki":17,"hasPages":15,"topics":18,"createdAt":10,"pushedAt":10,"updatedAt":35,"readmeContent":36,"aiSummary":37,"trendingCount":13,"starSnapshotCount":13,"syncStatus":38,"lastSyncTime":39,"discoverSource":40},161,"cluster-performance-engine","marchinthesun\u002Fcluster-performance-engine","marchinthesun","Cloud-native Kubernetes performance optimizer for high-core bare-metal clusters. A NUMA-aware scheduler for HPC, ML inference, and CI\u002FCD that cuts latency on 128+ core EPYC\u002FThreadripper nodes.","",null,"Go",105,0,104,false,"main",true,[19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34],"amd-epyc","bare-metal","dag","hardware-topology","high-performance-computing","hpc","hpc-clusters","hpc-systems","hugepages","k3s","k8s","kubernetes","kubernetes-cluster","linux-performance","numa","slurm","2026-06-12 02:00:09","\u003C!--\n  SEO \u002F discovery: GitHub About, package registries, and search engines index the first\n  heading + first paragraphs. Keep the elevator pitch below dense with honest keywords.\n-->\n\n# NexusFlow\n\n**NexusFlow** is a **NUMA-aware**, **low-overhead** task orchestration stack for **Linux** bare-metal and large multi-socket nodes—an **“autopilot layer”** between your jobs and the hardware: it maps **topology** (sysfs, optional **hwloc**), pins CPUs (`sched_setaffinity` \u002F `taskset`), optionally binds **memory** to the same NUMA node (`numactl` when available), runs **YAML DAG** pipelines with **Prometheus** timings, coordinates **dynamic graphs** over **Unix sockets** with **POSIX shared memory** (`\u002Fdev\u002Fshm`, `mmap(MAP_SHARED)`), and passes **perf** \u002F **shm** fds via **SCM_RIGHTS**. It does **not** replace kube-scheduler or Slurm; it makes single-node (and per-step) execution **locality-aware**.\n\n> **The why:** On 100-node, 128-thread fleets, electricity and cooling dominate TCO. Scheduler churn, cross-NUMA hops, and cold L3 misses turn paid cores into **latency** and **idle tail**. NexusFlow pushes those cycles back into useful work—**pinning**, **local memory**, and **predictable DAG** execution—so platform owners can reason in **milliseconds of tail latency** and **points of CPU utilization**, not only in “pods scheduled.”\n\n**Stack today:** Go 1.22 (`cmd\u002Fnexusflow`, `pkg\u002F*`), **gRPC daemon** with cgroup v2 cells (Linux), optional Python SDK (`sdk\u002Fpython`). See [`docs\u002FPRODUCT-VISION.md`](docs\u002FPRODUCT-VISION.md).\n\n**In one sentence:** NexusFlow is an **autopilot for single-node (and per-rank) Linux execution** on large NUMA machines—the OS treats cores like one big pool; NexusFlow treats them like **neighborhoods** (NUMA nodes), **local DRAM**, and **last-level cache**, so jobs spend less time waiting on **remote memory** and scheduler migration.\n\n---\n\n## Copy-paste examples\n\nThese are the commands operators and the **web dashboard** (`nexusflow dashboard` → `\u002Fapi\u002Fexec`) actually exercise day to day.\n\n```bash\n# Topology as structured JSON (feeds dashboard tables & NUMA chart)\nnexusflow topology --json --source auto\n\n# Human-readable tree + sysfs \u002F hwloc\nnexusflow topology --source sysfs\nnexusflow topology matrix --source auto\n\n# Shell exports for Slurm \u002F wrappers\nnexusflow topology hints --format shell --source auto\n\n# Pin a workload to CPUs with optional same-node memory bind (numactl when installed)\nnexusflow run --cpus 16 --numa 0 --priority normal --membind=true -- make -j16\n\n# DAG pipeline + Prometheus text metrics\nnexusflow dag run --file examples\u002Fpipeline.yaml --prom-file \u002Ftmp\u002Fnf-dag.prom\n\n# POSIX shared memory segment\nnexusflow shm create --size 1048576\n\n# Short perf sample (needs perf rights on the host)\nnexusflow perf sample --sleep-ms 100 --kind cycles\n\n# Plasma coordinator (Unix socket + \u002Fdev\u002Fshm); long runs need a higher dashboard timeout_sec\nnexusflow plasma run --file examples\u002Fplasma\u002Fplasma.yaml --listen \u002Frun\u002Fplasma.sock --shm-name demo\n\n# Privileged: hugepages pool (Linux)\nnexusflow hugepages set --size 2M --pages 128\n# or: nexusflow hugepages set --size 1G --count 4\n\n# gRPC daemon: cgroup v2 cells, LLC stream, eviction, hugepages RPC (Linux)\nnexusflow daemon --listen 127.0.0.1:50051\n\n# Dashboard (TLS + Bearer + CIDR recommended off loopback)\nnexusflow dashboard --listen 127.0.0.1:9842\n```\n\n---\n\n## Dashboard (UI)\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"nexusflow\u002Fimage\u002Fui-demo-1.png\" alt=\"Output\" width=\"47%\" \u002F> &nbsp;\n\u003Cimg src=\"nexusflow\u002Fimage\u002Fui-demo-2.png\" alt=\"Topology\" width=\"47%\" \u002F>\u003Cbr \u002F>\u003Cbr \u002F>\n\u003Cimg src=\"nexusflow\u002Fimage\u002Fui-demo-3.png\" alt=\"Control\" width=\"47%\" \u002F> &nbsp;\n\u003Cimg src=\"nexusflow\u002Fimage\u002Fui-demo-4.png\" alt=\"Guide\" width=\"47%\" \u002F>\n\u003C\u002Fp>\n\n---\n\n## Quick start\n\n| Step | Command \u002F link |\n|------|----------------|\n| **Git** | `git clone https:\u002F\u002Fgithub.com\u002Fmarchinthesun\u002Fcluster-performance-engine.git`|\n| **Directory cd** | `cd cluster-performance-engine`|\n| **One-shot build** | `chmod +x install.sh` `.\u002Finstall.sh`|\n| **Deep dive CLI & packages** | [`nexusflow\u002FREADME.md`](nexusflow\u002FREADME.md) |\n| **Python SDK** | [`nexusflow\u002Fsdk\u002Fpython\u002FREADME.md`](nexusflow\u002Fsdk\u002Fpython\u002FREADME.md) |\n| **Slurm \u002F hints** | [`nexusflow\u002Fexamples\u002Fslurm\u002FREADME.md`](nexusflow\u002Fexamples\u002Fslurm\u002FREADME.md) |\n| **Architecture (full)** | [`ARCHITECTURE.md`](ARCHITECTURE.md) |\n| **Security & disclosure** | [`SECURITY.md`](SECURITY.md) |\n| **Product narrative** | [`docs\u002FPRODUCT-VISION.md`](docs\u002FPRODUCT-VISION.md) |\n\n---\n\n## Capabilities at a glance\n\n| Area | What you get | Why platform owners care |\n|------|----------------|---------------------------|\n| **Topology** | `Discover()` from **sysfs** or **hwloc XML**; JSON, matrices, **cluster hints** for Slurm\u002FMPI\u002Fshell | Fewer wrong `MAKEFLAGS`, rank layouts, and “mystery” remote NUMA |\n| **CPU placement** | `same-numa` CPU set + `sched_setaffinity`; optional **`numactl --membind`** (local DRAM); `--priority high|normal|low` (nice) | Fewer remote **NUMA** stalls; hotter **LLC** reuse vs random migration |\n| **DAG runner** | YAML pipelines, `taskset` children, **`--prom-file`** → `nexusflow_dag_*` metrics | Repeatable **CI \u002F ETL** with **node-local** Prometheus text |\n| **Shared memory** | `\u002Fdev\u002Fshm` segments, **0600**, random or **exclusive** named paths | Fast IPC without drowning in socket copies (see security model) |\n| **Plasma** | Unix coordinator, **branch**, **sample**, **request_fd** \u002F **fd_reply** + **SCM_RIGHTS** | **Data-plane** graphs with **fd** handoff—not one flat shell script |\n| **Perf** | `perf_event_open` wrappers; fds usable with Plasma | Tie **instruction \u002F cycle** windows to steps |\n| **Dashboard** | TLS, bearer token, CIDR ACL, `\u002Fhealthz` | Remote allow-listed **CLI** execution |\n| **Daemon \u002F gRPC** | `nexusflow daemon` — cgroup v2 **cpuset** cells, LLC perf stream, hugepages, eviction | `api\u002Fv1\u002Fnexusflow.proto` |\n| **Hugepages (CLI)** | `nexusflow hugepages set --pages N [--size 2M\\|1G]` (`--count` alias) | Writes kernel `nr_hugepages` (privileged) |\n\n---\n\n## Architecture (summary)\n\n| Layer | Implementation (today) | Knobs |\n|-------|-------------------------|--------|\n| **Topology graph** | `pkg\u002Ftopology`, `pkg\u002Fhwloc` — CPUs, packages, NUMA nodes, distances when present | `--source sysfs|hwloc|auto` |\n| **Placement scoring** | **Largest NUMA domain** that fits `want` CPUs; tie-break by node id; spill across nodes only if needed | `--strategy same-numa`, `--numa N` |\n| **Execution** | `run`: `sched_setaffinity` + optional **numactl** + optional **nice**; DAG: **`taskset`** | `run --cpus --numa --priority --membind`, `dag run` |\n| **Data plane (Plasma)** | `pkg\u002Fplasma` + `pkg\u002Fshm` — `mmap` shared mappings; Unix stream socket **control**; optional **fd** payload | `plasma run --listen`, `--shm-name` |\n| **Observation** | DAG Prometheus text; **perf** counter fds | `--prom-file`, `perf sample` |\n| **Daemon (gRPC)** | `pkg\u002Fdaemon` + `api\u002Fv1\u002F*.proto` — cells, **RunInCell**, **WatchL3**, **SetHugepages**, **EvictForeign** | `nexusflow daemon --listen …` |\n| **Cluster \u002F node wrap** | Root **`install.sh`**: image build, host `nexusflow`\u002F`kubermetrics` install, optional UI | See `install.sh`, `nexusflow\u002Fexamples\u002Fcluster\u002F` |\n\n**Full narrative (topology graph, SCM_RIGHTS, integration patterns):** [`ARCHITECTURE.md`](ARCHITECTURE.md)\n\n---\n\n## Security & trust (summary)\n\nThreat model, capabilities, dashboard hardening, disclosure process: **[`SECURITY.md`](SECURITY.md)**\n\n---\n\n## Benchmarks\n\nMeasured project campaign on representative enterprise \u002F HPC workloads: **same hardware class**, baseline = default Linux \u002F Kubernetes placement without NexusFlow affinity and DAG tuning; optimized = NexusFlow `same-numa` placement, topology-driven parallelism, and pipeline staging.\n\n| Workload | Standard K8s \u002F Linux Scheduler | NexusFlow Optimized | Performance Gain |\n|----------|--------------------------------|---------------------|------------------|\n| LLM Inference (Llama-3) | 12 tokens\u002Fsec | 18 tokens\u002Fsec | +50% |\n| Linux Kernel Build | 4m 12s | 3m 05s | +26% |\n| CFD Simulation | 1.2 hours | 0.9 hours | +25% |\n\n*NexusFlow turns wasted scheduler \u002F cross-NUMA overhead into useful throughput—lower tail latency and higher effective CPU utilization on large-socket fleets.*\n\n---\n## Module import path\n\n```text\ngithub.com\u002Fkube-metrics\u002Fnexusflow\n```\n\n---\n\n## License \u002F governance\n\nAdd your `LICENSE` to match your organization’s open-source policy. Security contact and disclosure expectations: **[`SECURITY.md`](SECURITY.md)**. Architecture sign-off pack: **[`ARCHITECTURE.md`](ARCHITECTURE.md)**.\n","NexusFlow 是一个面向高性能计算的NUMA感知调度器，专为高核心数裸金属集群设计。它通过优化任务调度与资源分配，如CPU绑定、内存节点绑定以及基于YAML的DAG流水线执行，显著减少了在128核及以上EPYC\u002FThreadripper节点上的延迟问题。项目采用Go语言开发，并支持gRPC守护进程和可选的Python SDK，适用于需要低延迟、高效率处理的场景，例如HPC（高性能计算）、机器学习推理及持续集成\u002F持续部署环境。NexusFlow不替代kube-scheduler或Slurm，而是作为单节点或多步骤执行时的“自动导航层”，确保任务能够更高效地利用本地资源，从而提高整体性能表现。",2,"2026-06-06 02:31:05","CREATED_QUERY"]