[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79677":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":26,"discoverSource":27},79677,"flashlib","FlashML-org\u002Fflashlib","FlashML-org","Fast and memory-efficient classical machine learning operators",null,"Python",506,36,2,0,67,282,18,8.7,"Apache License 2.0",false,"main",[],"2026-06-12 02:03:54","# FlashLib\n\nA GPU library for classical machine-learning operators — `kmeans`, `knn`,\n`pca`, `svd`, `dbscan`, `hdbscan`, `umap`, `t-sne`, regression, GEMM, and\nmore — built on Triton and CuteDSL.\n\nSee [the blog post](https:\u002F\u002Fflashml-org.github.io\u002F) for motivation, design,\nand benchmarks.\n\n## Installation\n\nInstall with `pip`:\n\n```bash\npip install flashlib\n```\n\nFrom source:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FFlashML-org\u002Fflashlib.git\ncd flashlib\npip install -e .\n```\n\n## Usage\n\n```python\nimport torch\nfrom flashlib import flash_kmeans\n\nx = torch.randn(1_000_000, 128, device=\"cuda\", dtype=torch.float32)\nlabels, centroids, n_iter = flash_kmeans(x, n_clusters=1024, max_iters=20)\n```\n\nEvery primitive is exposed as a top-level `flash_*` function and as a\nsklearn-style class (`KMeans`, `PCA`, `HDBSCAN`, …).\n\n### Informative API\n\nThe `flashlib.info` submodule predicts runtime, FLOPs, and HBM bytes for any\nprimitive in ~5&nbsp;µs on pure CPU — useful for budgeting a 
pipeline before\nlaunching it, and small enough for an LLM agent to call in a GPU-less\nenvironment. It does not import torch, triton, or cutlass.\n\n```python\nimport flashlib.info as info\n\nest = info.estimate(\"kmeans\",\n                    shape=(100_000, 64),\n                    params={\"K\": 256, \"max_iters\": 20},\n                    device=\"H200\")\nprint(est.summary_line())\n```\n\nSee the blog post for the full API, the tolerance-driven dispatch, and\nper-primitive benchmarks.\n\n## Coverage\n\nThe current release ships **15 high-level primitives** across the following families:\n\n| family         | primitives                                                                       |\n| -------------- | -------------------------------------------------------------------------------- |\n| Clustering     | `flash_kmeans`, `flash_dbscan`, `flash_hdbscan`, `flash_spectral_clustering`     |\n| Nearest nbrs   | `flash_knn`                                                                      |\n| Decomposition  | `flash_pca`, `flash_truncated_svd`                                               |\n| Manifold       | `flash_umap`, `flash_tsne`                                                       |\n| Regression     | `flash_linear_regression`, `flash_ridge`, `flash_logistic_regression`            |\n| Classification | `flash_multinomial_nb`, `flash_random_forest`                                    |\n| Preprocessing  | `flash_standard_scaler`                                                          |\n\nPlus low-level linear-algebra primitives (`cov_gemm`, `gram_gemm`, `ab_gemm`,\n`eigh`, `polar`, `msign`, `cholqr2`, `split_basis`) and a Pareto-frontier set\nof multi-precision GEMM variants (`gemm`, `gemm_tf32`, `gemm_3xtf32`,\n`gemm_bf16`, `gemm_fp16`, `gemm_fp16_x9`, `gemm_fp16_x3_kahan`,\n`gemm_ozaki2_int8`, …).\n\n## Citation\n\n```bibtex\n@misc{yang2026flashlib,\n  title  = {FlashLib: Bringing Flash Magic to Classical Machine Learning Operators},\n  
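author = {Yang, Shuo and Xi, Haocheng and Zhao, Yilong and Mang, Qiuyang and\n            Wang, Zhe and Sun, Shanlin and Keutzer, Kurt and Gonzalez, Joseph E. and\n            Han, Song and Xu, Chenfeng and Stoica, Ion},\n  year   = {2026},\n  url    = {https:\u002F\u002Fflashml-org.github.io\u002F},\n}\n```\n\n## License\n\n[Apache License 2.0](LICENSE).\n","FlashLib is a fast, memory-efficient library of classical machine-learning operators that supports algorithms such as KMeans, DBSCAN, and PCA. Built on Triton and CuteDSL, it uses GPU acceleration to implement 15 high-level primitives spanning clustering, nearest neighbors, decomposition, manifold learning, regression, and classification, and it also provides low-level linear-algebra primitives such as multi-precision GEMM variants. A distinguishing feature is the `flashlib.info` module, which quickly predicts runtime, FLOPs, and HBM bytes on pure CPU, which helps with budgeting in GPU-less environments. FlashLib suits classical machine-learning workloads that need to process large datasets efficiently, particularly resource-constrained or performance-sensitive applications.","2026-06-11 03:58:15","CREATED_QUERY"]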