[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72607":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},72607,"cutile-python","NVIDIA\u002Fcutile-python","NVIDIA","cuTile is a programming model for writing parallel kernels for NVIDIA GPUs","",null,"Python",2068,140,19,12,0,3,8,24,9,28.45,"Other",false,"main",true,[27,28,29,30,31,32,33],"cutile","gpu","kernel","parallel-kernels","python","tile","tile-based-programming","2026-06-12 02:03:05","\u003C!--- SPDX-FileCopyrightText: Copyright (c) \u003C2025> NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->\n\u003C!--- SPDX-License-Identifier: Apache-2.0 -->\n\ncuTile Python\n=============\n\ncuTile Python is a programming language for NVIDIA GPUs. The official documentation can be found\non [docs.nvidia.com](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcutile-python),\nor built from source located in the [docs](docs\u002F) folder.\n\n\nExample\n-------\n```python\n# This examples uses CuPy which can be installed via `pip install cupy-cuda13x`\n# Make sure cuda toolkit 13.1+ is installed: https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads\n\nimport cuda.tile as ct\nimport cupy\nimport numpy as np\n\nTILE_SIZE = 16\n\n# cuTile kernel for adding two dense vectors. 
It runs in parallel on the GPU.\n@ct.kernel\ndef vector_add_kernel(a, b, result):\n    block_id = ct.bid(0)\n    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))\n    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))\n    result_tile = a_tile + b_tile\n    ct.store(result, index=(block_id,), tile=result_tile)\n\n# Generate input arrays\nrng = cupy.random.default_rng()\na = rng.random(128)\nb = rng.random(128)\nexpected = cupy.asnumpy(a) + cupy.asnumpy(b)\n\n# Allocate an output array and launch the kernel\nresult = cupy.zeros_like(a)\ngrid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)\nct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))\n\n# Verify the results\nresult_np = cupy.asnumpy(result)\nnp.testing.assert_array_almost_equal(result_np, expected)\n```\n\nMore examples can be found at [Samples](samples\u002F) and [TileGym](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTileGym).\n\nSystem Requirements\n-------------------\ncuTile Python generates kernels based on [Tile IR](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Ftile-ir\u002F)\nwhich requires NVIDIA Driver r580 or later to run.\nFurthermore, the `tileiras` compiler (version 13.2) only supports Blackwell GPU and Ampere\u002FAda\nGPU. 
Hopper GPU will be supported in the coming versions.\nCheck out the [prerequisites](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcutile-python\u002Fquickstart.html#prerequisites)\nfor the full list of requirements.\n\n\nInstalling from PyPI\n--------------------\ncuTile Python is published on [PyPI](https:\u002F\u002Fpypi.org\u002F) under the\n[cuda-tile](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcuda-tile\u002F) package name and can be installed with `pip`:\n```\npip install cuda-tile[tileiras]\n```\nThe optional `tileiras` dependency installs the `tileiras` compiler directly into your Python\nenvironment.\n\n\nIf you do not want to have `tileiras` inside the Python environment, run\n```\npip install cuda-tile\n```\nand install [CUDA Toolkit 13.1+](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads) separately.\n\nOn a Debian-based system, use `apt-get install cuda-tileiras-13.2\ncuda-compiler-13.2` instead of `apt-get install cuda-toolkit-13.2` if you wish\nto avoid installing the full CUDA Toolkit.\n\n\nBuilding from Source\n--------------------\ncuTile is written mostly in Python, but includes a C++ extension which needs to be built.\nYou will need:\n- A C++17-capable compiler, such as GNU C++ or MSVC;\n- CMake 3.18+;\n- GNU Make on Linux or msbuild on Windows;\n- Python 3.10+ with development headers (`venv` module is recommended but optional);\n- [CUDA Toolkit 13.1+](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)\n\nOn an Ubuntu system, the first four dependencies can be installed with APT:\n```\nsudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv\n```\n\nThe CMakeLists.txt script will also automatically download\nthe [DLPack](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fdlpack) dependency from GitHub.\nIf you wish to disable this behavior and provide your own copy of DLPack,\nset the `CUDA_TILE_CMAKE_DLPACK_PATH` environment variable to a local path\nto the DLPack source tree.\n\nUnless you are 
already using a Python virtual environment, it is recommended to create one\nin order to avoid installing cuTile globally:\n\n```\npython3 -m venv env\nsource env\u002Fbin\u002Factivate\n```\n\nOnce the build dependencies are in place, the simplest way to build cuTile is to install it\nin editable mode by running the following command in the source root directory:\n\n```\npip install -e .\n```\n\nThis will create the `build` directory and invoke the CMake-based build process.\nIn editable mode, the compiled extension module will be placed in the build directory,\nand then a symbolic link to it will be created in the source directory.\nThis makes sure that the `pip install -e .` command above is needed only once, and recompiling\nthe extension after making changes to the C++ code can be done with `make -C build`\nwhich is much faster. This logic is defined in [setup.py](.\u002Fsetup.py).\n\nExperimental Features (Optional)\n--------------------------------\ncuTile now provides an experimental package containing APIs that are still under active development.\nThese are **not** part of the stable `cuda.tile` API and may change.\n\nTo enable the experimental features when working from a source checkout, install the experimental\npackage from the repository root:\n```\npip install .\u002Fexperimental\u002Ftile_experimental\n```\n\nYou can also install it directly from a GitHub repository subdirectory:\n```\npip install \\\n  \"git+https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fcutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental\u002Ftile_experimental\"\n```\n\nFor example, this will make the experimental namespace available for autotuner:\n```\nfrom cuda.tile_experimental import autotune_launch, clear_autotune_cache\n```\n\nRunning Tests\n-------------\ncuTile uses the [pytest](https:\u002F\u002Fpytest.org) framework for testing.\nTests have extra dependencies, such as PyTorch, which can be installed with\n```\npip install -r 
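test\u002Frequirements.txt\n```\n\nThe tests are located in the [test\u002F](test\u002F) directory. To run a specific test file,\nfor example `test_copy.py`, use the following command:\n```\npytest test\u002Ftest_copy.py\n```\n\nCopyright and License Information\n---------------------------------\nCopyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n\ncuTile-Python is licensed under the Apache 2.0 license. See the [LICENSES](LICENSES\u002F) folder for the full license text.\n","cuTile Python is a programming model for writing parallel kernels for NVIDIA GPUs. It lets developers define parallel computations in Python code and uses Tile IR to generate efficient GPU kernels automatically, which simplifies the GPU programming workflow. Its core features include tile-based data loading and storing, automated memory management, and optimized parallel execution strategies. It is particularly suited to applications that need to process large datasets efficiently, such as scientific computing, deep learning, and image processing. Using cuTile Python can significantly lower the barrier of dealing with low-level GPU programming details while retaining the advantages of high-performance computing.",2,"2026-06-11 03:42:46","high_star"]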