[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10915":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":9,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},10915,"meshoptimizer","zeux\u002Fmeshoptimizer","zeux","Mesh optimization library that makes meshes smaller and faster to render",null,"https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer","C++",7750,638,115,4,0,18,40,97,54,111.12,false,"main",[25,26,27,28,29,30],"gpu","mesh-processing","optimization","compression","simplification","gltf","2026-06-12 04:00:52","# 🐇 meshoptimizer [![Actions Status](https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer\u002Fworkflows\u002Fbuild\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer\u002Factions) [![codecov.io](https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fzeux\u002Fmeshoptimizer\u002Fcoverage.svg?branch=master)](https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fzeux\u002Fmeshoptimizer?branch=master) [![MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)](LICENSE.md) [![GitHub](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Frepo-github-green.svg)](https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer)\n\n## Purpose\n\nWhen a GPU renders triangle meshes, various stages of the GPU pipeline have to process vertex and index data. 
The efficiency of these stages depends on the data you feed to them; this library provides algorithms to help optimize meshes for these stages, as well as algorithms to reduce the mesh complexity and storage overhead.\n\nThe library provides a C and C++ interface for all algorithms; you can use it from C\u002FC++ or from other languages via FFI (such as P\u002FInvoke). If you want to use this library from Rust, you should use the [meshopt crate](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fmeshopt). A JavaScript interface for some algorithms is available through [meshoptimizer.js](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fmeshoptimizer).\n\nTwo companion projects are developed and distributed alongside the library: [gltfpack](.\u002Fgltf\u002FREADME.md), a command-line tool that automatically optimizes glTF files, and [clusterlod.h](.\u002Fdemo\u002Fclusterlod.h), a single-header C\u002FC++ library for continuous level of detail using clustered simplification.\n\n## Installing\n\nmeshoptimizer is hosted on GitHub; you can download the latest release using git:\n\n```\ngit clone -b v1.1 https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer.git\n```\n\nAlternatively, you can [download the .zip archive from GitHub](https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer\u002Farchive\u002Fv1.1.zip).\n\nThe library is also available as a package in several Linux and BSD distributions ([ArchLinux](https:\u002F\u002Faur.archlinux.org\u002Fpackages\u002Fmeshoptimizer\u002F), [Debian](https:\u002F\u002Fpackages.debian.org\u002Flibmeshoptimizer), [FreeBSD](https:\u002F\u002Fwww.freshports.org\u002Fmisc\u002Fmeshoptimizer\u002F), [Nix](https:\u002F\u002Fmynixos.com\u002Fnixpkgs\u002Fpackage\u002Fmeshoptimizer), [Ubuntu](https:\u002F\u002Fpackages.ubuntu.com\u002Flibmeshoptimizer)), as well as a [Vcpkg port](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvcpkg\u002Ftree\u002Fmaster\u002Fports\u002Fmeshoptimizer) (see [installation 
instructions](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fvcpkg\u002Fget_started\u002Fget-started)) and a [Conan package](https:\u002F\u002Fconan.io\u002Fcenter\u002Frecipes\u002Fmeshoptimizer).\n\n[gltfpack](.\u002Fgltf\u002FREADME.md) is available as a pre-built binary on the [Releases page](https:\u002F\u002Fgithub.com\u002Fzeux\u002Fmeshoptimizer\u002Freleases) or via an [npm package](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002Fgltfpack). Native binaries are recommended since they are more efficient and support texture compression.\n\n## Building\n\nmeshoptimizer is distributed as a C\u002FC++ header (`src\u002Fmeshoptimizer.h`) and a set of C++ source files (`src\u002F*.cpp`). To include it in your project, you can use one of two options:\n\n* Use CMake to build the library (either as a standalone project or as part of your project)\n* Add source files to your project's build system\n\nThe source files are organized in such a way that you don't need to change your build-system settings, and you only need to add the source files for the algorithms you use. They should build without warnings or special compilation options on all major compilers. If you prefer amalgamated builds, you can also concatenate the source files into a single `.cpp` file and build that instead.\n\nTo use meshoptimizer functions, simply `#include` the header `meshoptimizer.h`; the library source is C++, but the header is C-compatible.\n\n## Core pipeline\n\nTo maximize rendering efficiency, you should typically feed a mesh through a set of optimizations (the order is important!):\n\n1. Indexing\n2. Vertex cache optimization\n3. (optional) Overdraw optimization\n4. Vertex fetch optimization\n5. Vertex quantization\n6. (optional) Shadow indexing\n\n### Indexing\n\nMost algorithms in this library assume that a mesh has a vertex buffer and an index buffer. 
For algorithms to work well and also for the GPU to render your mesh efficiently, the vertex buffer has to have no redundant vertices; you can generate an index buffer from an unindexed vertex buffer or reindex an existing (potentially redundant) index buffer as follows:\n\n> Note: meshoptimizer generally works with 32-bit (`unsigned int`) indices; however, when using C++ APIs you can use any integer type for index data by using the provided template overloads. By convention, remap tables always use `unsigned int`.\n\nFirst, generate a remap table from your existing vertex (and, optionally, index) data:\n\n```c++\nsize_t index_count = face_count * 3;\nsize_t unindexed_vertex_count = face_count * 3;\nstd::vector\u003Cunsigned int> remap(unindexed_vertex_count); \u002F\u002F temporary remap table\nsize_t vertex_count = meshopt_generateVertexRemap(&remap[0], NULL, index_count,\n    &unindexed_vertices[0], unindexed_vertex_count, sizeof(Vertex));\n```\n\nNote that in this case we only have an unindexed vertex buffer; when the input mesh has an index buffer, it will need to be passed to `meshopt_generateVertexRemap` instead of `NULL`, along with the correct source vertex count. In either case, the remap table is generated based on binary equivalence of the input vertices, so the resulting mesh will render the same way. 
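\n\nTo make the binary-equivalence rule concrete, here is a minimal standalone sketch of how such a remap table could be built. `buildRemapSketch` is a hypothetical helper written for illustration. It is not a meshoptimizer function and not how the library implements this, and it uses a `std::map` keyed by raw vertex bytes for brevity. Each vertex receives the new index of the first earlier vertex with identical bytes, and unique vertices get consecutive new indices.\n\n```c++\n#include \u003Ccstddef>\n#include \u003Cmap>\n#include \u003Cstring>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical sketch: remap[i] receives the new index of vertex i, assigned by\n\u002F\u002F comparing raw vertex bytes; returns the number of unique vertices.\nsize_t buildRemapSketch(std::vector\u003Cunsigned int>& remap, const void* vertices,\n    size_t vertex_count, size_t vertex_size)\n{\n    const unsigned char* bytes = static_cast\u003Cconst unsigned char*>(vertices);\n    std::map\u003Cstd::string, unsigned int> seen; \u002F\u002F vertex bytes -> new index\n    remap.assign(vertex_count, 0);\n\n    for (size_t i = 0; i \u003C vertex_count; ++i)\n    {\n        \u002F\u002F every byte is part of the key, including any padding inside the struct\n        std::string key(reinterpret_cast\u003Cconst char*>(bytes + i * vertex_size), vertex_size);\n        auto it = seen.find(key);\n        if (it == seen.end())\n            it = seen.emplace(key, static_cast\u003Cunsigned int>(seen.size())).first;\n        remap[i] = it->second;\n    }\n\n    return seen.size();\n}\n```\n\nThe returned count plays the same role as `vertex_count` in the snippet above. For large meshes, a hash table would be the more scalable choice than `std::map`.\n\n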
Binary equivalence considers all input bytes, including padding, which should be zero-initialized if the vertex structure has gaps.\n\nAfter generating the remap table, you can allocate space for the target vertex buffer (`vertex_count` elements) and index buffer (`index_count` elements) and generate them:\n\n```c++\nmeshopt_remapIndexBuffer(indices, NULL, index_count, &remap[0]);\nmeshopt_remapVertexBuffer(vertices, &unindexed_vertices[0], unindexed_vertex_count, sizeof(Vertex), &remap[0]);\n```\n\nYou can then further optimize the resulting buffers by calling the other functions on them in place.\n\n`meshopt_generateVertexRemap` uses binary equivalence of vertex data, which is generally a reasonable default; however, in some cases attributes may have floating-point drift that causes extra vertices to be generated. For such cases, it may be necessary to quantize some attributes (most importantly, normals and tangents) before generating the remap, or to use the `meshopt_generateVertexRemapCustom` algorithm, which allows comparing individual attributes with tolerance via a custom comparison function:\n\n```c++\nsize_t vertex_count = meshopt_generateVertexRemapCustom(&remap[0], NULL, index_count,\n    &unindexed_vertices[0].px, unindexed_vertex_count, sizeof(Vertex),\n    [&](unsigned int lhs, unsigned int rhs) -> bool {\n        const Vertex& lv = unindexed_vertices[lhs];\n        const Vertex& rv = unindexed_vertices[rhs];\n\n        return fabsf(lv.tx - rv.tx) \u003C 1e-3f && fabsf(lv.ty - rv.ty) \u003C 1e-3f;\n    });\n```\n\n### Vertex cache optimization\n\nWhen the GPU renders the mesh, it runs the vertex shader for each vertex. Historically, GPUs used a small fixed-size post-transform cache (16-32 vertices) with different replacement policies to store the shader output and avoid redundant shader invocations. 
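\n\nThe classic fixed-size cache model can be illustrated with a short simulation. It replays an index buffer through a FIFO cache and reports the average cache miss ratio (ACMR), i.e. the number of vertex shader invocations per triangle. `simulateFifoAcmr` is a hypothetical helper written for this example, not a library function. Lower values are better; since closed meshes have roughly twice as many triangles as vertices, values close to 0.5 are about the best achievable.\n\n```c++\n#include \u003Calgorithm>\n#include \u003Ccstddef>\n#include \u003Cdeque>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical sketch: counts vertex shader invocations for an index buffer rendered\n\u002F\u002F through a FIFO cache with cache_size entries and returns invocations per triangle.\ndouble simulateFifoAcmr(const std::vector\u003Cunsigned int>& indices, size_t cache_size)\n{\n    std::deque\u003Cunsigned int> cache; \u002F\u002F front is the oldest entry\n    size_t misses = 0;\n\n    for (unsigned int index : indices)\n    {\n        \u002F\u002F a hit reuses the cached shader output; FIFO order is not updated on hits\n        if (std::find(cache.begin(), cache.end(), index) != cache.end())\n            continue;\n\n        ++misses; \u002F\u002F a miss runs the vertex shader and caches the result\n        cache.push_back(index);\n        if (cache.size() > cache_size)\n            cache.pop_front(); \u002F\u002F evict the oldest entry\n    }\n\n    size_t triangles = indices.size() \u002F 3;\n    return triangles ? double(misses) \u002F double(triangles) : 0.0;\n}\n```\n\nComparing the result before and after reordering gives a rough, hardware-independent indication of the improvement. Keep in mind that modern GPUs do not behave like a simple FIFO cache.\n\n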
Modern GPUs still perform vertex reuse, but with substantially different mechanics: vertex invocations are batched into thread groups based on the input indices, and effective reuse depends on factors like vertex shader outputs and rasterizer throughput. To maximize the locality of reused vertex references, you have to reorder your triangles like so:\n\n```c++\nmeshopt_optimizeVertexCache(indices, indices, index_count, vertex_count);\n```\n\nThe details of vertex reuse vary between different GPU architectures, so vertex cache optimization uses an adaptive algorithm that produces a triangle sequence with good locality that works well across different GPUs. Alternatively, you can use an algorithm that optimizes specifically for fixed-size FIFO caches: `meshopt_optimizeVertexCacheFifo` (with a recommended cache size of 16). While it generally produces less efficient results on most GPUs, it runs ~2x faster, which may benefit rapid content iteration.\n\n### Overdraw optimization\n\nAfter transforming the vertices, the GPU sends the triangles for rasterization, which generates pixels that are usually first run through the depth test; pixels that pass it then run the pixel shader to produce the final color. As pixel shaders get more expensive, it becomes more and more important to reduce overdraw. While in general improving overdraw requires view-dependent operations, this library provides an algorithm to reorder triangles to minimize the overdraw from all directions, which you can run after vertex cache optimization like this:\n\n```c++\nmeshopt_optimizeOverdraw(indices, indices, index_count, &vertices[0].x, vertex_count, sizeof(Vertex), 1.05f);\n```\n\nThe overdraw optimizer needs to read vertex positions as a float3 from the vertex; the code snippet above assumes that the vertex stores position as `float x, y, z`.\n\nWhen performing the overdraw optimization, you have to specify a floating-point threshold parameter. 
The algorithm tries to maintain a balance between vertex cache efficiency and overdraw; the threshold determines how much the algorithm can compromise the vertex cache hit ratio, with 1.05 meaning that the resulting ratio should be at most 5% worse than before the optimization.\n\nNote that depending on the renderer structure and target hardware, the optimization may or may not be beneficial; for example, mobile GPUs with tiled deferred rendering (PowerVR, Apple) would not benefit from this optimization. For vertex-heavy scenes, it's recommended to measure the performance impact to ensure that the reduced vertex cache efficiency is outweighed by the reduced overdraw.\n\n### Vertex fetch optimization\n\nAfter the final triangle order has been established, we can still optimize the vertex buffer for memory efficiency. Before running the vertex shader, the GPU has to fetch the vertex attributes from the vertex buffer; the fetch is usually backed by a memory cache, and as such optimizing the data for the locality of memory access is important. You can do this by running this code:\n\n```c++\nmeshopt_optimizeVertexFetch(vertices, indices, index_count, vertices, vertex_count, sizeof(Vertex));\n```\n\nThis will reorder the vertices in the vertex buffer to try to improve the locality of reference, and rewrite the indices in place to match; if the vertex data is stored using multiple streams, you should use `meshopt_optimizeVertexFetchRemap` instead. 
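\n\nThe idea of ordering vertices by their first use in the index buffer can be shown with a short sketch that assumes a single interleaved vertex stream. `reorderByFirstUse` is a hypothetical helper for illustration, not the library's implementation. New vertex slots are assigned in the order the indices first reference each vertex, the vertex bytes are copied into that order, and the indices are rewritten to match.\n\n```c++\n#include \u003Ccstddef>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical sketch: reorders interleaved vertex data so that vertices appear in the\n\u002F\u002F order the index buffer first references them, then rewrites the indices in place.\n\u002F\u002F Unreferenced vertices are dropped; returns the new vertex count.\nsize_t reorderByFirstUse(std::vector\u003Cunsigned char>& vertex_data, size_t vertex_size,\n    std::vector\u003Cunsigned int>& indices)\n{\n    const unsigned int unassigned = ~0u;\n    std::vector\u003Cunsigned int> remap(vertex_data.size() \u002F vertex_size, unassigned); \u002F\u002F old -> new\n    std::vector\u003Cunsigned char> result;\n    result.reserve(vertex_data.size());\n\n    unsigned int next = 0;\n    for (unsigned int& index : indices)\n    {\n        if (remap[index] == unassigned)\n        {\n            remap[index] = next++;\n            \u002F\u002F append the bytes of the newly encountered vertex to the output\n            auto first = vertex_data.begin() + index * vertex_size;\n            result.insert(result.end(), first, first + vertex_size);\n        }\n        index = remap[index];\n    }\n\n    vertex_data.swap(result);\n    return next;\n}\n```\n\n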
Vertex fetch optimization has to be performed on the final index buffer since the optimal vertex order depends on the triangle order.\n\nNote that the algorithm does not try to model cache replacement precisely and instead just orders vertices in the order of use, which generally produces results that are close to optimal.\n\n### Vertex quantization\n\nTo optimize memory bandwidth when fetching the vertex data even further, and to reduce the amount of memory required to store the mesh, it is often beneficial to quantize the vertex attributes to smaller types. While this optimization can technically run at any stage of the pipeline (and sometimes doing quantization as the first step can improve indexing by merging almost identical vertices), it is generally easier to run it after all other optimizations since some of them require access to float3 positions.\n\nQuantization is usually domain-specific; it's common to quantize normals using three 8-bit integers, but you can use higher-precision quantization (for example, 10 bits per component in a 10_10_10_2 format) or a different encoding that uses just 2 components. For positions and texture coordinate data, the two most common storage formats are half-precision floats and 16-bit normalized integers that encode the position relative to the AABB of the mesh or the UV bounding rectangle.\n\nThe number of possible combinations here is very large, but this library does provide the building blocks, specifically functions to quantize floating point values to normalized integers, as well as half-precision floats. 
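\n\nAs an illustration of what quantizing to normalized integers involves, here is a minimal sketch with the hypothetical helpers `quantizeUnormSketch` and `dequantizeUnorm`. It assumes clamping to [0, 1] and round-to-nearest, and is not meant as a bit-exact copy of the library's functions. The value is scaled by 2^N-1 and rounded, and dividing by the same scale reverses the mapping up to rounding error.\n\n```c++\n\u002F\u002F Hypothetical sketch of N-bit unorm quantization: clamp v to [0, 1], scale it by\n\u002F\u002F 2^N - 1 and round to the nearest integer.\nint quantizeUnormSketch(float v, int N)\n{\n    const float scale = float((1 \u003C\u003C N) - 1);\n    v = v \u003C 0.f ? 0.f : (v > 1.f ? 1.f : v);\n    return int(v * scale + 0.5f);\n}\n\n\u002F\u002F Approximately reverses quantizeUnormSketch by dividing by the same scale.\nfloat dequantizeUnorm(int q, int N)\n{\n    return float(q) \u002F float((1 \u003C\u003C N) - 1);\n}\n```\n\nWith N = 10, the value 0.25 becomes 256 (0.25 * 1023 = 255.75, rounded), and 256 \u002F 1023 recovers approximately 0.2502.\n\n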
For example, here's how you can quantize a normal using 10-10-10 SNORM encoding:\n\n```c++\nunsigned int normal =\n    ((meshopt_quantizeSnorm(v.nx, 10) & 1023) \u003C\u003C 20) |\n    ((meshopt_quantizeSnorm(v.ny, 10) & 1023) \u003C\u003C 10) |\n     (meshopt_quantizeSnorm(v.nz, 10) & 1023);\n```\n\nand here's how you can quantize a position using half precision floats:\n\n```c++\nunsigned short px = meshopt_quantizeHalf(v.x);\nunsigned short py = meshopt_quantizeHalf(v.y);\nunsigned short pz = meshopt_quantizeHalf(v.z);\n```\n\nSince quantized vertex attributes often need to remain in their compact representations for efficient transfer and storage, they are usually dequantized during vertex processing by configuring the GPU vertex input correctly to expect normalized integers or half precision floats, which often needs no or minimal changes to the shader code. When CPU dequantization is required instead, `meshopt_dequantizeHalf` can be used to convert half precision values back to single precision; for normalized integer formats, the dequantization just requires dividing by 2^N-1 for unorm and 2^(N-1)-1 for snorm variants. For example, manually reversing `meshopt_quantizeUnorm(v, 10)` can be done by dividing by 1023.\n\n### Shadow indexing\n\nMany rendering pipelines require meshes to be rendered to depth-only targets, such as shadow maps or during a depth pre-pass, in addition to color\u002FG-buffer targets. 
While using the same geometry data for both cases is possible, reducing the number of unique vertices for depth-only rendering can be beneficial, especially when the source geometry has many attribute seams due to faceted shading or lightmap texture seams.\n\nTo achieve this, this library provides the `meshopt_generateShadowIndexBuffer` algorithm, which generates a second (shadow) index buffer that can be used with the original vertex data:\n\n```c++\nstd::vector\u003Cunsigned int> shadow_indices(index_count);\n\u002F\u002F note: this assumes Vertex starts with float3 positions and should be adjusted accordingly for quantized positions\nmeshopt_generateShadowIndexBuffer(&shadow_indices[0], indices, index_count, &vertices[0].x, vertex_count, sizeof(float) * 3, sizeof(Vertex));\n```\n\nBecause the vertex data is shared, shadow indexing should be done after other optimizations of the vertex\u002Findex data. However, it's possible (and recommended) to optimize the resulting shadow index buffer for vertex cache:\n\n```c++\nmeshopt_optimizeVertexCache(&shadow_indices[0], &shadow_indices[0], index_count, vertex_count);\n```\n\nIn some cases, it may be beneficial to split the vertex positions into a separate buffer to maximize efficiency for depth-only rendering. Note that the example above assumes only positions are relevant for shadow rendering, but more complex materials may require adding texture coordinates (for alpha testing) or skinning data to the vertex portion used as a key. 
`meshopt_generateShadowIndexBufferMulti` can be useful for these cases if the relevant data is not contiguous.\n\nNote that for meshes with optimal indexing and few attribute seams, the shadow index buffer will be very similar to the original index buffer, so it may not always be worth generating a separate shadow index buffer even if the rendering pipeline relies on depth-only passes.\n\n## Clusterization\n\nWhile traditionally meshes have served as a unit of rendering, new approaches to rendering and raytracing are starting to use a smaller unit of work, such as clusters or meshlets. This allows more freedom in how the geometry is processed, and can lead to better performance and more efficient use of GPU hardware. This section describes algorithms designed to work with meshes as sets of clusters.\n\n### Mesh shading\n\nModern GPUs are beginning to deviate from the traditional rasterization model. NVidia GPUs starting from Turing and AMD GPUs starting from RDNA2 provide a new programmable geometry pipeline that, instead of being built around index buffers and vertex shaders, is built around mesh shaders: a new shader type that provides a batch of work to the rasterizer.\n\nUsing mesh shaders in the context of traditional mesh rendering provides an opportunity to use a variety of optimization techniques, ranging from more efficient vertex reuse to various forms of culling (e.g. cluster frustum or occlusion culling) and in-memory compression, to maximize the utilization of GPU hardware. Beyond traditional rendering, mesh shaders provide a richer programming model that can synthesize new geometry more efficiently than common alternatives such as geometry shaders. 
Mesh shading can be accessed via Vulkan or Direct3D 12 APIs; please refer to [Introduction to Turing Mesh Shaders](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fintroduction-turing-mesh-shaders\u002F) and [Mesh Shaders and Amplification Shaders: Reinventing the Geometry Pipeline](https:\u002F\u002Fdevblogs.microsoft.com\u002Fdirectx\u002Fcoming-to-directx-12-mesh-shaders-and-amplification-shaders-reinventing-the-geometry-pipeline\u002F) for additional information.\n\nTo use mesh shaders for conventional rendering efficiently, geometry needs to be converted into a series of meshlets; each meshlet represents a small subset of the original mesh and comes with a small set of vertices and a separate micro-index buffer that references vertices in the meshlet. This information can be directly fed to the rasterizer from the mesh shader. This library provides algorithms to create meshlet data for a mesh, and - assuming geometry is static - can compute bounding information that can be used to perform cluster culling, rejecting meshlets that are invisible on screen.\n\nTo generate meshlet data, this library provides `meshopt_buildMeshlets` algorithm, which tries to balance topological efficiency (by maximizing vertex reuse inside meshlets) with culling efficiency (by minimizing meshlet radius and triangle direction divergence) and produces GPU-friendly data. As an alternative (that can be useful for load-time processing), `meshopt_buildMeshletsScan` can create the meshlet data using a vertex cache-optimized index buffer as a starting point by greedily aggregating consecutive triangles until they go over the meshlet limits. 
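\n\nThe greedy scan strategy can be sketched in plain C++ as follows. `SketchMeshlet` and `buildMeshletsGreedySketch` are hypothetical names, and the output layout is simplified compared to `meshopt_Meshlet`. No reordering is performed, so the input index buffer is assumed to be vertex cache-optimized already. Consecutive triangles are appended to the current meshlet until the next triangle would exceed `max_vertices` or `max_triangles`:\n\n```c++\n#include \u003Ccstddef>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical, simplified output format (not meshopt_Meshlet).\nstruct SketchMeshlet\n{\n    std::vector\u003Cunsigned int> vertices;   \u002F\u002F indices into the original vertex buffer\n    std::vector\u003Cunsigned char> triangles; \u002F\u002F 3 local indices into vertices per triangle\n};\n\n\u002F\u002F Hypothetical sketch; assumes 3 \u003C= max_vertices \u003C= 256 (local indices must fit in a\n\u002F\u002F byte) and max_triangles >= 1.\nstd::vector\u003CSketchMeshlet> buildMeshletsGreedySketch(const std::vector\u003Cunsigned int>& indices,\n    size_t vertex_count, size_t max_vertices, size_t max_triangles)\n{\n    std::vector\u003CSketchMeshlet> meshlets(1);\n    std::vector\u003Cint> local(vertex_count, -1); \u002F\u002F global index -> local index, -1 if absent\n\n    for (size_t i = 0; i + 2 \u003C indices.size(); i += 3)\n    {\n        SketchMeshlet* m = &meshlets.back();\n\n        \u002F\u002F count vertices of this triangle that the current meshlet does not contain yet;\n        \u002F\u002F degenerate triangles may be overcounted, which only makes splitting more eager\n        size_t new_vertices = 0;\n        for (int k = 0; k \u003C 3; ++k)\n            if (local[indices[i + k]] \u003C 0)\n                ++new_vertices;\n\n        \u002F\u002F start a new meshlet if this triangle would go over either limit\n        if (m->vertices.size() + new_vertices > max_vertices || m->triangles.size() + 3 > max_triangles * 3)\n        {\n            for (unsigned int v : m->vertices)\n                local[v] = -1; \u002F\u002F forget the finished meshlet's vertices\n            meshlets.emplace_back();\n            m = &meshlets.back();\n        }\n\n        for (int k = 0; k \u003C 3; ++k)\n        {\n            unsigned int v = indices[i + k];\n            if (local[v] \u003C 0)\n            {\n                local[v] = int(m->vertices.size());\n                m->vertices.push_back(v);\n            }\n            m->triangles.push_back(static_cast\u003Cunsigned char>(local[v]));\n        }\n    }\n\n    if (meshlets.back().triangles.empty())\n        meshlets.pop_back(); \u002F\u002F only happens for an empty index buffer\n    return meshlets;\n}\n```\n\nBecause it only looks at one triangle at a time, a scan like this does not consider meshlet radius or triangle orientation, which `meshopt_buildMeshlets` takes into account.\n\n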
`meshopt_buildMeshlets` is recommended for offline data processing even if cone culling is not used.\n\n```c++\nconst size_t max_vertices = 64;\nconst size_t max_triangles = 126; \u002F\u002F note: in v0.25 or prior, max_triangles needs to be divisible by 4\nconst float cone_weight = 0.0f;\n\nsize_t max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, max_triangles);\nstd::vector\u003Cmeshopt_Meshlet> meshlets(max_meshlets);\nstd::vector\u003Cunsigned int> meshlet_vertices(indices.size());\nstd::vector\u003Cunsigned char> meshlet_triangles(indices.size()); \u002F\u002F note: in v0.25 or prior, use indices.size() + max_meshlets * 3\n\nsize_t meshlet_count = meshopt_buildMeshlets(meshlets.data(), meshlet_vertices.data(), meshlet_triangles.data(), indices.data(),\n    indices.size(), &vertices[0].x, vertices.size(), sizeof(Vertex), max_vertices, max_triangles, cone_weight);\n```\n\nTo generate the meshlet data, `max_vertices` and `max_triangles` need to be set within limits supported by the hardware; for NVidia the values of 64 and 126 are recommended. `cone_weight` should be left as 0 if cluster cone culling is not used, and set to a value between 0 and 1 to balance cone culling efficiency with other forms of culling like frustum or occlusion culling (`0.25` is a reasonable default).\n\n> Note that for earlier AMD GPUs, the best configurations tend to use the same limits for `max_vertices` and `max_triangles`, such as 64 and 64, or 128 and 128. 
Additionally, while NVidia recommends 64\u002F126 as a good configuration, consider using a different configuration like `max_vertices 64, max_triangles 96`, to provide more realistic limits that are achievable on real-world meshes, and to reduce the overhead on other GPUs.\n\nEach resulting meshlet refers to a portion of `meshlet_vertices` and `meshlet_triangles` arrays; the arrays are overallocated for the worst case so it's recommended to trim them before saving them as an asset \u002F uploading them to the GPU:\n\n```c++\nconst meshopt_Meshlet& last = meshlets[meshlet_count - 1];\n\nmeshlet_vertices.resize(last.vertex_offset + last.vertex_count);\nmeshlet_triangles.resize(last.triangle_offset + last.triangle_count * 3);\nmeshlets.resize(meshlet_count);\n```\n\nDepending on the application, other strategies of storing the data can be useful; for example, `meshlet_vertices` serves as indices into the original vertex buffer but it might be worthwhile to generate a mini vertex buffer for each meshlet to remove the extra indirection when accessing vertex data, or it might be desirable to compress vertex data as vertices in each meshlet are likely to be very spatially coherent.\n\nFor optimal performance, it is recommended to further optimize each meshlet in isolation for better triangle and vertex locality by calling `meshopt_optimizeMeshlet` on vertex and index data like so:\n\n```c++\nmeshopt_optimizeMeshlet(&meshlet_vertices[m.vertex_offset], &meshlet_triangles[m.triangle_offset], m.triangle_count, m.vertex_count);\n```\n\nDifferent applications will choose different strategies for rendering meshlets; on a GPU capable of mesh shading, meshlets can be rendered directly; for example, a basic GLSL shader for `VK_EXT_mesh_shader` extension could look like this (parts omitted for brevity):\n\n```glsl\nlayout(binding = 0) readonly buffer Meshlets { Meshlet meshlets[]; };\nlayout(binding = 1) readonly buffer MeshletVertices { uint meshlet_vertices[]; };\nlayout(binding 
= 2) readonly buffer MeshletTriangles { uint8_t meshlet_triangles[]; };\n\nvoid main() {\n    Meshlet meshlet = meshlets[gl_WorkGroupID.x];\n    SetMeshOutputsEXT(meshlet.vertex_count, meshlet.triangle_count);\n\n    for (uint i = gl_LocalInvocationIndex; i \u003C meshlet.vertex_count; i += gl_WorkGroupSize.x) {\n        uint index = meshlet_vertices[meshlet.vertex_offset + i];\n        gl_MeshVerticesEXT[i].gl_Position = world_view_projection * vec4(vertex_positions[index], 1);\n    }\n\n    for (uint i = gl_LocalInvocationIndex; i \u003C meshlet.triangle_count; i += gl_WorkGroupSize.x) {\n        uint offset = meshlet.triangle_offset + i * 3;\n        gl_PrimitiveTriangleIndicesEXT[i] = uvec3(\n            meshlet_triangles[offset], meshlet_triangles[offset + 1], meshlet_triangles[offset + 2]);\n    }\n}\n```\n\n> Note that DirectX 12 mesh shaders cannot index raw buffers using arbitrary byte offsets. Use a typed SRV buffer (`Buffer\u003Cuint>`) with `DXGI_FORMAT_R8_UINT` format, repack each triangle to 32 bits to be able to use aligned 32-bit loads with `ByteAddressBuffer`, or consider utilizing [16-bit scalar types](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDirectXShaderCompiler\u002Fwiki\u002F16-Bit-Scalar-Types) to load a 3-byte triangle using two aligned 16-bit loads like so: `Buffer.Load\u003Cuint16_t2>(triangle_offset & ~1)`, then extracting indices with bitwise operations based on `triangle_offset & 1`.\n\nAfter generating the meshlet data, it's possible to generate extra data for each meshlet that can be saved and used at runtime to perform cluster culling, where each meshlet can be discarded if it's guaranteed to be invisible. 
To generate the data, `meshopt_computeMeshletBounds` can be used:\n\n```c++\nmeshopt_Bounds bounds = meshopt_computeMeshletBounds(&meshlet_vertices[m.vertex_offset], &meshlet_triangles[m.triangle_offset],\n    m.triangle_count, &vertices[0].x, vertices.size(), sizeof(Vertex));\n```\n\nThe resulting `bounds` values can be used to perform frustum or occlusion culling using the bounding sphere, or cone culling using the cone axis\u002Fangle (which will reject the entire meshlet if all triangles are guaranteed to be back-facing from the camera point of view):\n\n```c++\nif (dot(normalize(cone_apex - camera_position), cone_axis) >= cone_cutoff) reject();\n```\n\nCluster culling should ideally run at a lower frequency than mesh shading, either using amplification\u002Ftask shaders, or using a separate compute dispatch.\n\nBy default, the meshlet builder tries to form complete meshlets even if that requires merging disconnected regions of the mesh into a single meshlet. In some cases, such as hierarchical level of detail, or when advanced culling is used, it may be beneficial to prioritize spatial locality of triangles in a meshlet even if that results in partially filled meshlets. To that end, `meshopt_buildMeshletsFlex` function can be used instead of `meshopt_buildMeshlets`; it provides two triangle limits, `min_triangles` and `max_triangles`, and uses an additional configuration parameter, `split_factor` (recommended value is 2.0), to decide whether increasing the meshlet radius is worth it to fit more triangles in the meshlet. When using this function, the worst case bound for the number of meshlets has to be computed using `meshopt_buildMeshletsBound` with `min_triangles` parameter instead of `max_triangles`.\n\n### Clustered raytracing\n\nIn addition to rasterization, meshlets can also be used for ray tracing. 
NVidia GPUs starting from Turing with recent drivers provide support for cluster acceleration structures (via `VK_NV_cluster_acceleration_structure` extension \u002F NVAPI); instead of building a traditional BLAS, a cluster acceleration structure can be built for each meshlet and combined into a single clustered BLAS. While this currently results in reduced ray tracing performance for static geometry (for which a traditional BLAS may be more suitable), it allows updating the individual clusters without having to rebuild or refit the entire BLAS, which can be useful for mesh deformation or hierarchical level of detail.\n\nWhen using meshlets for raytracing, the performance characteristics that matter differ from when rendering meshes with rasterization. For raytracing, clusters with optimal spatial division that minimize ray-triangle intersection tests are preferred, while for rasterization, clusters with maximum triangle count within vertex limits are ideal.\n\nTo generate meshlets optimized for raytracing, this library provides `meshopt_buildMeshletsSpatial` algorithm, which builds clusters using surface area heuristic (SAH) to produce raytracing-friendly cluster distributions:\n\n```c++\nconst size_t max_vertices = 64;\nconst size_t min_triangles = 16;\nconst size_t max_triangles = 64;\nconst float fill_weight = 0.5f;\n\nsize_t max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, min_triangles); \u002F\u002F note: use min_triangles to compute worst case bound\nstd::vector\u003Cmeshopt_Meshlet> meshlets(max_meshlets);\nstd::vector\u003Cunsigned int> meshlet_vertices(indices.size());\nstd::vector\u003Cunsigned char> meshlet_triangles(indices.size()); \u002F\u002F note: in v0.25 or prior, use indices.size() + max_meshlets * 3\n\nsize_t meshlet_count = meshopt_buildMeshletsSpatial(meshlets.data(), meshlet_vertices.data(), meshlet_triangles.data(), indices.data(),\n    indices.size(), &vertices[0].x, vertices.size(), sizeof(Vertex), max_vertices, 
min_triangles, max_triangles, fill_weight);\n```\n\nThe algorithm recursively subdivides the triangles into a BVH-like hierarchy using SAH for optimal spatial partitioning while balancing cluster size; this results in clusters that are significantly more efficient to raytrace compared to clusters generated by `meshopt_buildMeshlets`, but can still be used for rasterization (for example, to build visibility buffers or G-buffers).\n\nThe `min_triangles` and `max_triangles` parameters control the allowed range of triangles per cluster. For optimal raytracing performance, `min_triangles` should be at most `max_triangles\u002F2` (or, ideally, `max_triangles\u002F4`) to give the algorithm enough freedom to produce high-quality spatial partitioning. For meshes with few seams due to normal or UV discontinuities, using `max_vertices` equal to `max_triangles` is recommended when rasterization performance is a concern; for meshes with many seams or for renderers that primarily use meshlets for ray tracing, a higher `max_vertices` value should be used as it ensures that more clusters can fully utilize the triangle limit.\n\nThe `fill_weight` parameter (typically between 0 and 1, although values higher than 1 could be used to prioritize cluster fill even more) controls the trade-off between pure SAH optimization and triangle utilization. A value of 0 will optimize purely for SAH, resulting in best raytracing performance but potentially smaller clusters. Values between 0.25 and 0.75 typically provide a good balance of SAH quality vs triangle count.\n\nWhen the resulting meshlets are used to generate hardware-specific acceleration structures, using fast trace (e.g. `VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR`) builds results in maximum performance; if build performance is important, using `meshopt_optimizeMeshlet` can help improve ray tracing performance when using fast build (e.g. 
`VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_BUILD_BIT_KHR`), although the tracing performance will still be lower than with fast trace builds.\n\n### Point cloud clusterization\n\nThe meshlet algorithms above are designed to work with triangle meshes. In some cases, splitting a point cloud into fixed-size clusters can be useful; the resulting point clusters could be rendered via mesh or compute shaders, or the resulting subdivision can be used to parallelize point processing while maintaining locality of points. To that end, this library provides the `meshopt_spatialClusterPoints` algorithm:\n\n```c++\nconst size_t cluster_size = 256;\n\nstd::vector\u003Cunsigned int> index(mesh.vertices.size());\nmeshopt_spatialClusterPoints(&index[0], &mesh.vertices[0].px, mesh.vertices.size(), sizeof(Vertex), cluster_size);\n```\n\nThe resulting index buffer could be used to process the points directly, or to reorganize the point data into flat contiguous arrays. Every consecutive chunk of `cluster_size` points in the index buffer refers to a single cluster, with just the last cluster containing fewer points if the total number of points is not a multiple of `cluster_size`. Note that the index buffer is not a remap table, so `meshopt_remapVertexBuffer` can't be used to flatten the point data.\n\n### Cluster partitioning\n\nWhen working with clustered geometry, it can be beneficial to organize clusters into larger groups (partitions) for more efficient processing or workload distribution. 
This library provides an algorithm to partition clusters into groups of similar size while prioritizing locality:\n\n```c++\nconst size_t partition_size = 24;\n\nstd::vector\u003Cunsigned int> cluster_partitions(cluster_count);\nsize_t partition_count = meshopt_partitionClusters(&cluster_partitions[0], &cluster_indices[0], total_index_count,\n    &cluster_index_counts[0], cluster_count, &vertices[0].x, vertex_count, sizeof(Vertex), partition_size);\n```\n\nThe algorithm assigns each cluster to a partition, aiming for a target partition size while prioritizing topological locality (sharing vertices) and spatial locality. The resulting partitions can be used for more efficient batched processing of clusters, or for hierarchical simplification schemes similar to Nanite.\n\nTwo clusters are considered topologically adjacent if they share at least one vertex index. In some cases, it can be helpful to process the indices using `meshopt_generateShadowIndexBuffer` (or remap them manually using the remap table generated by `meshopt_generatePositionRemap`), which allows clusters to be considered adjacent even when boundary vertices have different indices due to attribute discontinuities.\n\nIf vertex positions are specified (not `NULL`), spatial locality will influence the priority of merging clusters; otherwise, the algorithm will rely solely on topological connections and will not merge disconnected clusters into the same partition, which may result in smaller partitions for some inputs.\n\nAfter partitioning, each element in the destination array contains the partition ID (ranging from 0 to the returned partition count minus 1) for the corresponding cluster. 
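\n\nThe result is a flat array of partition IDs, so per-partition processing usually needs the clusters grouped by partition first. The sketch below shows one way to do that with a counting sort. `groupClustersByPartition` and its offset\u002Flist output layout are hypothetical helpers for illustration, not part of the library API. The sketch assumes `cluster_partitions` and `partition_count` come from `meshopt_partitionClusters` as shown above:\n\n```c++\n#include \u003Ccstddef>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical helper: groups cluster indices by partition ID using a counting sort.\n\u002F\u002F On return, the clusters of partition p are stored in\n\u002F\u002F partition_clusters[partition_offsets[p] .. partition_offsets[p + 1] - 1].\nvoid groupClustersByPartition(const std::vector\u003Cunsigned int>& cluster_partitions, size_t partition_count,\n    std::vector\u003Cunsigned int>& partition_offsets, std::vector\u003Cunsigned int>& partition_clusters)\n{\n    \u002F\u002F count the clusters assigned to each partition\n    partition_offsets.assign(partition_count + 1, 0);\n    for (unsigned int p : cluster_partitions)\n        partition_offsets[p + 1]++;\n\n    \u002F\u002F prefix sum turns the counts into start offsets\n    for (size_t p = 0; p \u003C partition_count; ++p)\n        partition_offsets[p + 1] += partition_offsets[p];\n\n    \u002F\u002F scatter cluster indices; clusters keep their original relative order within a partition\n    std::vector\u003Cunsigned int> cursor(partition_offsets.begin(), partition_offsets.end() - 1);\n    partition_clusters.resize(cluster_partitions.size());\n    for (size_t i = 0; i \u003C cluster_partitions.size(); ++i)\n        partition_clusters[cursor[cluster_partitions[i]]++] = unsigned(i);\n}\n```\n\nA counting sort keeps this pass linear in the number of clusters. The resulting offset table makes it cheap to iterate over one partition at a time, for example when processing each partition as a group in a hierarchical simplification scheme.\n\n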
Note that the partitions may be both smaller and larger than the target size; given a target size, the maximum partition size returned currently is `target + target \u002F 3`.\n\n## Mesh compression\n\nIf storage size or transmission bandwidth is important, you might want to additionally compress vertex and index data. While several mesh compression libraries, like Google Draco, are available, they typically are designed to maximize the compression ratio at the cost of disturbing the vertex\u002Findex order (which makes the meshes inefficient to render on GPU) or of decompression performance. They also frequently don't support custom game-ready quantized vertex formats and thus require re-quantizing the data after loading it, introducing extra quantization errors and making decoding slower.\n\nAlternatively, you can use general-purpose compression libraries like zstd or Oodle to compress vertex\u002Findex data - however, these compressors aren't designed to exploit redundancies in vertex\u002Findex data, and as such, compression ratios can be unsatisfactory.\n\nTo that end, this library provides algorithms to \"encode\" vertex and index data. The result of the encoding is generally significantly smaller than the initial data, and remains compressible with general-purpose compressors - so you can either store encoded data directly (for modest compression ratios and maximum decoding performance), or further compress it with LZ4\u002Fzstd\u002FOodle to maximize compression ratio.\n\n> Note: this compression scheme is available as a glTF extension [EXT_meshopt_compression](https:\u002F\u002Fgithub.com\u002FKhronosGroup\u002FglTF\u002Fblob\u002Fmain\u002Fextensions\u002F2.0\u002FVendor\u002FEXT_meshopt_compression\u002FREADME.md) as well as [KHR_meshopt_compression](https:\u002F\u002Fgithub.com\u002FKhronosGroup\u002FglTF\u002Fpull\u002F2517).\n\n### Vertex compression\n\nThis library provides a lossless algorithm to encode\u002Fdecode vertex data. 
To encode vertices, you need to allocate a target buffer (using the worst-case bound) and call the encoding function:\n\n```c++\nstd::vector\u003Cunsigned char> vbuf(meshopt_encodeVertexBufferBound(vertex_count, sizeof(Vertex)));\nvbuf.resize(meshopt_encodeVertexBuffer(&vbuf[0], vbuf.size(), vertices, vertex_count, sizeof(Vertex)));\n```\n\nTo decode the data at runtime, call the decoding function:\n\n```c++\nint res = meshopt_decodeVertexBuffer(vertices, vertex_count, sizeof(Vertex), &vbuf[0], vbuf.size());\nassert(res == 0);\n```\n\nNote that vertex encoding assumes that the vertex buffer was optimized for vertex fetch, and that vertices are quantized. Feeding unoptimized data into the encoder may result in poor compression ratios. The codec is lossless by itself - the only lossy step is quantization\u002Freordering or filters that you may apply before encoding. Additionally, if the vertex data contains padding bytes, they should be zero-initialized to ensure that the encoder does not need to store uninitialized data.\n\nThe decoder is heavily optimized and can directly target write-combined memory; you can expect it to run at 3-6 GB\u002Fs on modern desktop CPUs. The compression ratio depends on the data; vertex data compression ratio is typically around 2-4x (compared to already quantized and optimally packed data). General-purpose lossless compressors can further improve the compression ratio at some cost to decoding performance.\n\nThe vertex codec tries to take advantage of the inherent locality of sequential vertices and identify bit patterns that repeat in consecutive vertices. 
Typically, optimizing for vertex cache and vertex fetch provides a reasonably local vertex traversal order; without an index buffer, it is recommended to sort vertices spatially (via `meshopt_spatialSortRemap`) to improve the compression ratio.\n\nIt is crucial to correctly specify the stride when encoding vertex data; however, for the compression ratio, it does not matter whether the vertices are interleaved or deinterleaved, as the codecs perform full byte deinterleaving internally. The stride of each stream must be a multiple of 4 bytes.\n\nFor optimal compression results, the values should be quantized to small integers. It can be valuable to use bit counts that are not multiples of 8. For example, instead of using 16 bits to represent texture coordinates, use 12-bit integers and divide by 4095 in the shader. Alternatively, using half-precision floats can often achieve good results.\nFor single-precision floating-point data, it's recommended to use `meshopt_quantizeFloat` to remove entropy from the lower bits of the mantissa; for best results, consider using 15 bits, or 7 bits for extreme compression.\nFor normal or tangent vectors, using octahedral encoding is recommended over storing three components as it reduces redundancy; similarly, consider using 10-12 bits per component instead of 16.\n\nWhen data is bit-packed, specifying compression level 3 (via `meshopt_encodeVertexBufferLevel`) can improve the compression further by redistributing bits between components.\n\n### Index compression\n\nThis library also provides algorithms to encode\u002Fdecode index data. 
To encode triangle indices, you need to allocate a target buffer (using the worst-case bound) and call the encoding function:\n\n```c++\nstd::vector\u003Cunsigned char> ibuf(meshopt_encodeIndexBufferBound(index_count, vertex_count));\nibuf.resize(meshopt_encodeIndexBuffer(&ibuf[0], ibuf.size(), indices, index_count));\n```\n\nTo decode the data at runtime, call the decoding function:\n\n```c++\nint res = meshopt_decodeIndexBuffer(indices, index_count, &ibuf[0], ibuf.size());\nassert(res == 0);\n```\n\nNote that index encoding assumes that the index buffer was optimized for vertex cache and vertex fetch. Feeding unoptimized data into the encoder will result in poor compression ratios. The codec preserves the order of triangles; however, it can rotate each triangle to improve the compression ratio (which means the provoking vertex may change).\n\nThe decoder is heavily optimized and can directly target write-combined memory; you can expect it to run at 3-6 GB\u002Fs on modern desktop CPUs.\n\nThe index codec targets 1 byte per triangle as a best case (6x smaller than raw 16-bit index data); on real-world meshes, it's typical to achieve 1-1.2 bytes per triangle. To reach this, the index data needs to be optimized for vertex cache and vertex fetch. Optimizations that do not disrupt triangle locality (such as overdraw optimization) are safe to use in between.\nTo reduce the data size further, it's possible to use `meshopt_optimizeVertexCacheStrip` instead of `meshopt_optimizeVertexCache` when optimizing for vertex cache. This trades off some efficiency in vertex transform for smaller index (and sometimes vertex) data.\n\nWhen referenced vertex indices are not sequential, the index codec will use around 2 bytes per index. This can happen when the referenced vertices are a sparse subset of the vertex buffer, such as when encoding LODs. 
General-purpose compression can be especially helpful in this case.\n\nThe index buffer codec only supports triangle list topology; when encoding triangle strips or line lists, use `meshopt_encodeIndexSequence`\u002F`meshopt_decodeIndexSequence` instead. This codec typically encodes indices into ~1 byte per index, but compressing the results further with a general-purpose compressor can improve the results to 1-3 bits per index.\n\n### Meshlet compression\n\nWhen using mesh shading or clustered raytracing, meshlet vertex reference and triangle data can be compressed similarly to index data. This library provides a dedicated codec that exploits locality inherent in meshlet data. Unlike the vertex and index buffer codecs that work on entire buffers, the meshlet codec encodes each meshlet independently; this allows applications to have more flexibility in structuring the runtime storage and to adjust the decoded data during decoding. This also means that in some applications, additional data describing the meshlet (vertex\u002Ftriangle count, encoded size) will need to be encoded into the meshlet stream, if it isn't already available during decoding.\n\nTo encode a meshlet, you need to allocate a target buffer (using the worst-case bound) and call the encoding function with the vertex index references and micro-index buffer, as produced by `meshopt_buildMeshlets`:\n\n```c++\nstd::vector\u003Cunsigned char> mbuf(meshopt_encodeMeshletBound(max_vertices, max_triangles));\n\nfor (const meshopt_Meshlet& m : meshlets)\n{\n    size_t msize = meshopt_encodeMeshlet(&mbuf[0], mbuf.size(),\n        &meshlet_vertices[m.vertex_offset], m.vertex_count, &meshlet_triangles[m.triangle_offset], m.triangle_count);\n\n    \u002F\u002F write m.vertex_count, m.triangle_count, msize and mbuf[0..msize-1] to the output stream\n}\n```\n\nTo decode the data at runtime, call the decoding function:\n\n```c++\nuint16_t* vertices = ...;\nuint8_t* triangles = ...;\n\n\u002F\u002F automatically deduces 
`vertex_size=2` and `triangle_size=3` based on pointer types\nint res = meshopt_decodeMeshlet(vertices, m.vertex_count, triangles, m.triangle_count, stream, encoded_size);\nassert(res == 0);\n```\n\nVertex index references can be decoded as either 16-bit or 32-bit integers; triangle data can be decoded as 3 bytes per triangle (matching the `meshopt_buildMeshlets` output format) or as a 32-bit integer per triangle (with indices packed as `a | (b \u003C\u003C 8) | (c \u003C\u003C 16)` and top byte unused). Output buffers must have available space aligned to 4 bytes; for example, decoding a 3-triangle stream using 3 bytes per triangle needs to be able to write 12 bytes to the output triangles array (9 bytes of data, rounded up to a multiple of 4).\n\nWhen using the C++ API, `meshopt_decodeMeshlet` will automatically deduce the element sizes based on the types of vertex and triangle pointers; when using the C API, the sizes need to be specified explicitly.\n\nThe decoder is heavily optimized and can directly target write-combined memory; you can expect it to run at 7-10 GB\u002Fs on modern desktop CPUs.\n\n> Applications that do most of the streaming decompression on the GPU can also decode meshlet data on the GPU if CPU decoding is inconvenient; an example [meshletdec.slang](.\u002Fdemo\u002Fmeshletdec.slang) shader is provided for the 32-bit output format, and can be easily adapted to other formats, including custom ones.\n\nNote that meshlet encoding assumes that the meshlet data was optimized; meshlets should be processed using `meshopt_optimizeMeshletLevel` with level 1 or higher (3 recommended for improved compression) before encoding. Additionally, vertex references should have a high degree of reference locality; this can be achieved by building meshlets from meshes optimized for vertex cache\u002Ffetch, or linearizing the vertex reference data and reordering the vertex buffer using `meshopt_optimizeVertexFetch`. Feeding unoptimized data into the encoder will result in poor compression ratios. 
The codec preserves the order of triangles; however, it can rotate each triangle to improve the compression ratio (which means the provoking vertex may change).\n\nMeshlets without vertex references are supported; passing `NULL` vertices and `0` vertex count during encoding and decoding will produce encoded meshlets with just triangle data. Note that parameters supplied during decoding must match those used during encoding; if a meshlet was encoded with vertex references, it must be decoded with the same number of vertex references.\n\nThe meshlet codec targets 5-7 bits per triangle for triangle data; when vertex references are encoded, the encoded size strongly depends on how linear the references are, but it's typical to see 9-12 bits per triangle in aggregate. To reduce the compressed size further, it's possible to compress the resulting encoded data with a general-purpose compressor, which usually achieves 5-8 bits\u002Ftriangle in aggregate; note that in this case general-purpose compressors should be applied to a stream with many encoded meshlets at once to amortize their overhead.\n\n### Point cloud compression\n\nThe vertex encoding algorithms can be used to compress arbitrary streams of attribute data; one other use case besides triangle meshes is point cloud data. Typically, point clouds come with position, color and possibly other attributes but don't have an implied point order.\n\nTo compress point clouds efficiently, it's recommended to first preprocess the points by sorting them using the spatial sort algorithm:\n\n```c++\nstd::vector\u003Cunsigned int> remap(point_count);\nmeshopt_spatialSortRemap(&remap[0], positions, point_count, sizeof(vec3));\n\n\u002F\u002F for each attribute stream\nmeshopt_remapVertexBuffer(positions, positions, point_count, sizeof(vec3), &remap[0]);\n```\n\nAfter this, the resulting arrays should be quantized (e.g. 
using 16-bit fixed-point numbers for positions and 8-bit color components), and the result can be compressed using `meshopt_encodeVertexBuffer` as described in the previous section. To decompress, `meshopt_decodeVertexBuffer` will recover the quantized data that can be used directly or converted back to the original floating-point data. The compression ratio depends on the nature of the source data; for colored points, it's typical to get 35-40 bits per point.\n\n### Vertex filters\n\nTo further leverage the inherent structure of some vertex data, it's possible to use filters that encode and decode the data in a lossy manner. This is similar to quantization but can be used without having to change the shader code. After decoding the vertex buffer, the filter transformation needs to be reversed using the corresponding decode function. For native game engine pipelines, it is usually better to carefully prequantize and pretransform the vertex data, but sometimes (for example, when serializing data in glTF format) this is not a practical option and filters are more convenient. This library provides four filters:\n\n- Octahedral filter (`meshopt_encodeFilterOct`\u002F`meshopt_decodeFilterOct`) encodes quantized (snorm) normal or tangent vectors using octahedral encoding. Any number of bits between 2 and 16 can be used with 4 bytes or 8 bytes per vector.\n- Quaternion filter (`meshopt_encodeFilterQuat`\u002F`meshopt_decodeFilterQuat`) encodes quantized (snorm) quaternion vectors; this can be used to encode rotations or tangent frames. Any number of bits between 4 and 16 can be used with 8 bytes per vector.\n- Exponential filter (`meshopt_encodeFilterExp`\u002F`meshopt_decodeFilterExp`) encodes single-precision floating-point vectors; this can be used to encode arbitrary floating-point data more efficiently. 
In addition to an arbitrary bit count (\u003C= 24), the filter takes a \"mode\" parameter that allows specifying how the exponent sharing is performed to trade off compression ratio and quality:\n    - `meshopt_EncodeExpSeparate` does not share exponents and results in the largest output\n    - `meshopt_EncodeExpSharedVector` shares exponents between different components of the same vector\n    - `meshopt_EncodeExpSharedComponent` shares exponents between the same component in different vectors\n    - `meshopt_EncodeExpClamped` does not share exponents but clamps the exponent range to reduce exponent entropy\n- Color filter (`meshopt_encodeFilterColor`\u002F`meshopt_decodeFilterColor`) encodes quantized (unorm) RGBA colors using YCoCg encoding. Any number of bits between 2 and 16 can be used with 4 bytes or 8 bytes per vector.\n\nNote that all filters are lossy and require the data to be deinterleaved with one attribute per stream; this facilitates efficient SIMD implementation of filter decoders, which decode at 5-10 GB\u002Fs on modern desktop CPUs, allowing the overall decompression speed to be closer to that of the raw vertex codec.\n\n### Versioning and compatibility\n\nThe following guarantees on data compatibility are provided for point releases (*no* guarantees are given for the development branch):\n\n- Data encoded with older versions of the library can always be decoded with newer versions;\n- Data encoded with newer versions of the library can be decoded with older versions, provided that encoding versions are set correctly; if binary stability of encoded data is important, use `meshopt_encodeVertexVersion` and `meshopt_encodeIndexVersion` to 'pin' the data versions (or the `version` argument of `meshopt_encodeVertexBufferLevel`).\n\nBy default, vertex data is encoded for format version 1 (compatible with meshoptimizer v0.23+), and index data is encoded for format version 1 (compatible with meshoptimizer v0.14+). 
When decoding the data, the decoder will automatically detect the version from the data header.\n\n## Simplification\n\nAll algorithms presented so far don't affect visual appearance at all, with the exception of quantization that has minimal controlled impact. However, fundamentally the most effective way to reduce the rendering or transmission cost of a mesh is to reduce the number of triangles in the mesh.\n\n### Basic simplification\n\nThis library provides a simplification algorithm, `meshopt_simplify`, that reduces the number of triangles in the mesh. Given a vertex and an index buffer, it generates a second index buffer that uses existing vertices in the vertex buffer. This index buffer can be used directly for rendering with the original vertex buffer (preferably after vertex cache optimization using `meshopt_optimizeVertexCache`), or a new compact vertex\u002Findex buffer can be generated using `meshopt_optimizeVertexFetch` that uses the optimal number and order of vertices.\n\n```c++\nfloat threshold = 0.2f;\nsize_t target_index_count = size_t(index_count * threshold);\nfloat target_error = 1e-2f;\n\nstd::vector\u003Cunsigned int> lod(index_count);\nfloat lod_error = 0.f;\nlod.resize(meshopt_simplify(&lod[0], indices, index_count, &vertices[0].x, vertex_count, sizeof(Vertex),\n    target_index_count, target_error, \u002F* options= *\u002F 0, &lod_error));\n```\n\nTarget error is an approximate measure of the deviation from the original mesh using distance normalized to `[0..1]` range (e.g. `1e-2f` means that simplifier will try to maintain the error to be below 1% of the mesh extents). Note that the simplifier attempts to produce the requested number of indices at minimal error, but because of topological restrictions and error limit it is not guaranteed to reach the target index count and can stop earlier.\n\nTo disable the error limit, `target_error` can be set to `FLT_MAX`. 
This makes it more likely that the simplifier will reach the target index count, but it may produce a mesh that looks significantly different from the original, so using the resulting error to control viewing distance would be required. Conversely, setting `target_index_count` to 0 will simplify the input mesh as much as possible within the specified error limit; this can be useful for generating LODs that should look good at a given viewing distance.\n\nThe algorithm follows the topology of the original mesh in an attempt to preserve attribute seams, borders and overall appearance. For meshes with inconsistent topology or many seams, such as faceted meshes, it can result in the simplifier getting \"stuck\" and not being able to simplify the mesh fully. Therefore, it's critical that identical vertices are \"welded\" together, that is, the input vertex buffer does not contain duplicates. Additionally, it may be worthwhile to weld the vertices without taking into account vertex attributes that aren't critical and can be rebuilt later, or to use the \"permissive\" mode described below.\n\nAlternatively, the library provides another simplification algorithm, `meshopt_simplifySloppy`, which doesn't follow the topology of the original mesh. This means that it doesn't preserve attribute seams or borders, but it can collapse internal details that are too small to matter because it can merge mesh features that are topologically disjoint but spatially close. In general, this algorithm produces meshes with worse geometric quality and poor attribute quality compared to `meshopt_simplify`.\n\nThe algorithm can also return the resulting normalized deviation that can be used to choose the correct level of detail based on screen size or solid angle; the error can be converted to object space by multiplying by the scaling factor returned by `meshopt_simplifyScale`. 
For example, given a mesh with a precomputed LOD and a prescaled error, the screen-space normalized error can be computed and used for LOD selection:\n\n```c++\n\u002F\u002F lod_factor can be 1 or can be adjusted for more or less aggressive LOD selection\nfloat d = max(0.f, distance(camera_position, mesh_center) - mesh_radius);\nfloat e = d * (tan(camera_fovy \u002F 2) * 2 \u002F screen_height); \u002F\u002F 1px in mesh space\nbool lod_ok = e * lod_factor >= lod_error;\n```\n\nWhen a sequence of LOD meshes is generated that all use the original vertex buffer, care must be taken to order vertices optimally so as not to penalize mobile GPU architectures that are only capable of transforming a sequential vertex buffer range. It's recommended in this case to first optimize each LOD for vertex cache, then assemble all LODs in one large index buffer starting from the coarsest LOD (the one with the fewest triangles), and call `meshopt_optimizeVertexFetch` on the final large index buffer. This will make sure that coarser LODs require a smaller vertex range and are efficient with respect to vertex fetch and transform.\n\n### Attribute-aware simplification\n\nWhile `meshopt_simplify` is aware of attribute discontinuities by default (and infers them through the supplied index buffer) and tries to preserve them, it can be useful to provide information about attribute values. This allows the simplifier to take attribute error into account, which can improve shading (by using vertex normals), texture deformation (by using texture coordinates), and may be necessary to preserve vertex colors when textures are not used in the first place. 
This can be done by using a variant of the simplification function that takes attribute values and weight factors, `meshopt_simplifyWithAttributes`:\n\n```c++\nconst float nrm_weight = 0.5f;\nconst float attr_weights[3] = {nrm_weight, nrm_weight, nrm_weight};\n\nstd::vector\u003Cunsigned int> lod(index_count);\nfloat lod_error = 0.f;\nlod.resize(meshopt_simplifyWithAttributes(&lod[0], indices, index_count, &vertices[0].x, vertex_count, sizeof(Vertex),\n    &vertices[0].nx, sizeof(Vertex), attr_weights, 3, \u002F* vertex_lock= *\u002F NULL,\n    target_index_count, target_error, \u002F* options= *\u002F 0, &lod_error));\n```\n\nThe attributes are passed as a separate buffer (in the example above it's a subset of the same vertex buffer) and should be stored as consecutive floats; attribute weights are used to control the importance of each attribute in the simplification process. For normalized attributes like normals and vertex colors, a weight around 1.0 is usually appropriate; internally, a change of `1\u002Fweight` in attribute value over a distance `d` is approximately equivalent to a change of `d` in position. Using higher weights may be appropriate to preserve attribute quality at the cost of position quality. If the attribute has a different scale (e.g. unnormalized vertex colors in the [0..255] range), the weight should be divided by the scaling factor (255 in this example).\n\nIncluding texture coordinates in the attribute set is optional, as simplification generally preserves texture quality reasonably well by default; if included, a weight of around 10-100 is usually appropriate depending on the UV density. 
It's also possible to compute the weight automatically by setting it to the reciprocal average density of UVs, which can be computed as `1\u002Fsqrt(average UV area)` = `1\u002Fsqrt(sum(abs(uv area)) \u002F triangle count)` over all triangles in the mesh, possibly scaled by a constant factor if necessary.\n\nBoth the target error and the resulting error combine positional error and attribute error, so the error can be used to control the LOD while taking attribute quality into account, assuming carefully chosen weights.\n\n### Permissive simplification\n\nBy default, `meshopt_simplify` preserves attribute discontinuities inferred from the supplied index buffer. For meshes with many seams, the simplifier can get \"stuck\" and fail to fully simplify the mesh, as it cannot collapse vertices across attribute seams. This is especially problematic for meshes with faceted normals (flat shading), as the simplifier may not be able to reduce the triangle count at all. The `meshopt_SimplifyPermissive` option relaxes these restrictions, allowing the simplifier to collapse vertices across attribute discontinuities when the resulting error is acceptable:\n\n```c++\nstd::vector\u003Cunsigned int> lod(index_count);\nfloat lod_error = 0.f;\nlod.resize(meshopt_simplifyWithAttributes(&lod[0], indices, index_count, &vertices[0].x, vertex_count, sizeof(Vertex),\n    &vertices[0].nx, sizeof(Vertex), attr_weights, 3, \u002F* vertex_lock= *\u002F NULL,\n    target_index_count, target_error, \u002F* options= *\u002F meshopt_SimplifyPermissive, &lod_error));\n```\n\nTo maintain appearance, it's highly recommended to use this option together with attribute-aware simplification, as shown above, as it allows the simplifier to maintain attribute quality. In this mode, it is often desirable to selectively preserve certain attribute seams, such as UV seams or sharp creases. 
This can be achieved by using the `vertex_lock` array with flag `meshopt_SimplifyVertex_Protect` set for individual vertices to protect specific discontinuities. To fill this array, use `meshopt_generatePositionRemap` to create a mapping table for vertices with identical positions, and then compare each vertex to the remapped vertex to determine which attributes are different:\n\n```c++\nstd::vector\u003Cunsigned int> remap(vertices.size());\nmeshopt_generatePositionRemap(&remap[0], &vertices[0].px, vertices.size(), sizeof(Vertex));\n\nstd::vector\u003Cunsigned char> locks(vertices.size());\nfor (size_t i = 0; i \u003C vertices.size(); ++i) {\n    unsigned int r = remap[i];\n\n    if (r != i && (vertices[r].tx != vertices[i].tx || vertices[r].ty != vertices[i].ty))\n        locks[i] |= meshopt_SimplifyVertex_Protect; \u002F\u002F protect UV seams\n}\n```\n\nThis approach provides fine-grained control over which discontinuities to preserve. The permissive mode combined with selective locking provides a balance between simplification quality and attribute preservation, and usually results in higher quality LODs for the same target triangle count (and dramatically higher quality compared to `meshopt_simplifySloppy`).\n\n### Simplification with vertex update\n\nAll simplification functions described so far reuse the original vertex buffer and only produce a new index buffer. This means that the resulting mesh will have the same vertex positions and attributes as the original mesh; this is optimal for minimizing the memory consumption and for highly detailed meshes often provides good quality. However, for more aggressive simplification to retain visual quality, it may be necessary to adjust vertex data for optimal appearance. 
This can be done by using a variant of the simplification function that updates vertex positions and attributes, `meshopt_simplifyWithUpdate`:\n\n```c++\nfloat result_error = 0.f;\nindices.resize(meshopt_simplifyWithUpdate(&indices[0], indices.size(), &vertices[0].px, vertices.size(), sizeof(Vertex),\n    &vertices[0].nx, sizeof(Vertex), attr_weights, 3, \u002F* vertex_lock= *\u002F NULL,\n    target_index_count, target_error, \u002F* options= *\u002F 0, &result_error));\n```\n\nUnlike `meshopt_simplify`\u002F`meshopt_simplifyWithAttributes`, this function updates the index buffer as well as vertex positions and attributes in place. The resulting indices still refer to the original vertex buffer; any attributes that are not passed to the simplifier can be left unchanged. However, since the original contents of `vertices` are no longer valid for rendering the original mesh, a new compact vertex\u002Findex buffer should be generated using `meshopt_optimizeVertexFetch` (after optimizing the index data with `meshopt_optimizeVertexCache`). If the original data was important, it should be copied before calling this function.\n\nSince the vertex positions are updated, this may require updating some attributes that could previously be left as-is when using the original vertex buffer. Notably, texture coordinates need to be updated to avoid texture distortion; thus, it's highly recommended to include texture coordinates in the attribute data passed to the simplifier. For attributes to be updated, the corresponding attribute weight must not be zero; for texture coordinates, a weight of 1.0 is usually sufficient in this case (although a higher or mesh-dependent weight could be used with this function or other functions to reduce UV stretching).\n\nAttributes that have specific constraints like normals and colors should be renormalized or clamped after the function returns new data. 
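\n\nFor normals and colors, a post-pass over the updated vertices can be as simple as the sketch below. It assumes a hypothetical `Vertex` layout with float normals and float RGBA colors in the [0..1] range. The struct and the `fixupUpdatedAttributes` name are illustrative only:\n\n```c++\n#include \u003Calgorithm>\n#include \u003Ccmath>\n#include \u003Cvector>\n\n\u002F\u002F Hypothetical vertex layout; field names are for illustration only.\nstruct Vertex\n{\n    float px, py, pz;\n    float nx, ny, nz;\n    float r, g, b, a; \u002F\u002F color in [0..1]\n};\n\nstatic float clamp01(float v)\n{\n    return std::min(std::max(v, 0.f), 1.f);\n}\n\n\u002F\u002F Restores constraints that the attribute update during simplification may violate.\nvoid fixupUpdatedAttributes(std::vector\u003CVertex>& vertices)\n{\n    for (Vertex& v : vertices)\n    {\n        \u002F\u002F normals: rescale to unit length (degenerate zero-length normals are left as-is)\n        float len = std::sqrt(v.nx * v.nx + v.ny * v.ny + v.nz * v.nz);\n        if (len > 0.f)\n        {\n            v.nx \u002F= len;\n            v.ny \u002F= len;\n            v.nz \u002F= len;\n        }\n\n        \u002F\u002F colors: clamp back into the [0..1] range\n        v.r = clamp01(v.r);\n        v.g = clamp01(v.g);\n        v.b = clamp01(v.b);\n        v.a = clamp01(v.a);\n    }\n}\n```\n\nRunning this pass over the whole vertex buffer is usually harmless for vertices the simplifier did not touch, assuming their normals were already unit length and their colors already in range.\n\n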
Attributes like bone indices\u002Fweights don't have to be updated for reasonable results (but regularization via `meshopt_SimplifyRegularize` may still be helpful to maintain deformation quality). If bone weights *are* provided as attributes, they will need to be clamped and renormalized after the update to ensure they continue to add up to 1.\n\nUsing unique vertex data for each LOD in a chain can improve visual quality, but it comes at a cost of ~doubling vertex memory used (if each LOD is using half the triangles of the previous LOD). To reduce the memory footprint, it is possible to use shared vertices with `meshopt_simplifyWithAttributes` for the first one or two LODs in the chain, and only switch to `meshopt_simplifyWithUpdate` for the remainder. In that case, similarly to the use of `meshopt_simplify` described earlier, care must be taken to optimally arrange the vertices in the original vertex buffer.\n\n### Advanced simplification\n\n`meshopt_simplify*` functions expose additional options and parameters that can be used to control the simplification process in more detail.\n\nFor basic customization, a number of options can be passed via `options` bitmask that adjust the behavior of the simplifier:\n\n- `meshopt_SimplifyLockBorder` restricts the simplifier from collapsing edges that are on the border of the mesh. This can be useful for simplifying mesh subsets independently, so that the LODs can be combined without introducing cracks.\n- `meshopt_SimplifyErrorAbsolute` changes the error metric from relative to absolute both for the input error limit as well as for the resulting error. This can be used instead of `meshopt_simplifyScale`.\n- `meshopt_SimplifySparse` improves simplification performance assuming input indices are a sparse subset of the mesh. This can be useful when simplifying small mesh subsets independently, and is intended to be used for meshlet simplification. 
For consistency, it is recommended to use absolute errors when sparse simplification is desired, as this flag changes the meaning of the relative errors.\n- `meshopt_SimplifyPrune` allows the simplifier to remove isolated components regardless of the topological restrictions inside the component. This is generally recommended for full-mesh simplification as it can improve quality and reduce triangle count; note that with this option, triangles connected to locked vertices may be removed as part of their component.\n- `meshopt_SimplifyRegularize` produces more regular triangle sizes and shapes during simplification, at some cost to geometric quality. This can improve quality under deformation such as skinning. `meshopt_SimplifyRegularizeLight` can be used instead of this flag to apply a smaller regularization factor, reducing the impact on geometric quality.\n- `meshopt_SimplifyPermissive` allows collapses across attribute discontinuities, except for vertices that are tagged with `meshopt_SimplifyVertex_Protect` via `vertex_lock`.\n\nWhen using `meshopt_simplifyWithAttributes`, it is also possible to lock certain vertices by providing a `vertex_lock` array that contains a value for each vertex in the mesh, with `meshopt_SimplifyVertex_Lock` set for vertices that should not be collapsed. This can be useful to preserve certain vertices, such as the boundary of the mesh, with more control than the `meshopt_SimplifyLockBorder` option provides. 
When using `meshopt_simplifyWithUpdate`, locking vertices (whether via `vertex_lock` or `meshopt_SimplifyLockBorder`) will also prevent the simplifier from updating their positions and attributes; this can be useful together with `meshopt_SimplifySparse` for meshlet simplification, as meshlets at one level of the hierarchy can be simplified together without excessive data copying.\n\nLocking vertices restricts simplification and makes it more likely that the simplifier gets stuck before reaching the index target; if some areas of the mesh are more important than others but should still be eligible for simplification, the `vertex_lock` array can be used to mark specific vertices as high priority using the `meshopt_SimplifyVertex_Priority` bit, which makes it more likely that the vertex will be preserved during simplification.\n\nIn addition to the `meshopt_SimplifyPrune` flag, you can explicitly prune isolated components by calling the `meshopt_simplifyPrune` function. This can be done before regular simplification or as the only step, which is useful for scenarios like isosurface cleanup. As with other simplification functions, the `target_error` argument sets the cutoff for component radius and is specified in relative units (e.g., `1e-2f` will remove components under 1%). If an absolute cutoff is desired, divide it by the factor returned by `meshopt_simplifyScale` and pass the result as `target_error`.\n\nSimplification currently assumes that the input mesh uses the same material for all triangles. If the mesh uses multiple materials, it is possible to split the mesh into subsets based on the material and simplify each subset independently, using `meshopt_SimplifyLockBorder` or `vertex_lock` to preserve material boundaries; however, this limits the collapses and may reduce the resulting quality. 
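An alte","meshoptimizer is a library for optimizing 3D mesh data, aimed at reducing mesh size and improving rendering speed. It provides core features including vertex and index data optimization, mesh simplification and compression, and these algorithms can effectively improve GPU processing efficiency. The library is written in C++ and provides a C\u002FC++ interface, with support for other languages via FFI, making it suitable for applications that need to process 3D models efficiently, such as game development, virtual reality, or any project involving large amounts of 3D graphics rendering. It also ships with the gltfpack tool, which can automatically optimize glTF files, further enhancing its practicality.",2,"2026-06-11 03:30:47","trending"]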