[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-3788":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},3788,"web-llm","mlc-ai\u002Fweb-llm","mlc-ai","High-performance In-browser LLM Inference Engine ","https:\u002F\u002Fwebllm.mlc.ai",null,"TypeScript",18166,1307,137,130,0,7,50,208,33,44.35,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33],"chatgpt","deep-learning","language-model","llm","tvm","webgpu","webml","2026-06-12 02:00:54","\u003Cdiv align=\"center\" id=\"top\">\n\n# WebLLM\n\n[![NPM Package](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNPM_Package-Published-cc3534)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@mlc-ai\u002Fweb-llm)\n[![\"WebLLM Chat Deployed\"](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebLLM_Chat-Deployed-%2332a852)](https:\u002F\u002Fchat.webllm.ai\u002F)\n[![Join Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin-Discord-7289DA?logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002F9Xpy2HGBuD)\n[![Related Repository: WebLLM Chat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRelated_Repo-WebLLM_Chat-fafbfc?logo=github)](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm-chat\u002F)\n[![Related Repository: MLC 
LLM](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRelated_Repo-MLC_LLM-fafbfc?logo=github)](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fmlc-llm\u002F)\n\n**High-Performance In-Browser LLM Inference Engine.**\n\n[Documentation](https:\u002F\u002Fwebllm.mlc.ai\u002Fdocs\u002F) | [Blogpost](https:\u002F\u002Fblog.mlc.ai\u002F2024\u002F06\u002F13\u002Fwebllm-a-high-performance-in-browser-llm-inference-engine) | [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.15803) | [Examples](examples)\n\n\u003C\u002Fdiv>\n\n## Overview\n\nWebLLM is a high-performance in-browser LLM inference engine that brings language model inference directly onto web browsers with hardware acceleration.\nEverything runs inside the browser with no server support and is accelerated with WebGPU.\n\nWebLLM is **fully compatible with [OpenAI API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fapi-reference\u002Fchat).**\nThat is, you can use the same OpenAI API on **any open source models** locally, with functionalities\nincluding streaming, JSON-mode, function-calling (WIP), etc.\n\nWe can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.\n\nYou can use WebLLM as a base [npm package](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@mlc-ai\u002Fweb-llm) and build your own web application on top of it by following the examples below. 
This project is a companion project of [MLC LLM](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fmlc-llm), which enables universal deployment of LLM across hardware environments.\n\n\u003Cdiv align=\"center\">\n\n**[Check out WebLLM Chat to try it out!](https:\u002F\u002Fchat.webllm.ai\u002F)**\n\n\u003C\u002Fdiv>\n\n## Key Features\n\n- **In-Browser Inference**: WebLLM is a high-performance, in-browser language model inference engine that leverages WebGPU for hardware acceleration, enabling powerful LLM operations directly within web browsers without server-side processing.\n\n- [**Full OpenAI API Compatibility**](#full-openai-compatibility): Seamlessly integrate your app with WebLLM using OpenAI API with functionalities such as streaming, JSON-mode, logit-level control, seeding, and more.\n\n- **Structured JSON Generation**: WebLLM supports state-of-the-art JSON mode structured generation, implemented in the WebAssembly portion of the model library for optimal performance. Check [WebLLM JSON Playground](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmlc-ai\u002FWebLLM-JSON-Playground) on HuggingFace to try generating JSON output with custom JSON schema.\n\n- [**Extensive Model Support**](#built-in-models): WebLLM natively supports a range of models including Llama 3, Phi 3, Gemma, Mistral, Qwen(通义千问), and many others, making it versatile for various AI tasks. 
For the complete supported model list, check [MLC Models](https:\u002F\u002Fmlc.ai\u002Fmodels).\n\n- [**Custom Model Integration**](#custom-models): Easily integrate and deploy custom models in MLC format, allowing you to adapt WebLLM to specific needs and scenarios, enhancing flexibility in model deployment.\n\n- **Plug-and-Play Integration**: Easily integrate WebLLM into your projects using package managers like NPM and Yarn, or directly via CDN, complete with comprehensive [examples](.\u002Fexamples\u002F) and a modular design for connecting with UI components.\n\n- **Streaming & Real-Time Interactions**: Supports streaming chat completions, allowing real-time output generation which enhances interactive applications like chatbots and virtual assistants.\n\n- **Web Worker & Service Worker Support**: Optimize UI performance and manage the lifecycle of models efficiently by offloading computations to separate worker threads or service workers.\n\n- **Chrome Extension Support**: Extend the functionality of web browsers through custom Chrome extensions using WebLLM, with examples available for building both basic and advanced extensions.\n\n## Built-in Models\n\nCheck the complete list of available models on [MLC Models](https:\u002F\u002Fmlc.ai\u002Fmodels). 
WebLLM supports a subset of these available models and the list can be accessed at [`prebuiltAppConfig.model_list`](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm\u002Fblob\u002Fmain\u002Fsrc\u002Fconfig.ts#L293).\n\nHere are the primary families of models currently supported:\n\n- **Llama**: Llama 3, Llama 2, Hermes-2-Pro-Llama-3\n- **Phi**: Phi 3, Phi 2, Phi 1.5\n- **Gemma**: Gemma-2B\n- **Mistral**: Mistral-7B-v0.3, Hermes-2-Pro-Mistral-7B, NeuralHermes-2.5-Mistral-7B, OpenHermes-2.5-Mistral-7B\n- **Qwen (通义千问)**: Qwen2 0.5B, 1.5B, 7B\n\nIf you need more models, [request a new model via opening an issue](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm\u002Fissues\u002Fnew\u002Fchoose) or check [Custom Models](#custom-models) for how to compile and use your own models with WebLLM.\n\n## Jumpstart with Examples\n\nLearn how to use WebLLM to integrate large language models into your application and generate chat completions through this simple Chatbot example:\n\n[![Example Chatbot on JSFiddle](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FExample-JSFiddle-blue?logo=jsfiddle&logoColor=white)](https:\u002F\u002Fjsfiddle.net\u002Fneetnestor\u002F4nmgvsa2\u002F)\n[![Example Chatbot on Codepen](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FExample-Codepen-gainsboro?logo=codepen)](https:\u002F\u002Fcodepen.io\u002Fneetnestor\u002Fpen\u002FvYwgZaG)\n\nFor an advanced example of a larger, more complicated project, check [WebLLM Chat](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm-chat\u002Fblob\u002Fmain\u002Fapp\u002Fclient\u002Fwebllm.ts).\n\nMore examples for different use cases are available in the [examples](.\u002Fexamples\u002F) folder.\n\n## Get Started\n\nWebLLM offers a minimalist and modular interface to access the chatbot in the browser.\nThe package is designed in a modular way to hook to any of the UI components.\n\n### Installation\n\n#### Package Manager\n\n```sh\n# npm\nnpm install @mlc-ai\u002Fweb-llm\n# yarn\nyarn add 
@mlc-ai\u002Fweb-llm\n# or pnpm\npnpm install @mlc-ai\u002Fweb-llm\n```\n\nThen import the module in your code.\n\n```typescript\n\u002F\u002F Import everything\nimport * as webllm from \"@mlc-ai\u002Fweb-llm\";\n\u002F\u002F Or only import what you need\nimport { CreateMLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n```\n\n#### CDN Delivery\n\nThanks to [jsdelivr.com](https:\u002F\u002Fwww.jsdelivr.com\u002Fpackage\u002Fnpm\u002F@mlc-ai\u002Fweb-llm), WebLLM can be imported directly by URL and works out of the box on cloud development platforms like [jsfiddle.net](https:\u002F\u002Fjsfiddle.net\u002F), [Codepen.io](https:\u002F\u002Fcodepen.io\u002F), and [Scribbler](https:\u002F\u002Fscribbler.live):\n\n```javascript\nimport * as webllm from \"https:\u002F\u002Fesm.run\u002F@mlc-ai\u002Fweb-llm\";\n```\n\nIt can also be dynamically imported as:\n\n```javascript\nconst webllm = await import(\"https:\u002F\u002Fesm.run\u002F@mlc-ai\u002Fweb-llm\");\n```\n\n### Create MLCEngine\n\nMost operations in WebLLM are invoked through the `MLCEngine` interface. You can create an `MLCEngine` instance and load the model by calling the `CreateMLCEngine()` factory function.\n\n(Note that loading a model requires downloading it, which can take a significant amount of time on the very first run, before anything is cached. You should properly handle this asynchronous call.)\n\n```typescript\nimport { CreateMLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n\n\u002F\u002F Callback function to update model loading progress\nconst initProgressCallback = (initProgress) => {\n  console.log(initProgress);\n};\nconst selectedModel = \"Llama-3.1-8B-Instruct-q4f32_1-MLC\";\n\nconst engine = await CreateMLCEngine(\n  selectedModel,\n  { initProgressCallback: initProgressCallback }, \u002F\u002F engineConfig\n);\n```\n\nUnder the hood, this factory function first creates an engine instance (synchronous) and then loads the model (asynchronous). 
You can also do them separately in your application.\n\n```typescript\nimport { MLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n\n\u002F\u002F This is a synchronous call that returns immediately\nconst engine = new MLCEngine({\n  initProgressCallback: initProgressCallback,\n});\n\n\u002F\u002F This is an asynchronous call and can take a long time to finish\nawait engine.reload(selectedModel);\n```\n\n### Cache Backend Policy\n\nWebLLM supports four cache backends through `AppConfig.cacheBackend`:\n\n- `\"cache\"`: browser [Cache API](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FCache) (default).\n- `\"indexeddb\"`: browser [IndexedDB](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FIndexedDB_API).\n- `\"opfs\"`: browser [Origin Private File System (OPFS)](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FFile_System_API\u002FOrigin_private_file_system).\n- `\"cross-origin\"`: experimental Chrome [Cross-Origin Storage API](https:\u002F\u002Fgithub.com\u002FWICG\u002Fcross-origin-storage) extension backend. Install the [Cross-Origin Storage extension](https:\u002F\u002Fchromewebstore.google.com\u002Fdetail\u002Fcross-origin-storage\u002Fdenpnpcgjgikjpoglpjefakmdcbmlgih) to use it. 
(If the extension isn't installed, WebLLM falls back to the default cache automatically.)\n\nExample:\n\n```typescript\nimport { CreateMLCEngine, prebuiltAppConfig } from \"@mlc-ai\u002Fweb-llm\";\n\nconst appConfig = { ...prebuiltAppConfig, cacheBackend: \"cross-origin\" };\nconst engine = await CreateMLCEngine(\"Llama-3.1-8B-Instruct-q4f32_1-MLC\", {\n  appConfig,\n});\n```\n\nNotes:\n- If `\"opfs\"` is selected in an environment without OPFS support, cache operations fail with an OPFS availability error.\n- The `\"cross-origin\"` backend requires installing and enabling a compatible browser extension.\n- The `\"cross-origin\"` backend currently does not support programmatic tensor-cache deletion; clearing is extension-managed.\n\n### Chat Completion\n\nAfter successfully initializing the engine, you can invoke chat completions using OpenAI-style chat APIs through the `engine.chat.completions` interface. For the full list of parameters and their descriptions, check the [section below](#full-openai-compatibility) and the [OpenAI API reference](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fapi-reference\u002Fchat\u002Fcreate).\n\n(Note: The `model` parameter is not supported and will be ignored here. Instead, call `CreateMLCEngine(model)` or `engine.reload(model)`, as shown in [Create MLCEngine](#create-mlcengine) above.)\n\n```typescript\nconst messages = [\n  { role: \"system\", content: \"You are a helpful AI assistant.\" },\n  { role: \"user\", content: \"Hello!\" },\n];\n\nconst reply = await engine.chat.completions.create({\n  messages,\n});\nconsole.log(reply.choices[0].message);\nconsole.log(reply.usage);\n```\n\n### Streaming\n\nWebLLM also supports streaming chat completions. 
To use it, simply pass `stream: true` to the `engine.chat.completions.create` call.\n\n```typescript\nconst messages = [\n  { role: \"system\", content: \"You are a helpful AI assistant.\" },\n  { role: \"user\", content: \"Hello!\" },\n];\n\n\u002F\u002F Chunks is an AsyncGenerator object\nconst chunks = await engine.chat.completions.create({\n  messages,\n  temperature: 1,\n  stream: true, \u002F\u002F \u003C-- Enable streaming\n  stream_options: { include_usage: true },\n});\n\nlet reply = \"\";\nfor await (const chunk of chunks) {\n  reply += chunk.choices[0]?.delta.content || \"\";\n  console.log(reply);\n  if (chunk.usage) {\n    console.log(chunk.usage); \u002F\u002F only last chunk has usage\n  }\n}\n\nconst fullReply = await engine.getMessage();\nconsole.log(fullReply);\n```\n\n## Advanced Usage\n\n### Using Workers\n\nYou can put the heavy computation in a worker script to optimize your application performance. To do so, you need to:\n\n1. Create a handler in the worker thread that communicates with the frontend while handling the requests.\n2. 
Create a Worker Engine in your main application, which under the hood sends messages to the handler in the worker thread.\n\nFor detailed implementations of different kinds of Workers, check the following sections.\n\n#### Dedicated Web Worker\n\nWebLLM comes with API support for WebWorker so you can hook\nthe generation process into a separate worker thread so that\nthe computing in the worker thread won't disrupt the UI.\n\nWe create a handler in the worker thread that communicates with the frontend while handling the requests.\n\n```typescript\n\u002F\u002F worker.ts\nimport { WebWorkerMLCEngineHandler } from \"@mlc-ai\u002Fweb-llm\";\n\n\u002F\u002F A handler that resides in the worker thread\nconst handler = new WebWorkerMLCEngineHandler();\nself.onmessage = (msg: MessageEvent) => {\n  handler.onmessage(msg);\n};\n```\n\nIn the main logic, we create a `WebWorkerMLCEngine` that\nimplements the same `MLCEngineInterface`. The rest of the logic remains the same.\n\n```typescript\n\u002F\u002F main.ts\nimport { CreateWebWorkerMLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n\nasync function main() {\n  \u002F\u002F Use a WebWorkerMLCEngine instead of MLCEngine here\n  const engine = await CreateWebWorkerMLCEngine(\n    new Worker(new URL(\".\u002Fworker.ts\", import.meta.url), {\n      type: \"module\",\n    }),\n    selectedModel,\n    { initProgressCallback }, \u002F\u002F engineConfig\n  );\n\n  \u002F\u002F everything else remains the same\n}\n```\n\n### Use Service Worker\n\nWebLLM comes with API support for ServiceWorker so you can hook the generation process\ninto a service worker to avoid reloading the model in every page visit and optimize\nyour application's offline experience.\n\n(Note, Service Worker's life cycle is managed by the browser and can be killed any time without notifying the webapp. 
`ServiceWorkerMLCEngine` will try to keep the service worker thread alive by periodically sending heartbeat events, but your application should also include proper error handling. Check `keepAliveMs` and `missedHeatbeat` in [`ServiceWorkerMLCEngine`](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm\u002Fblob\u002Fmain\u002Fsrc\u002Fservice_worker.ts#L234) for more details.)\n\nWe create a handler in the worker thread that communicates with the frontend while handling the requests.\n\n```typescript\n\u002F\u002F sw.ts\nimport { ServiceWorkerMLCEngineHandler } from \"@mlc-ai\u002Fweb-llm\";\n\nlet handler: ServiceWorkerMLCEngineHandler;\n\nself.addEventListener(\"activate\", function (event) {\n  handler = new ServiceWorkerMLCEngineHandler();\n  console.log(\"Service Worker is ready\");\n});\n```\n\nThen in the main logic, we register the service worker and create the engine using\n`CreateServiceWorkerMLCEngine` function. The rest of the logic remains the same.\n\n```typescript\n\u002F\u002F main.ts\nimport {\n  MLCEngineInterface,\n  CreateServiceWorkerMLCEngine,\n} from \"@mlc-ai\u002Fweb-llm\";\n\nif (\"serviceWorker\" in navigator) {\n  navigator.serviceWorker.register(\n    new URL(\"sw.ts\", import.meta.url), \u002F\u002F worker script\n    { type: \"module\" },\n  );\n}\n\nconst engine: MLCEngineInterface = await CreateServiceWorkerMLCEngine(\n  selectedModel,\n  { initProgressCallback }, \u002F\u002F engineConfig\n);\n```\n\nYou can find a complete example on how to run WebLLM in service worker in [examples\u002Fservice-worker](examples\u002Fservice-worker\u002F).\n\n### Chrome Extension\n\nYou can also find examples of building Chrome extension with WebLLM in [examples\u002Fchrome-extension](examples\u002Fchrome-extension\u002F) and [examples\u002Fchrome-extension-webgpu-service-worker](examples\u002Fchrome-extension-webgpu-service-worker\u002F). The latter one leverages service worker, so the extension is persistent in the background. 
Additionally, you can explore another full project of a Chrome extension, WebLLM Assistant, which leverages WebLLM [here](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm-assistant).\n\n## Full OpenAI Compatibility\n\nWebLLM is designed to be fully compatible with [OpenAI API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fapi-reference\u002Fchat). Thus, besides building a simple chatbot, you can also have the following functionalities with WebLLM:\n\n- [streaming](examples\u002Fstreaming): return output as chunks in real-time in the form of an AsyncGenerator\n- [json-mode](examples\u002Fjson-mode): efficiently ensure output is in JSON format, see [OpenAI Reference](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Ftext-generation\u002Fchat-completions-api) for more.\n- [seed-to-reproduce](examples\u002Fseed-to-reproduce): use seeding to ensure a reproducible output with fields `seed`.\n- [function-calling](examples\u002Ffunction-calling) (WIP): function calling with fields `tools` and `tool_choice` (with preliminary support); or manual function calling without `tools` or `tool_choice` (keeps the most flexibility).\n\n## Integrity Verification\n\nWebLLM supports optional integrity verification for model artifacts using\n[SRI (Subresource Integrity)](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FSecurity\u002FSubresource_Integrity) hashes.\nWhen the `integrity` field is set on a `ModelRecord`, WebLLM will verify the downloaded config,\nWASM, and tokenizer files against the provided hashes before loading.\n\n```typescript\nimport { CreateMLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n\nconst appConfig = {\n  model_list: [\n    {\n      model: \"https:\u002F\u002Fhuggingface.co\u002Fmlc-ai\u002FLlama-3.2-1B-Instruct-q4f16_1-MLC\",\n      model_id: \"Llama-3.2-1B-Instruct-q4f16_1-MLC\",\n      model_lib:\n        \"https:\u002F\u002Fraw.githubusercontent.com\u002Fuser\u002Fmodel-libs\u002Fmain\u002Fmodel.wasm\",\n     
 integrity: {\n        config: \"sha256-\u003Cbase64-hash-of-mlc-chat-config.json>\",\n        model_lib: \"sha256-\u003Cbase64-hash-of-wasm-file>\",\n        tokenizer: {\n          \"tokenizer.json\": \"sha256-\u003Cbase64-hash-of-tokenizer.json>\",\n        },\n        onFailure: \"error\", \u002F\u002F \"error\" (default) throws IntegrityError, \"warn\" logs and continues\n      },\n    },\n  ],\n};\n\nconst engine = await CreateMLCEngine(\"Llama-3.2-1B-Instruct-q4f16_1-MLC\", {\n  appConfig,\n});\n```\n\nYou can generate SRI hashes for model files with:\n\n```bash\n# SHA-256\nopenssl dgst -sha256 -binary \u003Cfile> | openssl base64 -A | sed 's\u002F^\u002Fsha256-\u002F'\n# SHA-384\nopenssl dgst -sha384 -binary \u003Cfile> | openssl base64 -A | sed 's\u002F^\u002Fsha384-\u002F'\n# SHA-512\nopenssl dgst -sha512 -binary \u003Cfile> | openssl base64 -A | sed 's\u002F^\u002Fsha512-\u002F'\n```\n\n> The `openssl` commands require a Unix-like shell (macOS\u002FLinux). On Windows, run `openssl` via [Git Bash](https:\u002F\u002Fgitforwindows.org\u002F) or [WSL](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fwindows\u002Fwsl\u002F).\n\nIf a hash does not match, an `IntegrityError` is thrown (or a warning is logged when `onFailure: \"warn\"`).\nAll fields in `integrity` are optional — only specified artifacts will be verified.\nWhen the `integrity` field is omitted entirely, WebLLM behaves exactly as before (no verification).\n\nSee the [integrity-verification example](examples\u002Fintegrity-verification\u002F) for a complete working demo.\n\n## Custom Models\n\nWebLLM works as a companion project of [MLC LLM](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fmlc-llm) and it supports custom models in MLC format.\nIt reuses the model artifact and builds the flow of MLC LLM. 
To compile and use your own models with WebLLM, please check out the\n[MLC LLM documentation](https:\u002F\u002Fllm.mlc.ai\u002Fdocs\u002Fdeploy\u002Fwebllm.html)\non how to compile and deploy new model weights and libraries to WebLLM.\n\nHere, we go over the high-level idea. There are two elements of the WebLLM package that enable new models and weight variants.\n\n- `model`: Contains a URL to model artifacts, such as weights and meta-data.\n- `model_lib`: A URL to the WebAssembly library (i.e. wasm file) that contains the executables to accelerate the model computations.\n\nBoth are customizable in WebLLM.\n\n```typescript\nimport { CreateMLCEngine } from \"@mlc-ai\u002Fweb-llm\";\n\nasync function main() {\n  const appConfig = {\n    \"model_list\": [\n      {\n        \"model\": \"\u002Furl\u002Fto\u002Fmy\u002Fllama\",\n        \"model_id\": \"MyLlama-3b-v1-q4f32_0\",\n        \"model_lib\": \"\u002Furl\u002Fto\u002Fmyllama3b.wasm\",\n      }\n    ],\n  };\n  \u002F\u002F override default\n  const chatOpts = {\n    \"repetition_penalty\": 1.01\n  };\n\n  \u002F\u002F load the custom model registered in appConfig above,\n  \u002F\u002F with a chat option override and app config\n  \u002F\u002F under the hood, it will load the model from \"\u002Furl\u002Fto\u002Fmy\u002Fllama\"\n  \u002F\u002F and cache it in the browser cache\n  \u002F\u002F The chat will also load the model library from \"\u002Furl\u002Fto\u002Fmyllama3b.wasm\",\n  \u002F\u002F assuming that it is compatible with the model at \"\u002Furl\u002Fto\u002Fmy\u002Fllama\".\n  const engine = await CreateMLCEngine(\n    \"MyLlama-3b-v1-q4f32_0\",\n    { appConfig }, \u002F\u002F engineConfig\n    chatOpts,\n  );\n}\n```\n\nIn many cases, we only want to supply the model weight variant, but\nnot necessarily a new model (e.g. `NeuralHermes-Mistral` can reuse `Mistral`'s\nmodel library). 
For examples of how a model library can be shared by different model variants,\nsee `webllm.prebuiltAppConfig`.\n\n## Build WebLLM Package From Source\n\nNOTE: you don't need to build from source unless you would like to modify the WebLLM package.\nTo use the npm, simply follow [Get Started](#get-started) or any of the [examples](examples) instead.\n\nTo build from source, simply run:\n\n```bash\nnpm install\nnpm run build\n```\n\nThen, to test the effects of your code change in an example, inside `examples\u002Fget-started\u002Fpackage.json`, change from `\"@mlc-ai\u002Fweb-llm\": \"^0.2.83\"` to `\"@mlc-ai\u002Fweb-llm\": ..\u002F..`.\n\nThen run:\n\n```bash\ncd examples\u002Fget-started\nnpm install\nnpm start\n```\n\nNote that sometimes you would need to switch between `file:..\u002F..` and `..\u002F..` to trigger npm to recognize new changes. In the worst case, you can run:\n\n```bash\ncd examples\u002Fget-started\nrm -rf node_modules dist package-lock.json .parcel-cache\nnpm install\nnpm start\n```\n\n### In case you need to build TVMjs from source\n\nWebLLM's runtime largely depends on TVMjs: https:\u002F\u002Fgithub.com\u002Fapache\u002Ftvm\u002Ftree\u002Fmain\u002Fweb\n\nWhile it is also available as an npm package: https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@mlc-ai\u002Fweb-runtime, you can build it from source if needed by following the steps below.\n\n1. Install [emscripten](https:\u002F\u002Femscripten.org). 
It is an LLVM-based compiler that compiles C\u002FC++ source code to WebAssembly.\n   - Follow the [installation instructions](https:\u002F\u002Femscripten.org\u002Fdocs\u002Fgetting_started\u002Fdownloads.html#installation-instructions-using-the-emsdk-recommended) to install the latest emsdk.\n   - Source `emsdk_env.sh` by `source path\u002Fto\u002Femsdk_env.sh`, so that `emcc` is reachable from PATH and the command `emcc` works.\n\n   We can verify the successful installation by running `emcc` in the terminal.\n\n   Note: We recently found that using the latest `emcc` version may cause issues at runtime. Use `.\u002Femsdk install 3.1.56` instead of `.\u002Femsdk install latest` for now as a workaround. The error may look like\n\n   ```\n   Init error, LinkError: WebAssembly.instantiate(): Import #6 module=\"wasi_snapshot_preview1\"\n   function=\"proc_exit\": function import requires a callable\n   ```\n\n2. In `.\u002Fpackage.json`, change from `\"@mlc-ai\u002Fweb-runtime\": \"0.18.0-dev2\",` to `\"@mlc-ai\u002Fweb-runtime\": \"file:.\u002Ftvm_home\u002Fweb\",`.\n\n3. Set up the necessary environment\n\n   Prepare all the necessary dependencies for the web build:\n\n   ```shell\n   .\u002Fscripts\u002Fprep_deps.sh\n   ```\n\n   In this step, if `$TVM_SOURCE_DIR` is not defined in the environment, we will execute the following line to build the `tvmjs` dependency:\n\n   ```shell\n   git clone https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Frelax 3rdparty\u002Ftvm-unity --recursive\n   ```\n\n   This clones the current HEAD of `mlc-ai\u002Frelax`. However, it may not always be the correct branch or commit to clone. To build a specific npm version from source, refer to the version bump PR, which states which branch (i.e. `mlc-ai\u002Frelax` or `apache\u002Ftvm`) and which commit the current WebLLM version depends on. 
For instance, version 0.2.52, according to its version bump PR https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm\u002Fpull\u002F521, is built by checking out the following commit https:\u002F\u002Fgithub.com\u002Fapache\u002Ftvm\u002Fcommit\u002Fe6476847753c80e054719ac47bc2091c888418b6 in `apache\u002Ftvm`, rather than the HEAD of `mlc-ai\u002Frelax`.\n\n   Besides, `--recursive` is necessary and important. Otherwise, you may encounter errors like `fatal error: 'dlpack\u002Fdlpack.h' file not found`.\n\n4. Build WebLLM Package\n\n   ```shell\n   npm run build\n   ```\n\n5. Validate some of the sub-packages\n\n   You can then go to the subfolders in [examples](examples) to validate some of the sub-packages.\n   We use Parcel v2 for bundling, but Parcel sometimes fails to track changes in parent\n   directories. When you make a change in the WebLLM package, try editing the `package.json`\n   of the subfolder and saving it, which will trigger Parcel to rebuild.\n\n## Links\n\n- [Demo App: WebLLM Chat](https:\u002F\u002Fchat.webllm.ai\u002F)\n- If you want to run LLMs on a native runtime, check out [MLC-LLM](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fmlc-llm)\n- You might also be interested in [Web Stable Diffusion](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-stable-diffusion\u002F).\n\n## Acknowledgement\n\nThis project was initiated by members from CMU Catalyst, UW SAMPL, SJTU, OctoML, and the MLC community. We would love to continue developing and supporting the open-source ML community.\n\nThis project is only possible thanks to the open-source ecosystems whose shoulders we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities make these models accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, and Alpaca. 
We also would like to thank the WebAssembly, Emscripten, and WebGPU communities. Finally, thanks to Dawn and WebGPU developers.\n\n## Citation\n\nIf you find this project to be useful, please cite:\n\n```\n@misc{ruan2026webllmhighperformanceinbrowserllm,\n      title={WebLLM: A High-Performance In-Browser LLM Inference Engine},\n      author={Charlie F. Ruan and Yucheng Qin and Akaash R. Parthasarathy and Xun Zhou and Ruihang Lai and Hongyi Jin and Yixin Dong and Bohan Hou and Meng-Shiun Yu and Yiyan Zhai and Sudeep Agarwal and Hangrui Cao and Siyuan Feng and Tianqi Chen},\n      year={2026},\n      eprint={2412.15803},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.15803},\n}\n```\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=mlc-ai\u002Fweb-llm\"\u002F>\n\u003C\u002Fa>\n\n\u003Cp align=\"right\">\n  \u003Ca href=\"#top\">⬆ Back to Top ⬆\u003C\u002Fa>\n\u003C\u002Fp>\n","WebLLM is a high-performance in-browser language model inference engine that uses WebGPU for hardware acceleration and runs directly in the browser with no server support. Its core features include full OpenAI API compatibility, structured JSON generation, and broad model support, such as Llama 3. With performance optimized through WebAssembly, WebLLM supports streaming, JSON-mode generation, and more. It suits scenarios that need client-side privacy together with GPU acceleration, such as building local AI assistants or browser-based chat applications.",2,"2026-06-11 02:56:16","top_language"]