[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84199":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":31,"updatedAt":32,"readmeContent":33,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},84199,"uv-scripts-for-ai","davanstrien\u002Fuv-scripts-for-ai","davanstrien","Self-contained UV scripts for data & ML tasks — OCR, vision, audio & more — run one in a command, locally or on Hugging Face Jobs. Built for humans and agents.","https:\u002F\u002Fhuggingface.co\u002Fuv-scripts",null,"Python",63,4,1,15,0,9,27,65.5,"Apache License 2.0",false,"main",true,[25,26,27,28,29,30],"agents","hf-jobs","huggingface","ocr","pep723","uv","2026-06-10 16:43:37","2026-06-10 20:06:28","# uv-scripts-for-ai\n\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fuv-scripts\">\u003Cpicture>\u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Ffollow-us-on-hf-md-dark.svg\">\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Ffollow-us-on-hf-md.svg\" alt=\"Follow uv-scripts on Hugging Face\">\u003C\u002Fpicture>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdavanstrien\">\u003Cpicture>\u003Csource media=\"(prefers-color-scheme: dark)\" 
srcset=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Ffollow-me-on-HF-md-dark.svg\">\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fbadges\u002Fresolve\u002Fmain\u002Ffollow-me-on-HF-md.svg\" alt=\"Follow davanstrien on Hugging Face\">\u003C\u002Fpicture>\u003C\u002Fa>\n\n> **A UV script is a single Python file that declares its own dependencies inline — a *portable* unit you run with `uv run` where you have the hardware, or hand to `hf jobs uv run` on [Hugging Face Jobs](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fjobs) for a GPU. Chain several into a pipeline.**\n\nEach script carries its own dependencies, so people and agents can run one without cloning a repo, making a virtualenv, or installing a `requirements.txt` first.\n\nA **recipe** here is one such script. Most read and write the [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fdatasets), so one script's output dataset becomes the next one's input.\n\n![Demo: OCR a whole image dataset with one command — no clone, no environment, straight from a Hub URL](ocr\u002Fdemo.gif)\n\n## Quickstart\n\n**First, install [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F)** — it's the only thing you install; every script brings its own Python dependencies:\n\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\n**Run a recipe on a GPU** — point Hugging Face Jobs at the script's URL and it runs on managed hardware, no GPU of your own needed. 
Here `davanstrien\u002Fufo-ColPali` is a small *public* image dataset you can use as-is; the output lands in your namespace:\n\n```bash\nhf jobs uv run --flavor l4x1 --secrets HF_TOKEN \\\n  https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Focr\u002Fraw\u002Fmain\u002Fglm-ocr.py \\\n  davanstrien\u002Fufo-ColPali your-username\u002Fufo-ocr\n```\n\nNo `pip install`, no local setup. `--secrets HF_TOKEN` forwards your token so the job can write the output dataset back to the Hub. (Jobs needs the `hf` CLI, installed with `uv tool install huggingface_hub`, and a Hugging Face account with [pay-as-you-go credit](https:\u002F\u002Fhuggingface.co\u002Fpricing). No subscription is needed: it's billed by the second, and a small CPU job costs ~$0.01\u002Fhr. Run `hf jobs hardware` for current flavors and prices.)\n\n**Prefer your own machine?** A recipe is just a UV script, so on a box with the hardware it needs (most recipes here want a CUDA GPU) you can run it, or inspect it with `--help`, directly, no Jobs required:\n\n```bash\nuv run https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Focr\u002Fraw\u002Fmain\u002Fglm-ocr.py --help\n```\n\n## What's a UV script?\n\nA normal Python file with a metadata block at the top that lists its dependencies:\n\n```python\n# \u002F\u002F\u002F script\n# requires-python = \">=3.10\"\n# dependencies = [\"datasets\", \"transformers\", \"torch\"]\n# \u002F\u002F\u002F\n```\n\nNormally, running someone's Python script means cloning their repo, making a virtual environment, and `pip install`-ing a `requirements.txt` first, and if your versions don't match theirs, it can still break. Here the dependencies live inside the file, in that comment block, so `uv` (and `hf jobs uv run`) reads them, installs the versions the block declares into a throwaway environment, and runs the file straight from a URL, with nothing to set up. 
This is the standard [PEP 723](https:\u002F\u002Fpeps.python.org\u002Fpep-0723\u002F) inline-script-metadata format; see the [uv scripts guide](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fguides\u002Fscripts\u002F) to learn more.\n\n## Why UV scripts\n\nA self-contained, pinned script is easy to run and reuse, for a few reasons:\n\n- **Discrete & single-purpose** — one script, one job. That job can be a two-second transform or a multi-hour fine-tune; either way it's one self-contained unit you pick by reading a header instead of a whole codebase.\n- **Self-describing** — the [PEP 723](https:\u002F\u002Fpeps.python.org\u002Fpep-0723\u002F) dependency block, the docstring, and `--help` tell you what it needs and how to call it.\n- **Reproducible** — dependencies are pinned *in the file*, so there's no env drift and no \"works on my machine.\"\n- **Composable** — recipes hand off through the Hub (usually a dataset in, a dataset or model out), so you can chain them into a pipeline.\n- **Portable** — one self-contained file; run it with `uv run` where you have the hardware (most recipes need a GPU), or `hf jobs uv run` it on a managed GPU.\n\n**Built for agents, too.** Every recipe takes its arguments in the same `input output` order and runs from a URL, so an AI agent can pick a tool from its header and run it with no setup. On Jobs the agent runs in a sandbox: a throwaway disk, access limited to what the token's repo permissions allow, and a cost cap per job — not arbitrary code on your machine. (Hugging Face also ships an [`hf` CLI skill for agents](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fagents-cli) for driving Jobs from an editor.) 
This repo also ships a ready-to-use **[`uv-recipes` agent skill](skills\u002Fuv-recipes\u002F)** — point your agent at it to discover, run, and adapt recipes.\n\n## Recipes\n\n| Domain | What it does | On the Hub |\n|---|---|---|\n| **ocr** ⭐ | OCR \u002F document → text & structured data — GLM, PaddleOCR-VL, Nanonets, olmOCR, dots, … (30+ models) | [`uv-scripts\u002Focr`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Focr) |\n| **vision** | Zero-shot detection & segmentation over image datasets | [`sam3`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fsam3) · [`object-detection`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fobject-detection) · [`vlm-object-detection`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fvlm-object-detection) |\n| **audio** | Transcription & speech translation | [`transcription`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Ftranscription) |\n| **embeddings & atlas** | Embed a dataset; build an interactive map | [`build-atlas`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fbuild-atlas) |\n| **data processing** | Filter \u002F dedup \u002F stats over large datasets | [`dataset-stats`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fdataset-stats) · [`deduplication`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fdeduplication) · [`classification`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fclassification) |\n| **dataset creation** | Turn PDFs \u002F image URLs into Hub datasets | [`dataset-creation`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fdataset-creation) · [`iiif-tiles`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fiiif-tiles) |\n| **synthetic data** | Generate datasets with LLMs | [`synthetic-data`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fsynthetic-data) |\n| 
**inference** | Run any open LLM \u002F VLM over a dataset | [`vllm`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fvllm) · [`openai-oss`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fopenai-oss) · [`transformers-inference`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Ftransformers-inference) |\n| **entity extraction** | NER \u002F structured extraction over text | [`gliner`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Fgliner) |\n| ***…and more*** | *Training, evaluation, RAG indexing — migrating as they mature* | [`training`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Ftraining) · [`transformers-training`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Ftransformers-training) |\n\nMost recipes now live in this repo; the rest link to the [`uv-scripts`](https:\u002F\u002Fhuggingface.co\u002Fuv-scripts) Hugging Face org, where they run today, and will migrate here over time. (Each folder mirrors to its Hub dataset repo.)\n\n**What fits here:** any self-contained UV script for data or ML work on the Hub. OCR and dataset work are the current focus, but inference, evaluation, RAG indexing, and **training** (fine-tuning with TRL \u002F `transformers`, producing a model) are all in scope. If it's one pinned script that reads from or writes to the Hub, it belongs.\n\n## Compose a pipeline\n\nBecause recipes hand off through the Hub, you can chain them: each step's output dataset is the next step's input. A document-collection pipeline, end to end:\n\n```\nPDFs \u002F scans          →   OCR to markdown      →   dedup + stats        →   embed + visualise\ndataset-creation          ocr\u002Fglm-ocr.py           deduplication            build-atlas\n```\n\nEach arrow is a Hub dataset; each box is one `hf jobs uv run` (or `uv run`), and every box runs today from its Hub URL, even before it's migrated into this repo. 
A pipeline can also end in a *trained model* instead of another dataset. You can write the chain as a shell script, or an agent can generate it — the scripts are the same.\n\n## Portable: run it locally or on Jobs\n\nA recipe is the same file wherever you run it — on a machine with the hardware it needs, or on [Hugging Face Jobs](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fjobs) for a managed GPU. Same file, same arguments:\n\n```bash\nSCRIPT=https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002Focr\u002Fraw\u002Fmain\u002Fglm-ocr.py\n\n# locally — needs the right hardware (a GPU for most recipes)\nuv run $SCRIPT davanstrien\u002Fufo-ColPali your-username\u002Fufo-ocr\n\n# on a managed GPU — pick hardware with --flavor; --secrets forwards your write token\nhf jobs uv run --flavor l4x1 --secrets HF_TOKEN $SCRIPT davanstrien\u002Fufo-ColPali your-username\u002Fufo-ocr\n```\n\nWhy reach for [Jobs](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fjobs):\n\n- **Pay by the second** — billed only while the job runs. Run `hf jobs hardware`, or see the [flavors](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fjobs#select-the-hardware) and [pricing](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fjobs).\n- **No infra** — `hf jobs uv run \u003Curl>` and you're done. See the [`hf jobs` CLI](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fguides\u002Fcli#hf-jobs).\n- **Hub-native** — read and write datasets, models, and [storage buckets](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fstorage-buckets) directly. Running from the `https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fuv-scripts\u002F…` URL also attributes usage to the recipe.\n\n## Model licenses\n\nThese scripts are orchestration code: they download third-party models from the Hugging Face Hub at runtime and run inference. 
**This repo does not redistribute any model weights.** Each model you run carries its own license (MIT, Apache-2.0, OpenRAIL-M, and some with non-commercial or other use-based terms); those terms govern your use of the *model*, not this repo's code. **You are responsible for checking each model's license** — on its Hugging Face model card — before using it, especially in production.\n\n## License\n\nThe code and documentation in this repository are licensed under the [Apache License 2.0](LICENSE). See [NOTICE](NOTICE) for attribution.\n\n---\n\n*Recipes mirror to the [`uv-scripts`](https:\u002F\u002Fhuggingface.co\u002Fuv-scripts) Hugging Face org via GitHub Actions. See [CONTRIBUTING.md](CONTRIBUTING.md) to add one.*\n",2,"2026-06-11 04:12:31","CREATED_QUERY"]