[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-709":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},709,"gemma-tuner-multimodal","mattmireles\u002Fgemma-tuner-multimodal","mattmireles","Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.","",null,"Python",1460,104,10,3,0,4,51,61.16,"MIT License",false,"main",true,[],"2026-06-11 04:00:33","# Gemma Multimodal Fine-Tuner\n\n![Gemma macOS Tuner wizard: system check, then LoRA \u002F model \u002F dataset steps](README\u002Fassets\u002Fwizard-cli.png)\n\n**Fine-tune Gemma on text, images, *and* audio — on your Mac, on data that doesn't fit on your Mac.**\n\n- 🖼️ **Image + text LoRA** — captioning and VQA on local CSV.\n- 🎙️ **Audio + text LoRA** — Apple-Silicon-native, no CUDA required.\n- 📝 **Text-only LoRA** — instruction or completion on CSV.\n- ☁️ **Stream from GCS \u002F BigQuery** — train on terabytes without filling your SSD.\n- 🍎 **Runs on Apple Silicon** — MPS-native, no NVIDIA box required.\n\n**Source:** [github.com\u002Fmattmireles\u002Fgemma-tuner-multimodal](https:\u002F\u002Fgithub.com\u002Fmattmireles\u002Fgemma-tuner-multimodal) (public).\n\n---\n\n## Watch your model learn\n\n![Real-time training visualizer: loss curve, attention heatmap, gradient signal, memory, and token predictions — updating live as training runs on 
Apple Silicon](README\u002Fassets\u002Ftraining-visualizer.png)\n\nLoss curve. Attention heatmap. Gradient signal strength. Memory pressure. Token-by-token predictions — all updating in real time, in your browser, while the model trains on your Mac. No TensorBoard. No notebook. One flag in your config, one URL in your terminal.\n\n→ [Setup takes 30 seconds](#training-visualizer)\n\n---\n\n## LoRA for Gemma 4 & 3n — why not just use…?\n\n| | **This** | MLX-LM | Unsloth | axolotl |\n| --- | :-: | :-: | :-: | :-: |\n| Fine-tune Gemma (text-only CSV) | ✅ | ✅ | ✅ | ✅ |\n| Fine-tune Gemma **image + text** (caption \u002F VQA CSV) | ✅ | ⚠️ varies | ⚠️ varies | ⚠️ varies |\n| Fine-tune Gemma **audio + text** | ✅ | ❌ | ❌ | ⚠️ CUDA only |\n| Runs on Apple Silicon (MPS) | ✅ | ✅ | ❌ | ❌ |\n| **Stream training data from cloud** | ✅ | ❌ | ❌ | ⚠️ partial |\n| No NVIDIA GPU required | ✅ | ✅ | ❌ | ❌ |\n\nFine-tune Gemma on **text, images, or audio** without renting an H100 or copying a terabyte of data to your laptop. All three modalities run on Apple Silicon.\n\n**Text-only fine-tuning** (instruction or completion on CSV) is supported: set `modality = text` in your profile and use local CSV splits under `data\u002Fdatasets\u002F\u003Cname>\u002F`. See [Text-only fine-tuning](#text-only-fine-tuning) below.\n\n**Image + text fine-tuning** (captioning or VQA on local CSV) uses `modality = image`, `image_sub_mode`, and `image_token_budget`; see [Image fine-tuning](#image-fine-tuning) below. v1 is **local CSV only** (same constraint as text-only).\n\n**How it works:** Hugging Face Gemma checkpoints + PEFT LoRA, supervised fine-tuning in [`gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py`](gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py), exported as a merged HF \u002F SafeTensors tree by [`gemma_tuner\u002Fscripts\u002Fexport.py`](gemma_tuner\u002Fscripts\u002Fexport.py). 
For Core ML conversion and GGUF inference tooling, see [`README\u002Fguides\u002FREADME.md`](README\u002Fguides\u002FREADME.md) — this repo's *training* path is Gemma-only by design.\n\n**Deeper reading:** [`README\u002Fguides\u002FREADME.md`](README\u002Fguides\u002FREADME.md) · [`README\u002Fspecifications\u002FGemma3n.md`](README\u002Fspecifications\u002FGemma3n.md)\n\n---\n\n## What you can build with this\n\n- **Domain-specific ASR** — fine-tune on medical dictation, legal depositions, call-center recordings, or any field where off-the-shelf Whisper \u002F Gemma mishears the jargon.\n- **Domain-specific vision** — captioning or VQA on receipts, charts, screenshots, manufacturing defects, medical imagery — any visual domain where generic models hallucinate.\n- **Document & screen understanding** — train on screenshot → structured-output pairs for UI agents, OCR-adjacent pipelines, or chart QA.\n- **Accent, dialect, and low-resource language adaptation** — adapt a base Gemma model to underrepresented voices and languages with your own labeled audio.\n- **Multimodal assistants** — extend Gemma's text reasoning with image *or* audio grounding for transcription, captioning, and Q&A pipelines.\n- **Private, on-device pipelines** — train and run entirely on your Mac. Data never leaves the machine; weights never touch a third-party API.\n\nIf your audio data lives in GCS or BigQuery, you can train on a laptop without copying terabytes locally: the dataloader streams shards on demand. Text-only and image fine-tuning read local CSV only in v1.\n\n---\n\n## Supported models\n\nTraining targets **Gemma multimodal (text + image + audio)** checkpoints loaded via `base_model` in [`config\u002Fconfig.ini`](config\u002Fconfig.ini) and routed to [`gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py`](gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py). 
The default file ships these **`[model:…]`** entries (LoRA on top of the Hub weights):\n\n| Model key (in `config\u002Fconfig.ini`) | Hugging Face `base_model` | Notes |\n| --- | --- | --- |\n| `gemma-4-e2b-it` | [`google\u002Fgemma-4-E2B-it`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-4-E2B-it) | Gemma 4 instruct, ~2B — requires `requirements\u002Frequirements-gemma4.txt` (see Installation) |\n| `gemma-4-e4b-it` | [`google\u002Fgemma-4-E4B-it`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-4-E4B-it) | Gemma 4 instruct, ~4B — requires Gemma 4 stack |\n| `gemma-4-e2b` | [`google\u002Fgemma-4-E2B`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-4-E2B) | Gemma 4 base — requires Gemma 4 stack |\n| `gemma-4-e4b` | [`google\u002Fgemma-4-E4B`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-4-E4B) | Gemma 4 base — requires Gemma 4 stack |\n| `gemma-3n-e2b-it` | [`google\u002Fgemma-3n-E2B-it`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3n-E2B-it) | Gemma 3n instruct, ~2B — **default** on the base `pip install -e .` pin |\n| `gemma-3n-e4b-it` | [`google\u002Fgemma-3n-E4B-it`](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3n-E4B-it) | Gemma 3n instruct, ~4B |\n\nAdd your own **`[model:your-name]`** section with `group = gemma` and a compatible `base_model` if you need another **any-to-any** Gemma 3n \u002F Gemma 4 E2B–E4B checkpoint. **Larger Gemma 4 weights** on Hugging Face (for example 26B or 31B class) use a different Transformers architecture than this trainer’s `AutoModelForCausalLM` audio path—they are **not** supported here yet.\n\nWizard time and memory hints come from [`gemma_tuner\u002Fwizard\u002Fbase.py`](gemma_tuner\u002Fwizard\u002Fbase.py) (`ModelSpecs`).\n\n---\n\n## Architecture\n\n| Piece | Role |\n| --- | --- |\n| [`gemma_tuner\u002Fcli_typer.py`](gemma_tuner\u002Fcli_typer.py) | Canonical CLI (`gemma-macos-tuner`). Imports `core.bootstrap` early so MPS env vars are set before Torch is loaded. 
|\n| [`gemma_tuner\u002Fcore\u002Fops.py`](gemma_tuner\u002Fcore\u002Fops.py) | Dispatches prepare → `scripts.prepare_data`, finetune → `scripts.finetune`, evaluate → `scripts.evaluate`, export → `scripts.export`. |\n| [`gemma_tuner\u002Fscripts\u002Ffinetune.py`](gemma_tuner\u002Fscripts\u002Ffinetune.py) | **Router**: only models whose name contains `gemma` → [`gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py`](gemma_tuner\u002Fmodels\u002Fgemma\u002Ffinetune.py). |\n| [`gemma_tuner\u002Futils\u002Fdevice.py`](gemma_tuner\u002Futils\u002Fdevice.py) | MPS → CUDA → CPU selection, sync helpers, memory hints. |\n| [`gemma_tuner\u002Futils\u002Fdataset_utils.py`](gemma_tuner\u002Futils\u002Fdataset_utils.py) | CSV loads, patches, blacklist\u002Fprotection semantics. |\n| [`gemma_tuner\u002Fwizard\u002F`](gemma_tuner\u002Fwizard\u002F) | Questionary + Rich UI; training is spawned with `python -m gemma_tuner.main finetune …` from the repo root (see [`gemma_tuner\u002Fwizard\u002Frunner.py`](gemma_tuner\u002Fwizard\u002Frunner.py)). |\n\n**Run layout** (typical):\n\n```text\noutput\u002F\n├── {id}-{profile}\u002F\n│   ├── metadata.json\n│   ├── metrics.json\n│   ├── checkpoint-*\u002F\n│   └── adapter_model\u002F          # LoRA artifacts when applicable\n```\n\n**Configuration:** hierarchical INI—defaults, groups, models, datasets, then profiles—read by `gemma_tuner\u002Fcore\u002Fconfig.py`. Set `GEMMA_TUNER_CONFIG` if you invoke the CLI outside the repo root.\n\n---\n\n## Requirements\n\n| | |\n| --- | --- |\n| **Python** | **3.10+** (matches `pyproject.toml`) |\n| **macOS** | 12.3+ for MPS; use **native arm64** Python, not Rosetta |\n| **RAM** | 16 GB minimum for the smaller Gemma runs; 32 GB+ recommended |\n| **CUDA** | Optional; install the CUDA build of PyTorch that matches your driver |\n\n---\n\n## Installation\n\n### 1. Create a Python 3.10+ virtual environment\n\nmacOS's built-in Python is 3.9, which is too old. 
Install a newer one with Homebrew:\n\n```bash\nbrew install python@3.12\n```\n\nThen create and activate a virtual environment:\n\n```bash\npython3.12 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\n```\n\nEvery command below assumes the venv is active. To reactivate in a new terminal:\n`source .venv\u002Fbin\u002Factivate`.\n\n### 2. Confirm you are on arm64 (Apple Silicon)\n\n```bash\npython -c \"import platform; print(platform.machine())\"\n# arm64  -> good\n# x86_64 -> Python is running under Rosetta; install a native arm64 Python and recreate the venv\n```\n\nA native arm64 Python is available from [python.org](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Fmacos\u002F)\nor Homebrew (`brew install python@3.12`).\n\n### 3. Install PyTorch\n\n```bash\npip install torch torchaudio\n```\n\n### 4. Install this package\n\n```bash\npip install -e .\n```\n\n### 5. Authenticate with Hugging Face\n\nGemma weights are gated. Accept the license on the\n[model card](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3n-E2B-it), then either log in or\nexport a token:\n\n```bash\nhuggingface-cli login\n# or:  export HF_TOKEN=hf_...\n```\n\n### 6. Gemma 4 (optional)\n\nThe base install (`pip install -e .`) pins Transformers ≥5.5, which supports both the **Gemma 3n** and **Gemma 4** families. Gemma 4 checkpoints additionally need a slightly newer PEFT than the base pin, so install the Gemma 4 requirements before training them:\n\n```bash\npip install -r requirements\u002Frequirements-gemma4.txt\n```\n\n`finetune` and `export` are family-aware. A few non-training commands (`gemma_generate`, multimodal probing, ASR eval) still reject Gemma 4 ids until those code paths are upgraded.\n\n### 7. Run the wizard\n\n```bash\ngemma-macos-tuner wizard\n```\n\nThe wizard is the primary UI: it picks the model, walks you through dataset and\nhyperparameter selection, and starts training. 
On first run it creates\n`config\u002Fconfig.ini` for you from the committed\n[`config\u002Fconfig.ini.example`](config\u002Fconfig.ini.example) template (the live config\nis gitignored because the wizard writes local paths and GCP project IDs into it).\n\nIf a command fails, run `gemma-macos-tuner system-check` first to surface\nenvironment issues.\n\n---\n\n## Zero to training in 90 seconds\n\nThe repo ships a 16-row instruction-tuning dataset at [`data\u002Fdatasets\u002Fsample-text\u002F`](data\u002Fdatasets\u002Fsample-text\u002F) — translations, summaries, trivia, haiku, JSON conversion. Small enough to finish in under a minute. Large enough to prove the full pipeline works: data loading, tokenization, LoRA, checkpointing, export.\n\n```bash\ngemma-macos-tuner wizard\n```\n\nPick **Instruction tuning → gemma-3n-e2b-it → sample-text**, accept the defaults, and watch it train. First run downloads ~5 GB of base weights from Hugging Face (step 5 above must be done). Every run after that starts in seconds.\n\nOr skip the wizard entirely:\n\n```bash\ngemma-macos-tuner finetune sample-text\n```\n\nOnce the sample run finishes, drop your own CSV under `data\u002Fdatasets\u002F\u003Cyour-name>\u002F` and run the wizard again — it picks up new datasets automatically.\n\n---\n\n## What kind of data do I need?\n\nAll training data is **CSV** under `data\u002Fdatasets\u002F\u003Cname>\u002F`, with one row per\nexample and a header row. The required columns depend on the modality. 
Each\ndataset directory holds at least:\n\n```text\ndata\u002Fdatasets\u002F\u003Cname>\u002F\n├── train.csv\n└── validation.csv\n```\n\nThere is no JSONL \u002F Parquet \u002F Hugging Face dataset format requirement — just CSV.\nThe column **names** are configurable via `prompt_column`, `text_column`, and\n`image_path_column` in your profile; the names below are the defaults used by\n[`config\u002Fconfig.ini.example`](config\u002Fconfig.ini.example).\n\n### Text instruction (`modality = text`, `text_sub_mode = instruction`)\n\n```csv\nid,prompt,response\n1,Translate to French: Good morning.,Bonjour.\n2,What is the capital of Japan?,Tokyo.\n```\n\nThe prompt is masked from the loss; the model only learns to generate `response`.\nThis is what the bundled `sample-text` dataset uses.\n\n### Text completion (`modality = text`, `text_sub_mode = completion`)\n\n```csv\nid,text\n1,\"Once upon a time, in a small village by the sea, ...\"\n```\n\nA single text column; the full sequence is trained (no prompt mask). Useful for\ndomain pretraining-style adaptation.\n\n### Image + text (`modality = image`)\n\n```csv\nid,image_path,caption\n1,images\u002Freceipt_001.jpg,\"Total: $42.18, paid in cash\"\n2,images\u002Freceipt_002.jpg,\"Subtotal $19.99, tax $1.60, total $21.59\"\n```\n\n`image_path` is resolved relative to the dataset directory (or an absolute path).\nFor VQA, set `image_sub_mode = vqa` and use `image_path,question,answer` columns.\nSee [Image fine-tuning](#image-fine-tuning) for details.\n\n### Audio + text (`modality = audio`, the default)\n\n```csv\nid,audio_path,text,language,duration\n1,audio\u002Fsample_001.wav,\"the quick brown fox jumps over the lazy dog\",en,2.4\n```\n\n`audio_path` points at decoded WAV files (16 kHz mono recommended). The\n`gemma-macos-tuner prepare` command will fetch and decode audio for you if you\nprovide an `audio_url` column instead. 
See [`README\u002FDatasets.md`](README\u002FDatasets.md)\nfor the full schema and the GCS \u002F BigQuery streaming variants.\n\n---\n\n## CLI cheat sheet\n\n```bash\n# Dataset prep (profile names come from config\u002Fconfig.ini)\ngemma-macos-tuner prepare \u003Cdataset-profile>\n\n# Train (model in profile must be a Gemma id \u002F local path with \"gemma\" in the string)\ngemma-macos-tuner finetune \u003Cprofile> --json-logging\n\n# Evaluate\ngemma-macos-tuner evaluate \u003Cprofile-or-run>\n\n# Export merged HF\u002FSafeTensors tree (LoRA merged when adapter_config.json is present)\ngemma-macos-tuner export \u003Crun-dir-or-profile>\n\n# Exported models and completed runs include a .integrity.json manifest for\n# corruption\u002Fdrift detection. Verification is intentionally strict about\n# unexpected extra tracked files. This is integrity only, not signing\u002Fauthenticity.\n\n# Blacklist generation from errors\ngemma-macos-tuner blacklist \u003Cprofile>\n\n# Run index\ngemma-macos-tuner runs list\n\n# Guided setup\ngemma-macos-tuner wizard\n```\n\n**Migration from `main.py` \u002F old habits:** [`docs\u002FMIGRATION.md`](docs\u002FMIGRATION.md). Runs management moved to the `runs` subcommand—not a separate `manage.py` in this tree.\n\n---\n\n## Text-only fine-tuning\n\nTrain on **CSV text** (local splits under `data\u002Fdatasets\u002F\u003Cname>\u002F`) without audio. 
v1 supports **local CSV only** — not BigQuery or Granary streaming (those remain audio-oriented).\n\nSet in your `[profile:…]` (see also [`README\u002FDatasets.md`](README\u002FDatasets.md)):\n\n- `modality = text`\n- `text_sub_mode = instruction` — user\u002Fassistant turns: set `prompt_column` and `text_column` (response).\n- `text_sub_mode = completion` — one column; the full sequence is trained (no prompt mask).\n\nOptional: `max_seq_length` (default `2048`).\n\n**Instruction example** (profile snippet):\n\n```ini\nmodality = text\ntext_sub_mode = instruction\ntext_column = response\nprompt_column = prompt\nmax_seq_length = 2048\n```\n\n**Completion example**:\n\n```ini\nmodality = text\ntext_sub_mode = completion\ntext_column = text\nmax_seq_length = 2048\n```\n\nThe checkpoint is still a multimodal Gemma `AutoModelForCausalLM`; the USM audio tower weights remain in memory in v1 even when you only train on text. See [`README\u002FKNOWN_ISSUES.md`](README\u002FKNOWN_ISSUES.md).\n\n---\n\n## Image fine-tuning\n\nTrain on **image + text** pairs from **local CSV** splits under `data\u002Fdatasets\u002F\u003Cname>\u002F` (`train.csv` \u002F `validation.csv`). v1 supports **captioning** (`image_sub_mode = caption`) and **VQA** (`image_sub_mode = vqa`). 
See [`README\u002FDatasets.md`](README\u002FDatasets.md) for all keys.\n\n- **Caption \u002F OCR-style:** user turn = image + fixed instruction (“Describe this image.”); assistant = your caption column.\n- **VQA:** user turn = image + question (`prompt_column`); assistant = answer (`text_column`).\n\n**Profile snippet (caption):**\n\n```ini\nmodality = image\nimage_sub_mode = caption\ntext_column = caption\nimage_path_column = image_path\nimage_token_budget = 280\n```\n\n**Profile snippet (VQA):**\n\n```ini\nmodality = image\nimage_sub_mode = vqa\nprompt_column = question\ntext_column = answer\nimage_path_column = image_path\nimage_token_budget = 560\n```\n\n`image_token_budget` must be one of **70, 140, 280, 560, 1120**. Use the **same** value at inference as during training. Higher budgets improve detail but increase memory and step time on MPS. Export saves the processor next to weights; if `metadata.json` from the run is present, export reapplies the stored budget to the processor for consistency.\n\n---\n\n## Gemma 3n \u002F Gemma 4 on Apple Silicon\n\nEnd-to-end notes live in [`README\u002Fspecifications\u002FGemma3n.md`](README\u002Fspecifications\u002FGemma3n.md). Multimodal Gemma 4 + MPS field guide: [`README\u002Fguides\u002Fapple-silicon\u002Fgemma4-guide.md`](README\u002Fguides\u002Fapple-silicon\u002Fgemma4-guide.md). 
Common commands:\n\n```bash\npython -m gemma_tuner.scripts.gemma_preflight\npython -m gemma_tuner.scripts.gemma_profiler --model google\u002Fgemma-3n-E2B-it\n\ngemma-macos-tuner wizard\n\npython -m gemma_tuner.scripts.gemma_tiny_overfit --profile gemma-lora-test --max-samples 32\n\npython tools\u002Feval_gemma_asr.py \\\n  --csv data\u002Fdatasets\u002F\u003Cyour_dataset>\u002Fvalidation.csv \\\n  --model google\u002Fgemma-3n-E2B-it \\\n  --adapters output\u002F\u003Cyour_run>\u002F \\\n  --text-column text \\\n  --limit 200\n```\n\n**MPS notes:** prefer bf16 when supported; attention is forced to `eager` for stability; unset `PYTORCH_ENABLE_MPS_FALLBACK` after debugging, because while it is set, unsupported ops fall back to the much slower CPU instead of raising an error.\n\n---\n\n## Data: CSVs, GCS, BigQuery\n\n- **Local \u002F HTTP \u002F GCS paths** in your prepared CSV; use `gemma-macos-tuner prepare \u003Cprofile> --no-download` to avoid copying GCS audio locally.\n- **BigQuery import** (wizard or scripts): needs `pip install \".[gcp]\"` (quoted so zsh does not treat the brackets as a glob) and Application Default Credentials (`gcloud auth application-default login` or `GOOGLE_APPLICATION_CREDENTIALS`). The wizard can materialize `_prepared.csv` and append a dataset section to `config\u002Fconfig.ini`.\n\nPatch layout (by dataset `source`):\n\n```text\ndata_patches\u002F{source}\u002F\n├── override_text_perfect\u002F\n├── do_not_blacklist\u002F\n└── delete\u002F\n```\n\n---\n\n## Training visualizer\n\nSix live panels in your browser while the model trains:\n\n| Panel | What it shows |\n| --- | --- |\n| **Loss curve** | Per-step loss over time — the single most important number in training |\n| **Attention heatmap** | Where the model is looking across the input, layer by layer |\n| **Signal strength** | Gradient norm — are the updates meaningful or vanishing? 
|\n| **Step size** | Learning rate at each step (schedule + warmup visible at a glance) |\n| **Memory** | GPU\u002FMPS memory in GB — catch pressure before it becomes a crash |\n| **Token predictions** | Top-5 next-token probabilities — watch the model's guesses sharpen in real time |\n\n**Setup:**\n\n```bash\npip install -e \".[viz]\"\n```\n\nThen set `visualize = true` in your profile and run training. The trainer prints a URL (default `127.0.0.1:8080`). Open it. That's it.\n\nIf Flask isn't installed, training still runs and the visualizer is skipped silently. The `viz` extra is optional; leaving it out breaks nothing.\n\n---\n\n## NVIDIA Granary & streaming\n\nLarge-corpus workflows: `gemma-macos-tuner prepare-granary \u003Cprofile>` and streaming-oriented dataset keys—see [`README\u002FDatasets.md`](README\u002FDatasets.md).\n\n---\n\n## Apple Silicon knobs\n\n```bash\n# Debug only—surfaces unsupported ops by falling back to CPU (slow)\nexport PYTORCH_ENABLE_MPS_FALLBACK=1\n\n# Cap MPS allocator high-water mark (try 0.7–0.9)\nexport PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8\n```\n\nPreprocessing worker count and dataloader settings are controlled from `config\u002Fconfig.ini`; defaults favor using available CPU cores for `Dataset.map`.\n\n---\n\n## CI & tests\n\nWorkflows under [`.github\u002Fworkflows\u002F`](.github\u002Fworkflows\u002F): lint (`ruff`), fast tests (`pytest -k \"not slow\"`), and a macOS smoke test. Regenerate lockfiles with `pip-compile` when you change `pyproject.toml`—see comments in [`requirements\u002Frequirements.txt`](requirements\u002Frequirements.txt).\n\n---\n\n## Troubleshooting\n\n| Symptom | Likely fix |\n| --- | --- |\n| `Unsupported model` from finetune | Use a Gemma model id \u002F path containing `gemma`. |\n| MPS not available | macOS 12.3+, arm64 Python, current PyTorch. |\n| OOM \u002F swap storm | Smaller batch, gradient checkpointing, lower `PYTORCH_MPS_HIGH_WATERMARK_RATIO`. 
|\n| Slow training with the MPS fallback enabled | Unset `PYTORCH_ENABLE_MPS_FALLBACK` after debugging. 
|\n| Config not found | Set `GEMMA_TUNER_CONFIG`, run from the repo root with `config\u002Fconfig.ini` present, or pass `--config`. |\n| 401 \u002F gated model \u002F cannot download weights | Accept the license on the model’s Hugging Face page; run `huggingface-cli login` or set `HF_TOKEN`. |\n\n---\n\n## Contributing\n\nSee [`docs\u002FCONTRIBUTING.md`](docs\u002FCONTRIBUTING.md). Prefer extending `cli_typer.py` and shared helpers in `gemma_tuner\u002Fcore\u002F` over one-off scripts.\n\n---\n\n## Acknowledgments\n\nGoogle's Gemma team, Hugging Face Transformers & PEFT, and the PyTorch MPS maintainers.\n\n---\n\n## License\n\nIf your data lives in a bucket and your GPU lives in your lap, this was built for you.\n\nReleased under the [MIT License](LICENSE).\n","Gemma Multimodal Fine-Tuner is a tool for multimodal fine-tuning of Gemma 4 and 3n models on Apple Silicon devices, supporting text, image, and audio data. Its core features include low-rank adaptation (LoRA) built on PyTorch and Metal Performance Shaders, the ability to run without an NVIDIA GPU, and streaming large volumes of training data directly from cloud storage (such as GCS or BigQuery), which avoids running out of local disk space. The project is especially suited to researchers and developers who want to train multimodal models at scale on a Mac but are constrained by their hardware.",2,"2026-06-11 02:38:48","CREATED_QUERY"]