[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74264":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":47,"readmeContent":48,"aiSummary":49,"trendingCount":16,"starSnapshotCount":16,"syncStatus":50,"lastSyncTime":51,"discoverSource":52},74264,"mlx-tune","ARahim3\u002Fmlx-tune","ARahim3","Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.","https:\u002F\u002Farahim3.github.io\u002Fmlx-tune\u002F",null,"Python",1292,85,13,8,0,15,25,56,45,91.9,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46],"apple-silicon","deep-learning","huggingface","large-language-models","llm","llm-finetuning","local-llm","lora","machine-learning","macos","mlx","on-device-ai","peft","speech-recognition","speech-to-text","text-to-speech","transformers","unsloth","vision-language-model","whisper","2026-06-12 04:01:14","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FARahim3\u002Fmlx-tune\u002Fmain\u002Fmlx-tune-logo.png\" alt=\"MLX-Tune Logo\" width=\"300\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Fine-tune LLMs, Vision, Audio, and OCR models on your Mac\u003C\u002Fstrong>\u003Cbr>\n  \u003Cem>SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. 
Unsloth-compatible API.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FARahim3\u002Fmlx-tune\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Farahim3\u002Fmlx-tune?style=social\" alt=\"GitHub stars\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fprojects\u002Fmlx-tune\">\u003Cimg src=\"https:\u002F\u002Fstatic.pepy.tech\u002Fpersonalized-badge\u002Fmlx-tune?period=total&units=INTERNATIONAL_SYSTEM&left_color=GREY&right_color=GREEN&left_text=downloads\" alt=\"PyPI Downloads\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FARahim3\u002Fmlx-tune\">\u003Cimg alt=\"GitHub forks\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Farahim3\u002Fmlx-tune\">\u003C\u002Fa>\n  \u003Cbr>\n  \u003Ca href=\"#installation\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlatform-Apple%20Silicon-black?logo=apple\" alt=\"Platform\">\u003C\u002Fa>\n  \u003Ca href=\"#requirements\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.9+-blue?logo=python&logoColor=white\" alt=\"Python\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMLX-0.20+-green\" alt=\"MLX\">\u003C\u002Fa>\n  \u003Ca href=\"#license\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-orange\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farahim3.github.io\u002Fmlx-tune\u002F\">Documentation\u003C\u002Fa> ·\n  \u003Ca href=\"#quick-start\">Quick Start\u003C\u002Fa> ·\n  \u003Ca href=\"#supported-training-methods\">Training Methods\u003C\u002Fa> ·\n  \u003Ca href=\"#examples\">Examples\u003C\u002Fa> ·\n  \u003Ca href=\"#project-status\">Status\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n> [!NOTE]\n> **Name Change**: This project 
was originally called `unsloth-mlx`. Since it's not an official Unsloth project and to avoid any confusion, it has been renamed to `mlx-tune`. The vision remains the same — bringing the Unsloth experience to Mac users via MLX. If you were using `unsloth-mlx`, simply switch to `pip install mlx-tune` and update your imports from `unsloth_mlx` to `mlx_tune`.\n\n> [!NOTE]\n> **Why I Built This (A Personal Note)**\n>\n> I rely on Unsloth for my daily fine-tuning on cloud GPUs—it's the gold standard for me. But recently, I started working on a MacBook M4 and hit a friction point: I wanted to prototype locally on my Mac, then scale up to the cloud without rewriting my entire training script.\n>\n> Since Unsloth relies on Triton (which Macs don't have, yet), I couldn't use it locally. I built `mlx-tune` to solve this specific \"Context Switch\" problem. It wraps Apple's native MLX framework in an Unsloth-compatible API.\n>\n> **The goal isn't to replace Unsloth or claim superior performance.** The goal is **code portability**: allowing you to write `FastLanguageModel` code once on your Mac, test it, and then push that *exact same script* to a CUDA cluster. It solves a workflow problem, not just a hardware one.\n>\n> This is an \"unofficial\" project built by a fan, for fans who happen to use Macs. 
It's helping me personally, and if it helps others like me, then I'll have my satisfaction.\n\n## Why MLX-Tune?\n\nBringing the [Unsloth](https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth) experience to Mac users via Apple's [MLX](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) framework.\n\n- 🚀 **Fine-tune LLMs, VLMs, TTS, STT & Embeddings** locally on your Mac (M1\u002FM2\u002FM3\u002FM4\u002FM5)\n- 💾 **Leverage unified memory** (up to 512GB on Mac Studio)\n- 🔄 **Unsloth-compatible API** - your existing training scripts just work!\n- 📦 **Export anywhere** - HuggingFace format, GGUF for Ollama\u002Fllama.cpp\n- 🎙️ **Audio fine-tuning** - 5 TTS models (Orpheus, OuteTTS, Spark, Sesame, Qwen3-TTS) + 7 STT models (Whisper, Moonshine, Qwen3-ASR, NVIDIA Canary, Voxtral, Voxtral Realtime, **NVIDIA Parakeet TDT**)\n\n```python\n# Unsloth (CUDA)                        # MLX-Tune (Apple Silicon)\nfrom unsloth import FastLanguageModel   from mlx_tune import FastLanguageModel\nfrom trl import SFTTrainer              from mlx_tune import SFTTrainer\n\n# Rest of your code stays exactly the same!\n```\n\n## What This Is (and Isn't)\n\n**This is NOT** a replacement for Unsloth or an attempt to compete with it. 
Unsloth is incredible - it's the gold standard for efficient LLM fine-tuning on CUDA.\n\n**This IS** a bridge for Mac users who want to:\n- 🧪 **Prototype locally** - Experiment with fine-tuning before committing to cloud GPU costs\n- 📚 **Learn & iterate** - Develop your training pipeline with fast local feedback loops\n- 🔄 **Then scale up** - Move to cloud NVIDIA GPUs + original Unsloth for production training\n\n```\nLocal Mac (MLX-Tune)       →     Cloud GPU (Unsloth)\n   Prototype & experiment          Full-scale training\n   Small datasets                  Large datasets\n   Quick iterations                Production runs\n```\n\n## Project Status\n\n> 🚀 **v0.4.25** — Arcee Trinity-Nano (AFMoE) fine-tuning: SFT, GRPO reasoning, and CPT recipes with per-expert LoRA over 128 experts + 1 shared expert. Also: CPT now correctly detects LoRA-wrapped quantized `lm_head` (benefits every 4-bit CPT, not just Trinity).\n\n| Feature | Status | Notes |\n|---------|--------|-------|\n| SFT Training | ✅ Stable | Native MLX training |\n| Model Loading | ✅ Stable | Any HuggingFace model (quantized & non-quantized) |\n| Save\u002FExport | ✅ Stable | HF format, GGUF ([see limitations](#known-limitations)) |\n| DPO Training | ✅ Stable | **Full DPO loss** |\n| ORPO Training | ✅ Stable | **Full ORPO loss** |\n| GRPO Training | ✅ Stable | **Multi-generation + reward** |\n| KTO Training | ✅ Stable | **Binary feedback + KTOConfig** |\n| SimPO Training | ✅ Stable | **No ref model + SimPOConfig** |\n| Chat Templates | ✅ Stable | 16 models (llama, gemma, qwen, phi, mistral) |\n| Response-Only Training | ✅ Stable | `train_on_responses_only()` |\n| Multi-turn Merging | ✅ Stable | `to_sharegpt()` + `conversation_extension` |\n| Column Mapping | ✅ Stable | `apply_column_mapping()` auto-rename |\n| Dataset Config | ✅ Stable | `HFDatasetConfig` structured loading |\n| Vision Models | ✅ Stable | Full VLM fine-tuning via mlx-vlm (**Gemma 4**, Qwen3.5, PaliGemma, LLaVA, Pixtral) |\n| **Gemma 4 
Audio** | ✅ Stable | **E2B\u002FE4B STT\u002FASR via Conformer audio tower + optional audio LoRA** |\n| **MoE Fine-Tuning** | ✅ Stable | **Arcee Trinity-Nano (AFMoE), Gemma 4 26B-A4B, Qwen3.5-35B-A3B, Phi-3.5-MoE, Mixtral, DeepSeek, 39+ architectures** |\n| **TTS Fine-Tuning** | ✅ Stable | **Orpheus, OuteTTS, Spark-TTS, Sesame\u002FCSM, Qwen3-TTS** |\n| **STT Fine-Tuning** | ✅ Stable | **Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral, Voxtral Realtime (streaming), Parakeet TDT (CTC\u002FRNN-T\u002FTDT losses + auto vocab extension)** |\n| **`convert()`** | ✅ Stable | **HF → MLX conversion (LLM, TTS, STT)** |\n| **Embedding Fine-Tuning** | ✅ Stable | **BERT, ModernBERT, Qwen3-Embedding, Harrier (InfoNCE\u002Fcontrastive)** |\n| **OCR Fine-Tuning** | ✅ Stable | **DeepSeek-OCR, GLM-OCR, olmOCR, Qwen-VL, Pixtral + CER\u002FWER metrics** |\n| **LFM2 Support** | ✅ Stable | **Liquid AI LFM2\u002FLFM2.5 (350M-24B, hybrid conv+GQA, Thinking)** |\n| **Continual Pretraining** | ✅ Stable | **CPTTrainer with decoupled LR, embed_tokens\u002Flm_head, full-weight mode** |\n| **`push_to_hub()`** | ✅ Stable | **Upload to HuggingFace Hub** |\n| PyPI Package | ✅ Available | `uv pip install mlx-tune` |\n\n## Installation\n\n```bash\n# Using uv (recommended - faster and more reliable)\nuv pip install mlx-tune\n\n# With audio support (TTS\u002FSTT fine-tuning)\nuv pip install 'mlx-tune[audio]'\nbrew install ffmpeg  # system dependency for audio codecs\n\n# Or using pip\npip install mlx-tune\n\n# From source (for development)\ngit clone https:\u002F\u002Fgithub.com\u002FARahim3\u002Fmlx-tune.git\ncd mlx-tune\nuv pip install -e .\n```\n\n## Quick Start\n\n```python\nfrom mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig\nfrom datasets import load_dataset\n\n# Load any HuggingFace model (1B model for quick start)\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    model_name=\"mlx-community\u002FLlama-3.2-1B-Instruct-4bit\",\n    max_seq_length=2048,\n    
load_in_4bit=True,\n)\n\n# Add LoRA adapters\nmodel = FastLanguageModel.get_peft_model(\n    model,\n    r=16,\n    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\"],\n    lora_alpha=16,\n)\n\n# Load a dataset (or create your own)\ndataset = load_dataset(\"yahma\u002Falpaca-cleaned\", split=\"train[:100]\")\n\n# Train with SFTTrainer (same API as TRL!)\ntrainer = SFTTrainer(\n    model=model,\n    train_dataset=dataset,\n    tokenizer=tokenizer,\n    args=SFTConfig(\n        output_dir=\"outputs\",\n        per_device_train_batch_size=2,\n        learning_rate=2e-4,\n        max_steps=50,\n    ),\n)\ntrainer.train()\n\n# Save (same API as Unsloth!)\nmodel.save_pretrained(\"lora_model\")  # Adapters only\nmodel.save_pretrained_merged(\"merged\", tokenizer)  # Full model\nmodel.save_pretrained_gguf(\"model\", tokenizer)  # GGUF (see note below)\n```\n\n> [!NOTE]\n> **GGUF Export**: Works with non-quantized base models. If using a 4-bit model (like above),\n> see [Known Limitations](#known-limitations) for workarounds.\n\n### Chat Templates & Response-Only Training\n\n```python\nfrom mlx_tune import get_chat_template, train_on_responses_only\n\n# Apply chat template (supports llama-3, gemma, qwen, phi, mistral, etc.)\ntokenizer = get_chat_template(tokenizer, chat_template=\"llama-3\")\n\n# Or auto-detect from model name\ntokenizer = get_chat_template(tokenizer, chat_template=\"auto\")\n\n# Train only on responses (not prompts) - more efficient!\ntrainer = train_on_responses_only(\n    trainer,\n    instruction_part=\"\u003C|start_header_id|>user\u003C|end_header_id|>\\n\\n\",\n    response_part=\"\u003C|start_header_id|>assistant\u003C|end_header_id|>\\n\\n\",\n)\n```\n\n### Vision Model Fine-Tuning (NEW!)\n\nFine-tune vision-language models like Gemma 4, Qwen3.5 on image+text tasks:\n\n```python\nfrom mlx_tune import FastVisionModel, UnslothVisionDataCollator, VLMSFTTrainer\nfrom mlx_tune.vlm import VLMSFTConfig\n\n# Load a vision model\nmodel, 
processor = FastVisionModel.from_pretrained(\n    \"mlx-community\u002FQwen3.5-0.8B-bf16\",\n)\n\n# Add LoRA (same params as Unsloth!)\nmodel = FastVisionModel.get_peft_model(\n    model,\n    finetune_vision_layers=True,\n    finetune_language_layers=True,\n    r=16, lora_alpha=16,\n)\n\n# Train on image-text data\nFastVisionModel.for_training(model)\ntrainer = VLMSFTTrainer(\n    model=model,\n    tokenizer=processor,\n    data_collator=UnslothVisionDataCollator(model, processor),\n    train_dataset=dataset,\n    args=VLMSFTConfig(max_steps=30, learning_rate=2e-4),\n)\ntrainer.train()\n```\n\nSee [`examples\u002F38_gemma4_vision_finetuning.py`](examples\u002F38_gemma4_vision_finetuning.py) for Gemma 4 vision fine-tuning, [`examples\u002F39_gemma4_text_to_sql.py`](examples\u002F39_gemma4_text_to_sql.py) for text-only fine-tuning through the VLM path, [`examples\u002F10_qwen35_vision_finetuning.py`](examples\u002F10_qwen35_vision_finetuning.py) for Qwen3.5, or [`examples\u002F26_vision_grpo_training.py`](examples\u002F26_vision_grpo_training.py) for Vision GRPO reasoning.\n\n### Gemma 4 Audio Fine-Tuning\n\nFine-tune Gemma 4 E2B\u002FE4B for speech-to-text and audio understanding. 
The 12-layer Conformer audio tower processes 16kHz audio — no separate STT model needed:\n\n```python\nfrom mlx_tune import FastVisionModel, UnslothVisionDataCollator, VLMSFTTrainer\nfrom mlx_tune.vlm import VLMSFTConfig\n\nmodel, processor = FastVisionModel.from_pretrained(\"mlx-community\u002Fgemma-4-e4b-it-4bit\")\nmodel = FastVisionModel.get_peft_model(model,\n    finetune_vision_layers=False, finetune_language_layers=True,\n    finetune_audio_layers=False,  # Set True for domain-specific acoustic adaptation\n    r=16, lora_alpha=16)\n\n# Dataset format: {\"type\": \"audio\", \"audio\": \"\u002Fpath\u002Fto\u002Ffile.wav\"}\ndataset = [{\"messages\": [\n    {\"role\": \"user\", \"content\": [\n        {\"type\": \"audio\", \"audio\": \"audio.wav\"},\n        {\"type\": \"text\", \"text\": \"Transcribe this audio.\"},\n    ]},\n    {\"role\": \"assistant\", \"content\": [{\"type\": \"text\", \"text\": \"Hello world.\"}]},\n]}]\n\n# Inference with audio\nresponse = model.generate(audio=\"audio.wav\", prompt=\"Transcribe this audio.\")\n```\n\nSee [`examples\u002F47_gemma4_audio_asr_finetuning.py`](examples\u002F47_gemma4_audio_asr_finetuning.py) for ASR fine-tuning or [`examples\u002F48_gemma4_audio_understanding.py`](examples\u002F48_gemma4_audio_understanding.py) for audio understanding with audio tower LoRA.\n\n### TTS Fine-Tuning\n\nFine-tune text-to-speech models on Apple Silicon. 
Supports Orpheus-3B, OuteTTS-1B, Spark-TTS (0.5B), Sesame\u002FCSM-1B, and Qwen3-TTS:\n\n```python\nfrom mlx_tune import FastTTSModel, TTSSFTTrainer, TTSSFTConfig, TTSDataCollator\nfrom datasets import load_dataset, Audio\n\n# Auto-detects model type, codec, and token format\nmodel, tokenizer = FastTTSModel.from_pretrained(\"mlx-community\u002Forpheus-3b-0.1-ft-bf16\")\n# Also works with:\n#   \"mlx-community\u002FLlama-OuteTTS-1.0-1B-8bit\"   (DAC codec, 24kHz)\n#   \"mlx-community\u002FSpark-TTS-0.5B-bf16\"          (BiCodec, 16kHz)\nmodel = FastTTSModel.get_peft_model(model, r=16, lora_alpha=16)\n\ndataset = load_dataset(\"MrDragonFox\u002FElise\", split=\"train[:100]\")\ndataset = dataset.cast_column(\"audio\", Audio(sampling_rate=24000))\n\ntrainer = TTSSFTTrainer(\n    model=model, tokenizer=tokenizer,\n    data_collator=TTSDataCollator(model, tokenizer),\n    train_dataset=dataset,\n    args=TTSSFTConfig(output_dir=\".\u002Ftts_output\", max_steps=60),\n)\ntrainer.train()\n```\n\nSee examples: [Orpheus](examples\u002F12_orpheus_tts_finetuning.py), [OuteTTS](examples\u002F14_outetts_finetuning.py), [Spark-TTS](examples\u002F15_spark_tts_finetuning.py), [Qwen3-TTS](examples\u002F20_qwen3_tts_finetuning.py).\n\n### STT Fine-Tuning\n\nFine-tune speech-to-text models. 
Supports Whisper (all sizes), Distil-Whisper, and Moonshine:\n\n```python\nfrom mlx_tune import FastSTTModel, STTSFTTrainer, STTSFTConfig, STTDataCollator\n\n# Auto-detects model type and preprocessor\nmodel, processor = FastSTTModel.from_pretrained(\"mlx-community\u002Fwhisper-tiny-asr-fp16\")\n# Also works with:\n#   \"mlx-community\u002Fdistil-whisper-large-v3\"   (Whisper architecture)\n#   \"UsefulSensors\u002Fmoonshine-tiny\"             (raw conv frontend)\nmodel = FastSTTModel.get_peft_model(model, r=8, finetune_encoder=True, finetune_decoder=True)\n\ntrainer = STTSFTTrainer(\n    model=model, processor=processor,\n    data_collator=STTDataCollator(model, processor, language=\"en\", task=\"transcribe\"),\n    train_dataset=dataset,\n    args=STTSFTConfig(output_dir=\".\u002Fstt_output\", max_steps=60),\n)\ntrainer.train()\n```\n\nSee examples: [Whisper](examples\u002F13_whisper_stt_finetuning.py), [Moonshine](examples\u002F16_moonshine_stt_finetuning.py), [Qwen3-ASR](examples\u002F17_qwen3_asr_finetuning.py), [Canary](examples\u002F18_canary_stt_finetuning.py), [Voxtral](examples\u002F19_voxtral_stt_finetuning.py), [Voxtral Realtime (streaming)](examples\u002F49_voxtral_realtime_stt_finetuning.py), [Parakeet TDT English](examples\u002F50_parakeet_english_finetuning.py), [Parakeet Welsh (new language)](examples\u002F51_parakeet_welsh_finetuning.py), [Parakeet Bengali (auto vocab extension)](examples\u002F52_parakeet_bengali_char_extension.py), [Parakeet Arabic (BPE extension)](examples\u002F53_parakeet_arabic_bpe_fulltune.py).\n\n### Embedding Fine-Tuning\n\nFine-tune sentence embedding models for semantic search using contrastive learning (InfoNCE loss). 
Supports BERT, ModernBERT, Qwen3-Embedding, Harrier, and more:\n\n```python\nfrom mlx_tune import FastEmbeddingModel, EmbeddingSFTTrainer, EmbeddingSFTConfig, EmbeddingDataCollator\n\n# Load embedding model (BERT or Qwen3-Embedding)\nmodel, tokenizer = FastEmbeddingModel.from_pretrained(\n    \"mlx-community\u002Fall-MiniLM-L6-v2-bf16\",  # or Qwen3-Embedding-0.6B-4bit-DWQ\n    pooling_strategy=\"mean\",                  # \"mean\", \"cls\", or \"last_token\"\n)\nmodel = FastEmbeddingModel.get_peft_model(model, r=16, lora_alpha=16)\n\n# Train with anchor-positive pairs (in-batch negatives via InfoNCE)\ntrainer = EmbeddingSFTTrainer(\n    model=model, tokenizer=tokenizer,\n    data_collator=EmbeddingDataCollator(model, tokenizer),\n    train_dataset=[{\"anchor\": \"query text\", \"positive\": \"relevant passage\"}, ...],\n    args=EmbeddingSFTConfig(\n        loss_type=\"infonce\", temperature=0.05,\n        per_device_train_batch_size=32, max_steps=50,\n    ),\n)\ntrainer.train()\n\n# Encode & compare\nembeddings = model.encode([\"Hello world\", \"Hi there\"])\nsimilarity = (embeddings[0] * embeddings[1]).sum().item()\n```\n\nSee examples: [BERT](examples\u002F27_embedding_finetuning.py), [Qwen3-Embedding](examples\u002F28_qwen3_embedding_finetuning.py), [Harrier-0.6B](examples\u002F31_harrier_0.6b_embedding_finetuning.py), [Harrier-270M](examples\u002F32_harrier_270m_embedding_finetuning.py).\n\n### OCR Fine-Tuning\n\nFine-tune dedicated OCR models or general VLMs for document understanding, handwriting recognition, LaTeX OCR, multilingual receipts, and more. 
Built-in CER\u002FWER evaluation metrics:\n\n```python\nfrom mlx_tune import FastOCRModel, OCRSFTTrainer, OCRSFTConfig, compute_ocr_metrics\n\n# Load a dedicated OCR model (or any VLM like Qwen3.5)\nmodel, processor = FastOCRModel.from_pretrained(\n    \"mlx-community\u002FDeepSeek-OCR-8bit\",  # 0.9B dedicated OCR model\n)\nmodel = FastOCRModel.get_peft_model(model, r=16, lora_alpha=16)\n# Vision layers frozen by default (OCR models have pre-optimized encoders)\n\n# Train on OCR data\ntrainer = OCRSFTTrainer(\n    model=model, processor=processor,\n    train_dataset=ocr_dataset,\n    args=OCRSFTConfig(max_steps=100, learning_rate=5e-5),\n)\ntrainer.train()\n\n# Transcribe & evaluate\ntext = model.transcribe(image)\nmetrics = model.evaluate(test_images, ground_truths)  # → {cer, wer, exact_match}\n```\n\n**Supported OCR models**: DeepSeek-OCR, DeepSeek-OCR-2, GLM-OCR, DOTS-OCR, olmOCR-2, LightOnOCR, Qwen2.5-VL, Qwen3.5, Pixtral, and any VLM supported by mlx-vlm.\n\nSee examples: [Document OCR](examples\u002F33_ocr_document_finetuning.py), [VLM→OCR](examples\u002F34_qwen_vlm_ocr_finetuning.py), [Handwriting](examples\u002F35_handwriting_ocr_finetuning.py), [OCR GRPO](examples\u002F36_ocr_grpo_training.py), [Multilingual](examples\u002F37_multilingual_ocr_finetuning.py).\n\n### Continual Pretraining (CPT)\n\nAdapt any model to new domains or languages by training on raw text. 
Supports LoRA CPT (with optional embedding training) and full-weight CPT:\n\n```python\nfrom mlx_tune import FastLanguageModel, CPTTrainer, CPTConfig\n\n# CPT normally starts from a BASE (non-instruction-tuned) checkpoint;\n# note that the checkpoint named below is an Instruct variant\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    \"mlx-community\u002FSmolLM2-360M-Instruct\", max_seq_length=2048,\n)\nmodel = FastLanguageModel.get_peft_model(model, r=16, target_modules=[\n    \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\",\n])\n\n# CPT: raw text, loss on ALL tokens, decoupled embedding LR\ntrainer = CPTTrainer(\n    model=model, tokenizer=tokenizer,\n    train_dataset=[{\"text\": \"Domain-specific document...\"}, ...],\n    args=CPTConfig(\n        learning_rate=5e-5,\n        embedding_learning_rate=5e-6,  # 10x smaller for embeddings\n        include_embeddings=True,       # auto-adds embed_tokens + lm_head\n        max_steps=1000,\n    ),\n)\ntrainer.train()\n```\n\nSee examples: [Language Adaptation](examples\u002F43_cpt_language_adaptation.py), [Domain Knowledge](examples\u002F44_cpt_domain_knowledge.py), [Code Capabilities](examples\u002F45_cpt_code_capabilities.py), [LFM2 + CPT](examples\u002F46_lfm2_cpt_domain.py).\n\n### LFM2 (Liquid AI) Fine-Tuning\n\nFine-tune Liquid Foundation Models with their hybrid gated-conv + GQA architecture:\n\n```python\nfrom mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig\n\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    \"mlx-community\u002FLFM2-350M-4bit\", max_seq_length=2048,\n)\n# LFM2-specific target modules (auto-resolved)\nmodel = FastLanguageModel.get_peft_model(model, r=16, target_modules=[\n    \"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\",  # Attention\n    \"in_proj\", \"w1\", \"w2\", \"w3\",                # Gated conv MLP\n])\n```\n\n**Supported**: LFM2 (350M-2.6B dense), LFM2.5 (350M-1.2B), LFM2.5-Thinking, LFM2 MoE (8B-A1B, 24B-A2B).\n\nSee examples: [LFM2 SFT](examples\u002F41_lfm2_sft_finetuning.py), 
[LFM2.5-Thinking](examples\u002F42_lfm2_thinking_finetuning.py).\n\n### MoE Fine-Tuning\n\nFine-tune Mixture of Experts models — 39+ architectures supported automatically. MLX-Tune detects MoE layers and applies per-expert LoRA via `LoRASwitchLinear`:\n\n```python\nfrom mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig\n\n# Load any MoE model — same API as dense models!\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    model_name=\"mlx-community\u002FQwen3.5-35B-A3B-4bit\",  # 35B total, 3B active\n    max_seq_length=2048,\n    load_in_4bit=True,\n)\n\n# Same target_modules — MoE paths resolved automatically\nmodel = FastLanguageModel.get_peft_model(\n    model, r=8,\n    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n                    \"gate_proj\", \"up_proj\", \"down_proj\"],\n)\n# Prints: \"MoE architecture detected — LoRA will target expert layers (SwitchLinear)\"\n```\n\n**Supported MoE models**: Arcee Trinity-Nano (AFMoE, 6B\u002F1B active, 128 experts + 1 shared), Qwen3.5-35B-A3B, Qwen3-30B-A3B, Phi-3.5-MoE, Mixtral, DeepSeek-V2\u002FV3, GLM-MoE, and all other MoE architectures in mlx-lm.\n\nSee examples: [Qwen3.5 MoE](examples\u002F29_qwen35_moe_finetuning.py), [Phi-3.5 MoE](examples\u002F30_phi35_moe_finetuning.py), [Trinity-Nano SFT](examples\u002F54_trinity_moe_sft.py), [Trinity-Nano GRPO](examples\u002F55_trinity_moe_grpo.py), [Trinity-Nano CPT](examples\u002F56_trinity_moe_cpt.py).\n\n### Post-Training Workflow\n\nAll model types (LLM, VLM, TTS, STT) support the full post-training workflow:\n\n```python\n# Save LoRA adapters\nmodel.save_pretrained(\".\u002Fadapters\")\n\n# Merge LoRA into base model\nmodel.save_pretrained_merged(\".\u002Fmerged\")\n\n# Convert HF model to MLX format\nFastLanguageModel.convert(\"model-name\", mlx_path=\".\u002Fmlx_model\")\n\n# Push to HuggingFace Hub\nmodel.push_to_hub(\"username\u002Fmy-model\")\n```\n\n## Supported Training Methods\n\n| Method | Trainer | Implementation | Use 
Case |\n|--------|---------|----------------|----------|\n| **SFT** | `SFTTrainer` | ✅ Native MLX | Instruction fine-tuning |\n| **DPO** | `DPOTrainer` | ✅ Native MLX | Preference learning (proper log-prob loss) |\n| **ORPO** | `ORPOTrainer` | ✅ Native MLX | Combined SFT + odds ratio preference |\n| **GRPO** | `GRPOTrainer` | ✅ Native MLX | Reasoning with multi-generation (DeepSeek R1 style) |\n| **KTO** | `KTOTrainer` | ✅ Native MLX | Kahneman-Tversky optimization |\n| **SimPO** | `SimPOTrainer` | ✅ Native MLX | Simple preference optimization |\n| **VLM SFT** | `VLMSFTTrainer` | ✅ Native MLX | Vision-Language model fine-tuning |\n| **Vision GRPO** | `VLMGRPOTrainer` | ✅ Native MLX | Vision-Language GRPO reasoning |\n| **TTS SFT** | `TTSSFTTrainer` | ✅ Native MLX | Orpheus, OuteTTS, Spark-TTS, Sesame\u002FCSM, Qwen3-TTS |\n| **STT SFT** | `STTSFTTrainer` | ✅ Native MLX | Whisper, Moonshine, Qwen3-ASR, Canary, Voxtral, Voxtral Realtime, Parakeet TDT (CTC\u002FRNN-T\u002FTDT, auto vocab extension) |\n| **Embedding** | `EmbeddingSFTTrainer` | ✅ Native MLX | BERT, ModernBERT, Qwen3-Embedding, Harrier (InfoNCE) |\n| **OCR SFT** | `OCRSFTTrainer` | ✅ Native MLX | DeepSeek-OCR, GLM-OCR, Qwen-VL, Pixtral (CER\u002FWER eval) |\n| **OCR GRPO** | `OCRGRPOTrainer` | ✅ Native MLX | OCR with character-level RL rewards |\n| **MoE** | `SFTTrainer` | ✅ Native MLX | Arcee Trinity-Nano (AFMoE), Qwen3.5-MoE, Phi-3.5-MoE, Mixtral, DeepSeek (39+ archs) |\n| **CPT** | `CPTTrainer` | ✅ Native MLX | Continual pretraining with decoupled LR, embed training |\n| **LFM2** | `SFTTrainer` | ✅ Native MLX | Liquid AI LFM2\u002FLFM2.5 (hybrid conv+GQA, Thinking) |\n\n## Examples\n\nCheck [`examples\u002F`](examples\u002F) for working code:\n- Basic model loading and inference (01–07)\n- Complete SFT fine-tuning pipeline (08)\n- RL training overview (09)\n- Vision model fine-tuning — Qwen3.5 (10–11)\n- **RL E2E training** — DPO (21), GRPO (22), ORPO (23), KTO (24), SimPO (25), Vision GRPO (26)\n- TTS 
fine-tuning — Orpheus-3B (12), OuteTTS (14), Spark-TTS (15), Qwen3-TTS (20)\n- STT fine-tuning — Whisper (13), Moonshine (16), Qwen3-ASR (17), Canary (18), Voxtral (19), Voxtral Realtime streaming (49), Parakeet TDT English (50), Parakeet Welsh new-language (51), Parakeet Bengali auto vocab extension (52), Parakeet Arabic BPE extension (53)\n- Embedding fine-tuning — BERT\u002FMiniLM (27), Qwen3-Embedding (28), Harrier-0.6B (31), Harrier-270M (32)\n- **OCR fine-tuning** — Document OCR (33), VLM→OCR (34), Handwriting (35), OCR GRPO (36), Multilingual (37)\n- **MoE fine-tuning** — Qwen3.5-35B-A3B (29), Phi-3.5-MoE (30), **Arcee Trinity-Nano AFMoE: SFT (54), GRPO reasoning (55), CPT (56)**\n- **LFM2 fine-tuning** — LFM2 SFT (41), LFM2.5-Thinking (42)\n- **Continual Pretraining** — Language (43), Domain (44), Code (45), LFM2+CPT (46)\n\n## Requirements\n\n- **Hardware**: Apple Silicon Mac (M1\u002FM2\u002FM3\u002FM4\u002FM5)\n- **OS**: macOS 13.0+\n- **Memory**: 8GB+ unified RAM (16GB+ recommended)\n- **Python**: 3.9+\n\n## Comparison with Unsloth\n\n| Feature | Unsloth (CUDA) | MLX-Tune |\n|---------|----------------|----------|\n| Platform | NVIDIA GPUs | Apple Silicon |\n| Backend | Triton Kernels | MLX Framework |\n| Memory | VRAM (limited) | Unified (up to 512GB) |\n| API | Original | 100% Compatible |\n| Best For | Production training | Local dev, large models |\n\n## Known Limitations\n\n### GGUF Export from Quantized Models\n\n**The Issue**: GGUF export (`save_pretrained_gguf`) doesn't work directly with quantized (4-bit) base models. 
This is a [known limitation in mlx-lm](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx-lm\u002Fissues\u002F353), not an mlx-tune bug.\n\n**What Works**:\n- ✅ Training with quantized models (QLoRA) - works perfectly\n- ✅ Saving adapters (`save_pretrained`) - works\n- ✅ Saving merged model (`save_pretrained_merged`) - works\n- ✅ Inference with trained model - works\n- ❌ GGUF export from quantized base model - mlx-lm limitation\n\n**Workarounds**:\n\n1. **Use a non-quantized base model** (recommended for GGUF export):\n   ```python\n   # Use fp16 model instead of 4-bit\n   model, tokenizer = FastLanguageModel.from_pretrained(\n       model_name=\"mlx-community\u002FLlama-3.2-1B-Instruct\",  # NOT -4bit\n       max_seq_length=2048,\n       load_in_4bit=False,  # Train in fp16\n   )\n   # Train normally, then export\n   model.save_pretrained_gguf(\"model\", tokenizer)  # Works!\n   ```\n\n2. **Dequantize during export** (results in large fp16 file):\n   ```python\n   model.save_pretrained_gguf(\"model\", tokenizer, dequantize=True)\n   # Then re-quantize with llama.cpp:\n   # .\u002Fllama-quantize model.gguf model-q4_k_m.gguf Q4_K_M\n   ```\n\n3. **Skip GGUF, use MLX format**: If you only need the model for MLX\u002FPython inference, just use `save_pretrained_merged()` - no GGUF needed.\n\n**Related Issues**:\n- [mlx-lm #353](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx-lm\u002Fissues\u002F353) - MLX to GGUF conversion\n- [mlx-examples #1382](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx-examples\u002Fissues\u002F1382) - Quantized to GGUF\n\n### DeepSeek-OCR requires transformers\u003C5.0\n\n**The Issue**: DeepSeek-OCR's model repo (`mlx-community\u002FDeepSeek-OCR-*`) ships remote code that imports `LlamaFlashAttention2` from `transformers.models.llama.modeling_llama`. That symbol was **removed in transformers 5.0**. 
Recent mlx-tune installs pull `mlx-lm>=0.31`, which requires `transformers>=5.0`, so a fresh `pip install mlx-tune` cannot load DeepSeek-OCR out of the box.\n\n**Additional missing deps**: DeepSeek-OCR's remote code also imports `addict`, `einops`, and `matplotlib` — none of these are declared by mlx-tune, mlx-vlm, or the model repo. You need to install them manually.\n\n**Working environment** (verified):\n```bash\nuv pip install 'transformers>=4.45,\u003C5.0' 'mlx-lm\u003C0.31' 'mlx-vlm\u003C0.4' addict einops matplotlib\nuv pip install mlx-tune --no-deps     # skip dep upgrade\n```\n\n**Symptom if you hit it**:\n- mlx-vlm raises `Unrecognized processing class` from `AutoProcessor.from_pretrained` (the real ImportError is swallowed by mlx-vlm's patch wrapper)\n- Debug by calling `DeepseekOCRProcessor.from_pretrained(model_path, trust_remote_code=True)` directly to see the underlying error\n\nDeepSeek-OCR-2 (`mlx-community\u002FDeepSeek-OCR-2-*`) needs mlx-vlm>=0.4 which needs transformers>=5.0 → currently not loadable anywhere. Tracking this upstream.\n\n## Contributing\n\nContributions welcome! 
Areas that need help:\n- Custom MLX kernels for even faster training\n- More test coverage (especially E2E and edge cases)\n- Testing on different M-series chips (M1, M2, M3, M4, M5)\n- Batched audio training (currently batch_size=1)\n- Batched RL training (currently single-sample)\n\n## License\n\nApache 2.0 - See [LICENSE](LICENSE) file.\n\n## Acknowledgments\n\n- [Unsloth](https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth) - The original, incredible CUDA library\n- [MLX](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) - Apple's ML framework\n- [MLX-LM](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx-lm) - LLM utilities for MLX\n- [MLX-VLM](https:\u002F\u002Fgithub.com\u002FBlaizzy\u002Fmlx-vlm) - Vision model support\n- [MLX-Audio](https:\u002F\u002Fgithub.com\u002FBlaizzy\u002Fmlx-audio) - Audio inference (TTS\u002FSTT) for MLX\n- [MLX-Embeddings](https:\u002F\u002Fgithub.com\u002FBlaizzy\u002Fmlx-embeddings) - Embedding models for MLX\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Community project, not affiliated with Unsloth AI or Apple.\u003C\u002Fstrong>\u003Cbr>\n  ⭐ Star this repo if you find it useful!\n\u003C\u002Fp>\n","ARahim3\u002Fmlx-tune is a local large language model (LLM) fine-tuning tool designed for Macs with Apple Silicon. It supports multiple fine-tuning methods, including SFT, DPO, and GRPO, and handles vision, text-to-speech (TTS), speech recognition (STT), embedding, and optical character recognition (OCR) tasks, all implemented on Apple's MLX framework. The project also provides an Unsloth-compatible API, so developers can write code on a Mac and then migrate it seamlessly to cloud GPUs for large-scale training. mlx-tune suits researchers and developers who want to prototype AI applications quickly on a personal computer, especially when they need to switch development environments frequently.",2,"2026-06-11 03:49:44","high_star"]