[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72143":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":36,"discoverSource":37},72143,"align-anything","PKU-Alignment\u002Falign-anything","PKU-Alignment","Align Anything: Training All-modality Model with Feedback","",null,"Python",4658,505,266,29,0,2,5,9,6,30.11,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32],"chameleon","dpo","large-language-models","multimodal","rlhf","vision-language-model","2026-06-12 02:02:59","\u003C!-- markdownlint-disable first-line-h1 -->\r\n\u003C!-- markdownlint-disable html -->\r\n\r\n\u003Cdiv align=\"center\">\r\n  \u003Cimg src=\"assets\u002Flogo.jpg\" width=\"390\"\u002F>\r\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\r\n  \u003Cdiv align=\"center\">\r\n    \u003Cb>\u003Cfont size=\"5\">project website\u003C\u002Ffont>\u003C\u002Fb>\r\n    \u003Csup>\r\n      \u003Ca href=\"https:\u002F\u002Fspace.bilibili.com\u002F3493095748405551?spm_id_from=333.337.search-card.all.click\">\r\n        \u003Ci>\u003Cfont size=\"4\">HOT\u003C\u002Ffont>\u003C\u002Fi>\r\n      \u003C\u002Fa>\r\n    \u003C\u002Fsup>\r\n    &nbsp;&nbsp;&nbsp;&nbsp;\r\n    \u003Cb>\u003Cfont size=\"5\">PKU-Alignment Team\u003C\u002Ffont>\u003C\u002Fb>\r\n    \u003Csup>\r\n      \u003Ca href=\"https:\u002F\u002Fspace.bilibili.com\u002F3493095748405551?spm_id_from=333.337.search-card.all.click\">\r\n        \u003Ci>\u003Cfont 
size=\"4\">welcome\u003C\u002Ffont>\u003C\u002Fi>\r\n      \u003C\u002Fa>\r\n    \u003C\u002Fsup>\r\n  \u003C\u002Fdiv>\r\n  \u003Cdiv>&nbsp;\u003C\u002Fdiv>\r\n\r\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Falign-anything?logo=pypi)](https:\u002F\u002Fpypi.org\u002Fproject\u002Falign-anything)\r\n[![License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FPKU-Alignment\u002Falign-anything?label=license)](#license)\r\n\r\n[📘Documentation](https:\u002F\u002Falign-anything.readthedocs.io\u002F) |\r\n[🛠️Quick Start](#quick-start) |\r\n[🚀Algorithms](#algorithms) |\r\n[👀Evaluation](.\u002Fprojects\u002Feval-anything) |\r\n[🤔Reporting Issues](#report-issues)\r\n\r\n\u003C\u002Fdiv>\r\n\r\n\u003Cdiv align=\"center\">\r\n\r\n[Our All-Modality Alignment Datasets](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPKU-Alignment\u002Falign-anything)\r\n\r\n\u003C\u002Fdiv>\r\n\r\nAlign-Anything aims to align any modality large models (any-to-any models) with human intentions and values. 
\r\n\r\n- **Highly Modular Framework** allowing users to easily modify and customize the code for different tasks (see [framework design](https:\u002F\u002Falign-anything.readthedocs.io\u002F)).\r\n- **Various Modality Model Fine-Tuning** for diverse multi-modal (image\u002Fvideo\u002Faudio) models (see [scripts](.\u002Fscripts)).\r\n- **Different Alignment Methods.** Different alignment algorithms, including SFT, DPO, PPO, and others.\r\n- **Multi-Modal CLI.** Multi-modal CLI for image, audio, and video modalities (see [multi-modal CLI](#multi-modal-cli)).\r\n- **O1-like Training.** O1-like training based on [DollyTails](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPKU-Alignment\u002FDollyTails-12K) (see [scripts\u002Fllama_sft_o1.sh](.\u002Fscripts)).\r\n- **Rule-based RL.** Rule-based RL inspired by [DeepSeek-R1](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-R1).\r\n\r\n**Note:** We provide a [quick start guide](https:\u002F\u002Falign-anything.readthedocs.io\u002F) to help users quickly understand the code structure and development details.\r\n\r\n## 📣 News\r\n\r\n### Roadmap\r\n\r\nWe are actively working on the following features:\r\n\r\n- ⚡️ **More Models:** Integrating cutting-edge models like the Qwen3-VL series.\r\n\r\n- 🚀 **More Inference Engines:** Adding support for high-performance engines like SGLang.\r\n\r\n- 🤖 **Advanced VLA Algorithms:** Implementing more VLA algorithms, including Safe-VLA.\r\n\r\n- 🧠 **Agent RL:** Expanding capabilities to support Agent-based Reinforcement Learning.\r\n\r\n- 🛠️ **Enhanced RLHF Features:** Upgrading our RL training framework with features like asynchronous rollout, vLLM sleep mode, and checkpoint-engine.\r\n\r\nStay tuned for more updates!\r\n  \r\n- **[2025.11.11]** 🎉🎉🎉 We now support the alignment fine-tuning of Qwen3 and Qwen3-MoE models!\r\n\r\n- **[2025.11.11]** 🎉🎉🎉 We integrate the **InterMT** project (NeurIPS 2025 Spotlight) into the main repository, featuring the first multi-turn 
interleaved preference alignment dataset with human feedback and InterMT-Bench for evaluating multi-turn multimodal interaction capabilities. Check out [InterMT](.\u002Fprojects\u002FInterMT) for more details.\r\n\r\n- **[2025.11.11]** 🛠️🛠️🛠️ We integrate the **eval-anything** evaluation framework into the main repository as a dedicated project for large-scale evaluation of any-to-any models. Check out [eval-anything](.\u002Fprojects\u002Feval-anything) for more details.\r\n\r\n- **[2025.04.14]** 📜📜📜 We release the tutorial on SFT training for `text-image-to-text` models. Check out the [cookbook_en](.\u002Fcookbooks\u002Fen\u002Ftext_image_to_text_sft.ipynb) (for English) and [cookbook_zh](.\u002Fcookbooks\u002Fzh\u002Ftext_image_to_text_sft.ipynb) (for Chinese).\r\n\r\n- **[2025.04.07]** 🥳🥳🥳 Align-Anything now serves as the homework platform for the PKU course [Large Language Models Basics and Alignment](https:\u002F\u002Fpku-llm.ai\u002F), with support for both Nvidia GPU and Huawei Ascend NPU. The corresponding tutorial will be released soon!\r\n\r\n> Align-Anything now serves as the coursework platform for the Peking University course *Large Language Models Basics and Alignment* (for undergraduate, master's, and PhD students), with training and evaluation supported on both Nvidia GPU and Huawei Ascend NPU. The corresponding tutorials will be released on an ongoing basis!\r\n\r\n- **[2025.03.31]** ✅✅✅ We enhance the installation process for both Nvidia GPU and Huawei Ascend NPU. Please refer to the [Quick Start](#quick-start) for details.\r\n\r\n- **[2025.03.31]** 🚀🚀🚀 We support wrapping the `actor` model with [vLLM engine](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) for sequence generation in `text-to-text ppo` training. This greatly accelerates PPO training: in our tests, PPO finishes in 22 minutes with the vLLM engine, while the baseline needs ~150 minutes.\r\n\r\n    > 😊 Our implementation is inspired by [OpenRLHF](https:\u002F\u002Fgithub.com\u002FOpenRLHF\u002FOpenRLHF), which is a great project for RLHF training.\r\n\r\n- **[2025.03.27]** 📜📜📜 We release the tutorial on DPO training for `text-to-text` models. 
Check out the [cookbook_en](.\u002Fcookbooks\u002Fen\u002Ftext_to_text_dpo.ipynb) (for English) and [cookbook_zh](.\u002Fcookbooks\u002Fzh\u002Ftext_to_text_dpo.ipynb) (for Chinese).\r\n\r\n- **[2025.03.15]** 📜📜📜 We release the tutorial for extending modality from `text-to-text` to `text-image-to-text` models. Check out the [cookbook_en](.\u002Fcookbooks\u002Fen\u002Fmodality_scaling.ipynb) (for English) and [cookbook_zh](.\u002Fcookbooks\u002Fzh\u002Fmodality_scaling.ipynb) (for Chinese).\r\n\r\n  > We will release other tutorials in the future. Stay tuned! 😊\r\n\r\n- **[2025.03.15]** We now support seamless migration to Slurm clusters! Check out our example [here](#training-on-slurm) to get started.\r\n\r\n- **[2025.03.14]** 🛠️🛠️🛠️ We now support Safe RLHF-V for `Text + Image -> Text` modality models.\r\n\r\n- **[2025.03.12]** 🛠️🛠️🛠️ We now support resuming training for DPO and SFT; see [here](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Fpull\u002F153).\r\n\r\n- **[2025.03.11]** 🎉🎉🎉 We support the installation of **Huawei Ascend** dependencies through a pre-built Docker image.\r\n\r\n- **[2025.03.02]** 🎉🎉🎉 We have implemented alignment training for Vision-Language-Action Models in embodied intelligence, see [VLA Trainer](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Ftree\u002Fmain\u002Falign_anything\u002Ftrainers\u002Ftext_video_to_action), with more features coming soon!\r\n\r\n- **[2025.02.28]** 🤝🤝🤝 We now support the training and inference of align-anything on Huawei Ascend NPU.\r\n\r\n  > The align-anything team is currently co-developing with the Huawei Ascend team on all-modality inference and alignment fine-tuning based on VLLMs-Ascend.\r\n\r\n\r\n\u003Cdetails>\u003Csummary>More News\u003C\u002Fsummary>\r\n\r\n- **[2025.02.28]** 🤗🤗🤗 We open-sourced [🤗Align-DS-V](https:\u002F\u002Fhuggingface.co\u002FPKU-Alignment\u002FAlign-DS-V), an experimental vision-language model based on [DeepSeek-R1-Distill-Llama-8B](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1), which enhances 
reasoning by incorporating additional modalities into the language model. The model has already surpassed **18,000+** downloads!\r\n- **[2025.02.28]** We supported the alignment fine-tuning of DeepSeek’s Unified Multimodal Understanding and Generation Models, as well as the SFT and DPO of the [**Janus-Series**](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FJanus). You can find the examples in the `.\u002Fscripts` and `.\u002Fprojects\u002Fjanus` directories.\r\n- **[2025.02.19]** We supported the alignment methods **GRPO** used in DeepSeek R1. See [GRPO Trainer](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Fblob\u002Fmain\u002Falign_anything\u002Ftrainers\u002Ftext_to_text\u002Fgrpo.py).\r\n- **[2025.01.21]** We supported the alignment fine-tuning of **MiniCPM-o** (audio & image), also included in [the official repository’s README recommendations](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FMiniCPM-o#with-align-anything-).\r\n- **[2025.01.17]** 🔥🔥🔥 We supported the fine-tuning of **O1-like reasoning in the text2text modality** (see [DollyTails](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPKU-Alignment\u002FDollyTails-12K)), with multimodal and additional modalities coming soon!\r\n- **[2024.10.11]** We supported the alignment fine-tuning of the latest **Emu3** model.\r\n- **[2024.08.29]** 💡💡💡 We supported learning from language feedback (different from binary feedback). 
For more details, see [lang-feedback](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Ftree\u002Fmain\u002Fprojects\u002Flang_feedback).\r\n- **[2024.10.10]** We support SFT for `Any -> Any` modality models Emu3.\r\n- **[2024.09.24]** We support SFT, DPO, RM and PPO for `Text + Video -> Text` modality models.\r\n- **[2024.09.13]** We support SFT, DPO, RM and PPO for `Text + Audio -> Text` modality models.\r\n- **[2024.08.17]** We support DPO and PPO for `Text+Image -> Text+Image` modality models.\r\n- **[2024.08.15]** We support a new function in the evaluation module: the `models_pk` script in [here](.\u002Fscripts\u002Fmodels_pk.sh), which enables comparing the performance of two models across different benchmarks.\r\n- **[2024.08.06]** We restructure the framework to support any modality evaluation and the supported benchmark list is [here](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Ftree\u002Fmain\u002Falign_anything\u002Fevaluation\u002Fbenchmarks).\r\n- **[2024.08.06]** We support `Text+Image -> Text+Image` modality for the SFT trainer and Chameleon models.\r\n- **[2024.07.23]** We support `Text -> Image`, `Text -> Audio`, and `Text -> Video` modalities for the SFT trainer and DPO trainer.\r\n- **[2024.07.22]** We support the **Chameleon** model for the SFT trainer and DPO trainer!\r\n- **[2024.07.17]** We open-source the Align-Anything-Instruction-100K dataset for text modality. 
This dataset is available in both [English](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPKU-Alignment\u002FAlign-Anything-Instruction-100K) and [Chinese](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FPKU-Alignment\u002FAlign-Anything-Instruction-100K-zh) versions, each sourced from different data sets and meticulously refined for quality by GPT-4.\r\n- **[2024.07.14]** We open-source the align-anything framework.\r\n\r\n\u003C\u002Fdetails>\r\n\r\n## Quick Start\r\n\r\n### Easy Installation\r\n\r\n```bash\r\n# clone the repository\r\ngit clone git@github.com:PKU-Alignment\u002Falign-anything.git\r\ncd align-anything\r\n\r\n# create virtual env\r\nconda create -n align-anything python==3.11\r\nconda activate align-anything\r\n```\r\n\r\n#### On Nvidia GPU\r\n\r\n- **`[Optional]`** We recommend installing [CUDA](https:\u002F\u002Fanaconda.org\u002Fnvidia\u002Fcuda) in the conda environment and set the environment variable.\r\n\r\n```bash\r\n# We tested on the H800 computing cluster, and this version of CUDA works well.\r\n# You can adjust this version according to the actual situation of the computing cluster.\r\n\r\nconda install nvidia\u002Flabel\u002Fcuda-12.2.0::cuda\r\nexport CUDA_HOME=$CONDA_PREFIX\r\n```\r\n\r\n> If your CUDA installed in a different location, such as `\u002Fusr\u002Flocal\u002Fcuda\u002Fbin\u002Fnvcc`, you can set the environment variables as follows:\r\n\r\n```bash\r\nexport CUDA_HOME=\"\u002Fusr\u002Flocal\u002Fcuda\"\r\n```\r\n\r\nFinally, install `align-anything` by:\r\n\r\n```bash\r\npip3 install -e .\r\n\r\npip3 install vllm==0.7.2 # to run ppo on vllm engine\r\n```\r\n\r\n#### On Huawei Ascend NPU\r\n\r\nYou can build on Huawei Ascend NPU by simply:\r\n\r\n```bash\r\npip3 install -e .[ascend]\r\n```\r\n\r\nThe current test environment for Ascend is:\r\n\r\n- Python 3.10.6\r\n- CANN 8.0.rc3\r\n- Architecture: aarch64\r\n- Hardware: 8x Ascend-SNT9B ARM (192 cores, 1536GB memory)\r\n\r\n\u003Cdetails>\r\n  
\u003Csummary>[Optional] Install ascend dependencies using our docker image\u003C\u002Fsummary>\r\n\r\n1. **Current Ascend Machine Environment Configuration**\r\n   The current environment configuration for the Ascend Machine is as follows:\r\n\r\n   ```\r\n   - Python version: 3.10.6\r\n   - CANN version: 8.0.rc3\r\n   - Architecture: aarch64\r\n   - Hardware: 8x Ascend-SNT9B ARM (192 cores, 1536GB memory)\r\n   - Ascend Driver Version: 23.0.7\r\n   - AscendHAL Version: 7.35.19\r\n   - AICPU Version: 1.0\r\n   - TDT Version: 1.0\r\n   - Log Version: 1.0\r\n   - Profiler Version: 2.0\r\n   - DVPP Kernels Version: 1.1\r\n   - TSFW Version: 1.0\r\n   - Inner Version: V100R001C15SPC012B220\r\n   - Compatible Versions: V100R001C30, V100R001C13, V100R001C15\r\n   - Compatible Firmware Versions: [7.0.0, 7.1.99]\r\n   - Package Version: 23.0.7\r\n   ```\r\n\r\n2. **Create the Docker Container**\r\n   To get started with the pre-configured environment, you can use the `setup_docker.sh` script located in the `.\u002Fscripts` directory to pull the Docker image and create a container with all necessary environments set up:\r\n\r\n   ```\r\n   cd scripts\r\n   bash setup_docker.sh\r\n   ```\r\n\r\n   This will automatically pull the Docker image and create a Docker container where all the dependencies and configurations for running the framework are already set up.\r\n\r\n3. **Warning**\r\n   **Environment Compatibility**: The environment mentioned above is tested and verified to work. If you attempt to run the setup on other environments, you may encounter issues. 
In such cases, you will need to perform debugging and adjustments yourself to ensure compatibility with your specific environment.\r\n\r\n\u003C\u002Fdetails>\r\n\r\n\r\nIf you encounter any issues, please refer to the [FAQ](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Fdiscussions\u002F167) for solutions.\r\n\r\n\u003Cdetails>\r\n\u003Csummary>[Optional] Other Dependencies\u003C\u002Fsummary>\r\n\r\n- `pip install -e .[text-to-audio]`: Install the text-to-audio dependencies.\r\n- `pip install -e .[minicpmv]`: Install the minicpmv dependencies.\r\n- `pip install -e .[minicpmo]`: Install the minicpmo dependencies.\r\n\r\n\u003C\u002Fdetails>\r\n\r\n### Training\r\n\r\nWe provide quick-start scripts in the `.\u002Fscripts` directory. These scripts automatically download the model and dataset, then run the training or evaluation.\r\n\r\nFor example, `scripts\u002Fllava\u002Fllava_dpo.sh` is the script for the `Text + Image -> Text` modality. You can run it with:\r\n\r\n```bash\r\ncd scripts\r\nbash llava\u002Fllava_dpo.sh\r\n```\r\n\r\n**Note:** The scripts will automatically download the model and dataset from Hugging Face. If your network cannot reach Hugging Face, please try the `HF Mirror`:\r\n\r\n```bash\r\nexport HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\r\n```\r\n\r\n#### Training on Slurm\r\n\r\n> We fully support seamless migration to Slurm. If you plan to run training on a Slurm-managed cluster, we invite you to use our example Slurm training script:\r\n>\r\n> ```bash\r\n> cd scripts\r\n> bash slurm\u002Fslurm_llava_dpo.sh\r\n> ```\r\n>\r\n> This script is pre-configured with suitable Slurm parameters. You only need to adjust the settings (such as the `job name`, `partition`, `account`, `path` and `resource allocations`) to match your cluster configuration.\r\n\r\n## Algorithms\r\n\r\nWe support basic alignment algorithms for different modalities, each of which may involve additional algorithms. 
For instance, in the text modality, we have also implemented SimPO, KTO, and others.\r\n\r\n| Modality                           | SFT | RM  | DPO | PPO |\r\n| ---------------------------------- | --- | --- | --- | --- |\r\n| `Text -> Text (t2t)`               | ✔️  | ✔️  | ✔️  | ✔️  |\r\n| `Text+Image -> Text (ti2t)`        | ✔️  | ✔️  | ✔️  | ✔️  |\r\n| `Text+Image -> Text+Image (ti2ti)` | ✔️  | ✔️  | ✔️  | ✔️  |\r\n| `Text+Audio -> Text (ta2t)`        | ✔️  | ✔️  | ✔️  | ✔️  |\r\n| `Text+Video -> Text (tv2t)`        | ✔️  | ✔️  | ✔️  | ✔️  |\r\n| `Text -> Image (t2i)`              | ✔️  | ⚒️  | ✔️  | ⚒️  |\r\n| `Text -> Video (t2v)`              | ✔️  | ⚒️  | ✔️  | ⚒️  |\r\n| `Text -> Audio (t2a)`              | ✔️  | ⚒️  | ✔️  | ⚒️  |\r\n| `Text+Video -> Action (tv2act)`    | ✔️  | ⚒️  | ⚒️  | ⚒️  |\r\n\r\n## New Feature: Align VLA\r\n\r\n|              | \u003Cdetails>\u003Csummary>prompt\u003C\u002Fsummary>navigate to a basketball\u003C\u002Fdetails>                                          | \u003Cdetails>\u003Csummary>prompt\u003C\u002Fsummary>find to a basketball\u003C\u002Fdetails>                                              | \u003Cdetails>\u003Csummary>prompt\u003C\u002Fsummary>locate a vase.\u003C\u002Fdetails>                                                    | \u003Cdetails>\u003Csummary>prompt\u003C\u002Fsummary>find a spray bottle and pick up that spray bottle\u003C\u002Fdetails>                 |\r\n| ------------ | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |\r\n| Baseline     | \u003Cimg 
src=\"assets\u002Ftext_video_to_action\u002Funsafevideo1.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\"> | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002Funsafevideo2.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\"> | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002Funsafevideo3.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\"> | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002Funsafevideo4.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\"> |\r\n| **AlignVLA** | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002F\u002Fsafevideo1.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\">  | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002F\u002Fsafevideo2.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\">  | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002F\u002Fsafevideo3.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\">  | \u003Cimg src=\"assets\u002Ftext_video_to_action\u002F\u002Fsafevideo4.gif\" alt=\"Image 8\" style=\"max-width: 100%; height: auto;\">  |\r\n\r\n> Alignment fine-tuning can significantly enhance the safety performance of the VLA model.\r\n\r\n### Downloading the training data\r\n\r\n```bash\r\npython -m align_anything.utils.spoc_utils.download_training_data --save_dir \u002Fpath\u002Fto\u002Fdata --types fifteen\r\n```\r\n\r\nThen decompress the downloaded data package.\r\n\r\n### Training\r\n\r\n\r\nModify ``HOME_PREFIX`` in ``align-anything\u002Fscripts\u002Fvla\u002Fspoc_sft.sh`` to your local data path.\r\n\r\n\r\n```bash\r\nbash scripts\u002Fvla\u002Fspoc_sft.sh\r\n```\r\n\r\n\r\n## Citation\r\n\r\nPlease cite the repo if you find the data or code in this repo useful 😊\r\n\r\n```bibtex\r\n@inproceedings{ji2024align,\r\n  title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},\r\n  author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang 
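and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},\r\n  year={2024},\r\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.15838}\r\n}\r\n```\r\n\r\n## Report Issues\r\n\r\nIf you have any questions while using align-anything, don't hesitate to ask on [the GitHub issue page](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002Falign-anything\u002Fissues\u002Fnew\u002Fchoose). We will reply within 2-3 working days.\r\n\r\n## License\r\n\r\nalign-anything is released under the Apache License 2.0.\r\n\r\n","Align-Anything is a project for training multimodal models with feedback. Its core features include a highly modular framework, fine-tuning support for models across multiple modalities (such as image, video, and audio), and a range of alignment methods, including SFT, DPO, and PPO. The project also provides a multimodal command-line interface and supports O1-like training and rule-based reinforcement learning. These features make Align-Anything well suited to applications that need to align large language models with human intentions and values, especially when developing cross-modal understanding and generation capabilities. In addition, the project documentation is thorough and approachable, so researchers and developers can get started quickly and customize the code.","2026-06-11 03:40:32","high_star"]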