[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72382":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":28,"discoverSource":29},72382,"goku","Saiyan-World\u002Fgoku","Saiyan-World","[CVPR2025 Highlight] Video Generation Foundation Models: https:\u002F\u002Fsaiyan-world.github.io\u002Fgoku\u002F","https:\u002F\u002Fsaiyan-world.github.io\u002Fgoku\u002F",null,"Python",2908,310,141,2,0,1,4,3,29.48,false,"main",true,[],"2026-06-12 02:03:02","# Goku: Flow Based Video Generative Foundation Models\n\n\u003Cdiv align=\"center\">\n  \n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20paper-2502.04896-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04896)&nbsp;\n[![project page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_page-More_visualizations-green)](https:\u002F\u002Fsaiyan-world.github.io\u002Fgoku\u002F)&nbsp;\n  \n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1dc60a41-b8ff-4bfd-bfba-3a185ae63345\" width=\"100%\" controls autoplay loop>\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\n> [**Goku: Flow Based Video Generative Foundation Models**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04896)\u003Cbr>\n> [Shoufa Chen](https:\u002F\u002Fwww.shoufachen.com), [Chongjian Ge](https:\u002F\u002Fchongjiange.github.io\u002F), [Yuqi 
Zhang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=7FlkVy8AAAAJ), [Yida Zhang](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Yida_Zhang2), [Fengda Zhu](https:\u002F\u002Fwww.zhufengda.net\u002F), [Hao Yang](https:\u002F\u002Fgithub.com\u002Fhaoy945), [Hongxiang Hao](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=173GpBQAAAAJ&hl=zh-CN), [Hui Wu](https:\u002F\u002Fgithub.com\u002Fwhlook), [Zhichao Lai](https:\u002F\u002Fgithub.com\u002Flazychao), [Yifei Hu](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Yifei_Hu3), [Ting-Che Lin](https:\u002F\u002Fgithub.com\u002Ftcl326), [Shilong Zhang](https:\u002F\u002Fjshilong.github.io\u002F), [Fu Li](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=2A7_3hoAAAAJ&hl=en), [Chuan Li](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fchuanli1101\u002F), [Xing Wang](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fxing-wang-49369620\u002F), [Yanghua Peng](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Gf9amnoAAAAJ&hl=en), [Peize Sun](https:\u002F\u002Fpeizesun.github.io\u002F), [Ping Luo](http:\u002F\u002Fluoping.me\u002F), [Yi Jiang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=6dikuoYAAAAJ&hl=en), [Zehuan Yuan](https:\u002F\u002Fshallowyuan.github.io\u002F), [Bingyue Peng](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fbingyp), [Xiaobing Liu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=1ypDmDwAAAAJ&hl=en)\n> \u003Cbr>HKU, ByteDance\u003Cbr>\n\n\n## Overview \nGoku is a new family of joint image-and-video generation models based on rectified flow Transformers. 
It is designed to achieve industry-grade performance, integrating advanced techniques for high-quality visual generation, including meticulous data curation, model design, and flow formulation.\n\nKey contributions include:\n- 📊 High-quality fine-grained image and video data curation.\n- 🔄 The pioneering use of rectified flow for enhanced interaction among video and image tokens.\n- 🌟 Superior qualitative and quantitative performance in both image and video generation tasks.\n\nGoku supports multiple generation tasks:\n- 🎬 **Text-to-Video Generation**\n- 🖼️ **Image-to-Video Generation**\n- 🎨 **Text-to-Image Generation**\n\n## Performance Benchmarks 🏅\nGoku achieves top scores on major benchmarks:\n- **0.76** on GenEval (text-to-image generation)\n- **83.65** on DPG-Bench (text-to-image generation)\n- **84.85** on VBench (text-to-video generation)\n\n### VBench Performance 🏆\nGoku-T2V scores **84.85** on VBench, securing the No.2 position as of 2024-10-07 and surpassing several leading commercial text-to-video models.\n\n| Method | Total Score | Quality Score | Semantic Score | Subject Consistency | Background Consistency | Temporal Flickering | Motion Smoothness | Dynamic Degree | Aesthetic Quality | Imaging Quality | Object Class | Multiple Objects | Human Action | Color | Spatial Relationship | Scene | Appearance Style | Temporal Style | Overall Consistency |\n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| **AnimateDiff-V2** | 80.27 | 82.90 | 69.75 | 95.30 | 97.68 | 98.75 | 97.76 | 40.83 | 67.16 | 70.10 | 90.90 | 36.88 | 92.60 | 87.47 | 34.60 | 50.19 | 22.42 | 26.03 | 27.04 |\n| **VideoCrafter-2.0** | 80.44 | 82.20 | 73.42 | 96.85 | **98.22** | 98.41 | 97.73 | 42.50 | 63.13 | 67.22 | 92.55 | 40.66 | 95.00 | **92.92** | 35.86 | 
55.29 | **25.13** | 25.84 | **28.23** |\n| **OpenSora V1.2** | 79.23 | 80.71 | 73.30 | 94.45 | 97.90 | 99.47 | 98.20 | 47.22 | 56.18 | 60.94 | 83.37 | 58.41 | 85.80 | 87.49 | 67.51 | 42.47 | 23.89 | 24.55 | 27.07 |\n| **Show-1** | 78.93 | 80.42 | 72.98 | 95.53 | 98.02 | 99.12 | 98.24 | 44.44 | 57.35 | 58.66 | 93.07 | 45.47 | 95.60 | 86.35 | 53.50 | 47.03 | 23.06 | 25.28 | 27.46 |\n| **Gen-3** | 82.32 | 84.11 | 75.17 | 97.10 | 96.62 | 98.61 | 99.23 | 60.14 | 63.34 | 66.82 | 87.81 | 53.64 | 96.40 | 80.90 | 65.09 | 54.57 | 24.31 | 24.71 | 26.69 |\n| **Pika-1.0** | 80.69 | 82.92 | 71.77 | 96.94 | 97.36 | **99.74** | **99.50** | 47.50 | 62.04 | 61.87 | 88.72 | 43.08 | 86.20 | 90.57 | 61.03 | 49.83 | 22.26 | 24.22 | 25.94 |\n| **CogVideoX-5B** | 81.61 | 82.75 | 77.04 | 96.23 | 96.52 | 98.66 | 96.92 | 70.97 | 61.98 | 62.90 | 85.23 | 62.11 | 99.40 | 82.81 | 66.35 | 53.20 | 24.91 | 25.38 | 27.59 |\n| **Kling** | 81.85 | 83.39 | 75.68 | **98.33** | 97.60 | 99.30 | 99.40 | 46.94 | 61.21 | 65.62 | 87.24 | 68.05 | 93.40 | 89.90 | 73.03 | 50.86 | 19.62 | 24.17 | 26.42 |\n| **Mira** | 71.87 | 78.78 | 44.21 | 96.23 | 96.92 | 98.29 | 97.54 | 60.33 | 42.51 | 60.16 | 52.06 | 12.52 | 63.80 | 42.24 | 27.83 | 16.34 | 21.89 | 18.77 | 18.72 |\n| **CausVid** | 84.27 | **85.65** | 78.75 | 97.53 | 97.19 | 96.24 | 98.05 | **92.69** | 64.15 | 68.88 | 92.99 | 72.15 | **99.80** | 80.17 | 64.65 | 56.58 | 24.27 | 25.33 | 27.51 |\n| **Luma** | 83.61 | 83.47 | **84.17** | 97.33 | 97.43 | 98.64 | 99.35 | 44.26 | 65.51 | 66.55 | **94.95** | **82.63** | 96.40 | 92.33 | 83.67 | **58.98** | 24.66 | **26.29** | 28.13 |\n| **HunyuanVideo** | 83.24 | 85.09 | 75.82 | 97.37 | 97.76 | 99.44 | 98.99 | 70.83 | 60.36 | 67.56 | 86.10 | 68.55 | 94.40 | 91.60 | 68.68 | 53.88 | 19.80 | 23.89 | 26.44 |\n| **Goku-T2V** (ours) | 84.85 | 85.60 | 81.87 | 95.55 | 96.67 | 97.71 | 98.50 | 76.11 | 67.22 | 71.29 | 94.40 | 79.48 | 97.60 | 83.81 | 85.72 | 57.08 | 23.08 | 25.64 | 27.35 |\n\n\n\n## 
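BibTeX\n```bibtex\n@article{chen2025goku,\n  title={Goku: Flow Based Video Generative Foundation Models},\n  author={Chen, Shoufa and Ge, Chongjian and Zhang, Yuqi and Zhang, Yida and Zhu, Fengda and Yang, Hao and Hao, Hongxiang and Wu, Hui and Lai, Zhichao and Hu, Yifei and Lin, Ting-Che and Zhang, Shilong and Li, Fu and Li, Chuan and Wang, Xing and Peng, Yanghua and Sun, Peize and Luo, Ping and Jiang, Yi and Yuan, Zehuan and Peng, Bingyue and Liu, Xiaobing},\n  journal={arXiv preprint arXiv:2502.04896},\n  year={2025}\n}\n```\n","Goku is a joint image-and-video generation model based on rectified flow Transformers, designed to deliver industry-grade, high-performance visual generation. Through careful data curation, model design, and flow formulation, the project provides high-quality image and video generation. Its core capabilities cover text-to-video, image-to-video, and text-to-image generation, and it performs strongly on several benchmarks: 0.76 on GenEval (text-to-image generation), 83.65 on DPG-Bench (text-to-image generation), and 84.85 on VBench (text-to-video generation). Goku suits applications that need high-quality video and image generation, such as creative content production, virtual reality experiences, and advertising production.","2026-06-11 03:41:35","high_star"]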