[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72533":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},72533,"Paper2Video","showlab\u002FPaper2Video","showlab","Automatic Video Generation from Scientific Papers","https:\u002F\u002Fshowlab.github.io\u002FPaper2Video\u002F",null,"Python",2310,329,14,4,0,1,11,54,3,72.46,"MIT License",false,"main",true,[],"2026-06-12 04:01:06","# Paper2Video\n\n\u003Cp align=\"right\">\n  \u003Cb>English\u003C\u002Fb> | \u003Ca href=\".\u002FREADME-CN.md\">简体中文\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Cb>Paper2Video: Automatic Video Generation from Scientific Papers\u003C\u002Fb>\n\u003Cbr>\nAutomatically generating presentation videos from academic papers\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fzeyu-zhu.github.io\u002Fwebpage\u002F\">Zeyu Zhu*\u003C\u002Fa>,\n  \u003Ca href=\"https:\u002F\u002Fqhlin.me\u002F\">Kevin Qinghong Lin*\u003C\u002Fa>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=h1-3lSoAAAAJ&hl=en\">Mike Zheng Shou\u003C\u002Fa> \u003Cbr>\n  Show Lab, National University of Singapore\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.05096\">📄 Paper\u003C\u002Fa> &nbsp; | &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2510.05096\">🤗 Daily Paper\u003C\u002Fa> &nbsp; | 
&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FZaynZhu\u002FPaper2Video\">📊 Dataset\u003C\u002Fa> &nbsp; | &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fshowlab.github.io\u002FPaper2Video\u002F\">🌐 Project Website\u003C\u002Fa> &nbsp; | &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FKevinQHLin\u002Fstatus\u002F1976105129146257542\">💬 X (Twitter)\u003C\u002Fa>\n\u003C\u002Fp>\n\n- **Input:** a paper ➕ an image ➕ an audio clip\n  \n| Paper | Image | Audio |\n|--------|--------|--------|\n| \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fshowlab\u002FPaper2Video\u002Fblob\u002Fpage\u002Fassets\u002Fhinton\u002Fpaper.png\" width=\"180\"\u002F>\u003Cbr>[🔗 Paper link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1509.01626) | \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fshowlab\u002FPaper2Video\u002Fblob\u002Fpage\u002Fassets\u002Fhinton\u002Fhinton_head.jpeg\" width=\"180\"\u002F> \u003Cbr>Hinton's photo | \u003Cimg src=\"assets\u002Fsound.png\" width=\"180\"\u002F>\u003Cbr>[🔗 Audio sample](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FPaper2Video\u002Fblob\u002Fpage\u002Fassets\u002Fhinton\u002Fref_audio_10.wav) |\n\n\n- **Output:** a presentation video\n\n\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F39221a9a-48cb-4e20-9d1c-080a5d8379c4\n\n\n\n\nCheck out more examples on the [🌐 project page](https:\u002F\u002Fshowlab.github.io\u002FPaper2Video\u002F).\n\n## 🔥 Update\n**Any contributions are welcome!**\n- [x] [2025.10.15] We released a new version without the talking head for faster generation!\n- [x] [2025.10.11] Our work received attention on [YC Hacker News](https:\u002F\u002Fnews.ycombinator.com\u002Fitem?id=45553701).\n- [x] [2025.10.9] Thanks to AK for sharing our work on [Twitter](https:\u002F\u002Fx.com\u002F_akhaliq\u002Fstatus\u002F1976099830004072849)!\n- [x] [2025.10.9] Our work was covered by 
[Medium](https:\u002F\u002Fmedium.com\u002F@dataism\u002Fhow-ai-learned-to-make-scientific-videos-from-slides-to-a-talking-head-0d807e491b27).\n- [x] [2025.10.8] Check out our demo video below!\n- [x] [2025.10.7] We released the [arXiv paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.05096).\n- [x] [2025.10.6] We released the [code](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FPaper2Video) and [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FZaynZhu\u002FPaper2Video).\n- [x] [2025.9.28] Paper2Video has been accepted to the **Scaling Environments for Agents Workshop ([SEA](https:\u002F\u002Fsea-workshop.github.io\u002F)) at NeurIPS 2025**.\n\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa655e3c7-9d76-4c48-b946-1068fdb6cdd9\n\n\n\n\n---\n\n### Table of Contents\n- [🌟 Overview](#-overview)\n- [🚀 Quick Start: PaperTalker](#-try-papertalker-for-your-paper-)\n  - [1. Requirements](#1-requirements)\n  - [2. Configure LLMs](#2-configure-llms)\n  - [3. Inference](#3-inference)\n- [📊 Evaluation: Paper2Video](#-evaluation-paper2video)\n- [😼 Fun: Paper2Video for Paper2Video](#-fun-paper2video-for-paper2video)\n- [🙏 Acknowledgements](#-acknowledgements)\n- [📌 Citation](#-citation)\n\n---\n\n## 🌟 Overview\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" alt=\"Overview\" width=\"100%\">\n\u003C\u002Fp>\n\nThis work addresses two core problems for academic presentations:\n\n- **Left: How to create a presentation video from a paper?**  \n  *PaperTalker*: an agent that integrates **slides**, **subtitling**, **cursor grounding**, **speech synthesis**, and **talking-head video rendering**.\n\n- **Right: How to evaluate a presentation video?**  \n  *Paper2Video*: a benchmark with metrics designed specifically to evaluate presentation quality.\n\n\n---\n\n## 🚀 Try PaperTalker for your Paper!\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmethod.png\" alt=\"Approach\" width=\"100%\">\n\u003C\u002Fp>\n\n### 1. 
Requirements\nPrepare the environment:\n```bash\ncd src\nconda create -n p2v python=3.10\nconda activate p2v\npip install -r requirements.txt\nconda install -c conda-forge tectonic\n```\n**[Optional] [Skip](#2-configure-llms) this part if you do not need a human presenter.**\n\nDownload the dependent code and follow the instructions in **[Hallo2](https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002Fhallo2)** to download the model weights.\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002Fhallo2.git\n```\nYou need to **prepare a separate environment for talking-head generation** to avoid potential package conflicts; please refer to \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002Fhallo2.git\">Hallo2\u003C\u002Fa>. After installing, activate the `hallo` environment and run `which python` to get its Python environment path.\n```bash\ncd hallo2\nconda create -n hallo python=3.10\nconda activate hallo\npip install -r requirements.txt\n```\n\n### 2. Configure LLMs\nExport your **API credentials**:\n```bash\nexport GEMINI_API_KEY=\"your_gemini_key_here\"\nexport OPENAI_API_KEY=\"your_openai_key_here\"\n```\nFor best results, use **GPT-4.1** or **Gemini 2.5 Pro** as both the LLM and the VLM. Locally deployed open-source models (e.g., Qwen) are also supported; for details, please refer to \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPaper2Poster\u002FPaper2Poster.git\">Paper2Poster\u003C\u002Fa>.\n\n### 3. Inference\nThe script `pipeline.py` provides an automated pipeline for generating academic presentation videos. It takes **LaTeX paper sources** together with a **reference image and audio clip** as input and runs them through several sub-modules (Slides → Subtitles → Speech → Cursor → Talking Head) to produce a complete presentation video. 
⚡ The minimum recommended GPU for running this pipeline is an **NVIDIA A6000** with 48 GB of memory.\n\n#### Example Usage\nRun the following command for fast generation (**without talking-head generation**):\n```bash\npython pipeline_light.py \\\n    --model_name_t gpt-4.1 \\\n    --model_name_v gpt-4.1 \\\n    --result_dir \u002Fpath\u002Fto\u002Foutput \\\n    --paper_latex_root \u002Fpath\u002Fto\u002Flatex_proj \\\n    --ref_img \u002Fpath\u002Fto\u002Fref_img.png \\\n    --ref_audio \u002Fpath\u002Fto\u002Fref_audio.wav \\\n    --gpu_list [0,1,2,3,4,5,6,7]\n```\n\nRun the following command for full generation (**with talking-head generation**):\n\n```bash\npython pipeline.py \\\n    --model_name_t gpt-4.1 \\\n    --model_name_v gpt-4.1 \\\n    --model_name_talking hallo2 \\\n    --result_dir \u002Fpath\u002Fto\u002Foutput \\\n    --paper_latex_root \u002Fpath\u002Fto\u002Flatex_proj \\\n    --ref_img \u002Fpath\u002Fto\u002Fref_img.png \\\n    --ref_audio \u002Fpath\u002Fto\u002Fref_audio.wav \\\n    --talking_head_env \u002Fpath\u002Fto\u002Fhallo2_env \\\n    --gpu_list [0,1,2,3,4,5,6,7]\n```\n\n| Argument | Type | Default | Description |\n|----------|------|---------|-------------|\n| `--model_name_t` | `str` | `gpt-4.1` | Text model (LLM) |\n| `--model_name_v` | `str` | `gpt-4.1` | Vision-language model (VLM) |\n| `--model_name_talking` | `str` | `hallo2` | Talking-head model; currently only **hallo2** is supported |\n| `--result_dir` | `str` | `\u002Fpath\u002Fto\u002Foutput` | Output directory (slides, subtitles, videos, etc.) 
|\n| `--paper_latex_root` | `str` | `\u002Fpath\u002Fto\u002Flatex_proj` | Root directory of the LaTeX paper project |\n| `--ref_img` | `str` | `\u002Fpath\u002Fto\u002Fref_img.png` | Reference image (must be a **square** portrait) |\n| `--ref_audio` | `str` | `\u002Fpath\u002Fto\u002Fref_audio.wav` | Reference audio (recommended length: ~10 s) |\n| `--ref_text` | `str` | `None` | Optional reference text (style guidance for subtitles) |\n| `--beamer_templete_prompt` | `str` | `None` | Optional reference text (style guidance for slides) |\n| `--gpu_list` | `list[int]` | `\"\"` | GPU list for parallel execution (used in **cursor generation** and **talking-head rendering**) |\n| `--if_tree_search` | `bool` | `True` | Whether to enable tree search for slide layout refinement |\n| `--stage` | `str` | `\"[0]\"` | Pipeline stages to run (e.g., `[0]` for the full pipeline, `[1,2,3]` for partial stages) |\n| `--talking_head_env` | `str` | `\u002Fpath\u002Fto\u002Fhallo2_env` | Python environment path for talking-head generation |\n\n---\n\n## 📊 Evaluation: Paper2Video\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmetrics.png\" alt=\"Metrics\" width=\"100%\">\n\u003C\u002Fp>\n\nUnlike natural video generation, academic presentation videos serve a highly specialized role: they are not merely about visual fidelity but about **communicating scholarship**. This makes it difficult to directly apply conventional video-synthesis metrics (e.g., FVD, IS, or CLIP-based similarity). Instead, their value lies in how well they **disseminate research** and **amplify scholarly visibility**. From this perspective, we argue that a high-quality academic presentation video should be judged along two complementary dimensions:\n\n#### For the Audience\n- The video is expected to **faithfully convey the paper’s core ideas**.  \n- It should remain **accessible to diverse audiences**.  \n\n#### For the Author\n- The video should **foreground the authors’ intellectual contribution and identity**.  
\n- It should **enhance the work’s visibility and impact**.  \n\nTo capture these goals, we introduce evaluation metrics specifically designed for academic presentation videos: Meta Similarity, PresentArena, PresentQuiz, and IP Memory.\n\n### Run Eval\n- Prepare the environment:\n```bash\ncd src\u002Fevaluation\nconda create -n p2v_e python=3.10\nconda activate p2v_e\npip install -r requirements.txt\n```\n- For Meta Similarity and PresentArena:\n```bash\npython MetaSim_audio.py --r \u002Fpath\u002Fto\u002Fresult_dir --g \u002Fpath\u002Fto\u002Fgt_dir --s \u002Fpath\u002Fto\u002Fsave_dir\npython MetaSim_content.py --r \u002Fpath\u002Fto\u002Fresult_dir --g \u002Fpath\u002Fto\u002Fgt_dir --s \u002Fpath\u002Fto\u002Fsave_dir\n```\n```bash\npython PresentArena.py --r \u002Fpath\u002Fto\u002Fresult_dir --g \u002Fpath\u002Fto\u002Fgt_dir --s \u002Fpath\u002Fto\u002Fsave_dir\n```\n- For **PresentQuiz**, first generate questions from the paper, then evaluate with Gemini:\n```bash\ncd PresentQuiz\npython create_paper_questions.py --paper_folder \u002Fpath\u002Fto\u002Fdata\npython PresentQuiz.py --r \u002Fpath\u002Fto\u002Fresult_dir --g \u002Fpath\u002Fto\u002Fgt_dir --s \u002Fpath\u002Fto\u002Fsave_dir\n```\n\n- For **IP Memory**, first construct question pairs from the generated videos, then evaluate with Gemini:\n```bash\ncd IPMemory\npython construct.py\npython ip_qa.py\n```\nSee the code for more details.\n\n👉 The Paper2Video benchmark is available on\n[HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FZaynZhu\u002FPaper2Video)\n\n---\n\n## 😼 Fun: Paper2Video for Paper2Video\nCheck out **Paper2Video presenting its own paper**:\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fff58f4d8-8376-4e12-b967-711118adf3c4\n\n## 🙏 Acknowledgements\n\n* The presentation videos are sourced from SlidesLive and YouTube.\n* We thank all the authors who put great effort into creating presentation videos!\n* We thank [CAMEL](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fcamel) 
for its open-source, well-organized multi-agent framework codebase.\n* We thank the authors of [Hallo2](https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002Fhallo2.git) and [Paper2Poster](https:\u002F\u002Fgithub.com\u002FPaper2Poster\u002FPaper2Poster.git) for their open-source code.\n* We thank [Wei Jia](https:\u002F\u002Fgithub.com\u002Fweeadd) for his effort in collecting the data and implementing the baselines. We also thank all the participants involved in the human studies.\n* We thank all the **Show Lab @ NUS** members for their support!\n\n\n\n---\n\n## 📌 Citation\n\n\nIf you find our work useful, please cite:\n\n```bibtex\n@misc{paper2video,\n      title={Paper2Video: Automatic Video Generation from Scientific Papers}, \n      author={Zeyu Zhu and Kevin Qinghong Lin and Mike Zheng Shou},\n      year={2025},\n      eprint={2510.05096},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.05096}, \n}\n```\n[![Star History](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=showlab\u002FPaper2Video&type=Date)](https:\u002F\u002Fstar-history.com\u002F#showlab\u002FPaper2Video&Date)\n","Paper2Video is a tool that automatically generates presentation videos from academic papers. Given a paper, an image, and an audio clip as input, it automatically synthesizes a video that presents and explains the paper's content. Its core features include text-to-speech conversion, image processing, and video generation, built on advanced natural language processing and computer vision techniques. The project is written in Python and open-sourced under the MIT License. It is suited to researchers, educators, and anyone who wants to present written content to an audience in a more engaging form.",2,"2026-06-11 03:42:26","high_star"]