[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72149":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72149,"Paper2Code","going-doer\u002FPaper2Code","going-doer","Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning",null,"Python",4664,659,38,12,0,10,20,71,30,30.46,"Apache License 2.0",false,"master",true,[],"2026-06-12 02:02:59","# 📄 Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning\n\n**Minju Seo, Jinheon Baek†, Seongyun Lee, and Sung Ju Hwang†** († denotes equal advising)  \nInternational Conference on Learning Representations (ICLR), 2026  \n📄 [Read the paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17192)  \n\n![PaperCoder Overview](.\u002Fassets\u002Fpapercoder_overview.png)\n\n**PaperCoder** is the multi-agent LLM system introduced in **Paper2Code**, designed to transform a paper into a code repository.\nIt follows a three-stage pipeline: planning, analysis, and code generation, each handled by specialized agents.\nOur method outperforms strong baselines on both Paper2Code and PaperBench and produces faithful, high-quality implementations.\n\n---\n\n## 🗺️ Table of Contents\n\n- [⚡ Quick Start](#-quick-start)\n- [📚 Detailed Setup Instructions](#-detailed-setup-instructions)\n- [📦 Paper2Code Benchmark Datasets](#-paper2code-benchmark-datasets)\n- [📊 Model-based Evaluation of 
Repositories](#-model-based-evaluation-of-repositories-generated-by-papercoder)\n\n---\n\n## ⚡ Quick Start\n- Note: The following command runs PaperCoder on an example paper ([Attention Is All You Need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762)).  \n- For more setup options, including LaTeX-based inputs and PDF-to-JSON conversion, see [📚 Detailed Setup Instructions](#-detailed-setup-instructions).\n\n### Using OpenAI API\n- 💵 Estimated cost for using o3-mini: $0.50–$0.70\n\n```bash\npip install openai\n\nexport OPENAI_API_KEY=\"\u003COPENAI_API_KEY>\"\n\ncd scripts\nbash run.sh\n```\n\n### Using Open Source Models with vLLM\n- If you encounter any issues installing vLLM, please refer to the [official vLLM repository](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm).\n- The default model is `deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct`.\n\n```bash\npip install vllm\n\ncd scripts\nbash run_llm.sh\n```\n\n### Output Folder Structure (Only Important Files)\n```bash\noutputs\n├── Transformer\n│   ├── analyzing_artifacts\n│   ├── coding_artifacts\n│   └── planning_artifacts\n└── Transformer_repo # Final output repository\n```\n---\n\n## 📚 Detailed Setup Instructions\n\n### 🛠️ Environment Setup\n\n- 💡 To use the `o3-mini` version, make sure you have the latest `openai` package installed.\n- We recommend using a Python virtual environment before installing dependencies.\n- 📦 Install only what you need:\n  - For the OpenAI API, install `openai`.\n  - For open-source models, install `vllm`.\n  - If you encounter any issues installing vLLM, please refer to the [official vLLM repository](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm).\n\n\n```bash\npip install openai \npip install vllm \n```\n\n- Or, if you prefer, you can install all dependencies using `pip`:\n\n```bash\npip install -r requirements.txt\n```\n\n### 📄 (Optional) Convert PDF to JSON\nThe following steps convert a paper PDF into JSON format.  
\nIf you have access to the LaTeX source and plan to use it with PaperCoder, you may skip this step and proceed to [🚀 Running PaperCoder](#-running-papercoder).  \nNote: In our experiments, we converted all paper PDFs to JSON format.\n\n1. Clone the `s2orc-doc2json` repository, which converts a PDF file into structured JSON.  \n   (For detailed configuration, please refer to the [official repository](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fs2orc-doc2json).)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fallenai\u002Fs2orc-doc2json.git\n```\n\n2. Run the PDF processing service.\n\n```bash\ncd .\u002Fs2orc-doc2json\u002Fgrobid-0.7.3\n.\u002Fgradlew run\n```\n\n3. Convert your PDF into JSON format.\n\n```bash\nmkdir -p .\u002Fs2orc-doc2json\u002Foutput_dir\u002Fpaper_coder\npython .\u002Fs2orc-doc2json\u002Fdoc2json\u002Fgrobid2json\u002Fprocess_pdf.py \\\n    -i ${PDF_PATH} \\\n    -t .\u002Fs2orc-doc2json\u002Ftemp_dir\u002F \\\n    -o .\u002Fs2orc-doc2json\u002Foutput_dir\u002Fpaper_coder\n```\n\n### 🚀 Running PaperCoder\n- Note: The following command runs PaperCoder on an example paper ([Attention Is All You Need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762)).  
\n  If you want to run PaperCoder on your own paper, please modify the environment variables accordingly.\n\n#### Using OpenAI API\n- 💵 Estimated cost for using o3-mini: $0.50–$0.70\n\n\n```bash\n# Using the PDF-based JSON format of the paper\nexport OPENAI_API_KEY=\"\u003COPENAI_API_KEY>\"\n\ncd scripts\nbash run.sh\n```\n\n```bash\n# Using the LaTeX source of the paper\nexport OPENAI_API_KEY=\"\u003COPENAI_API_KEY>\"\n\ncd scripts\nbash run_latex.sh\n```\n\n\n#### Using Open Source Models with vLLM\n- The default model is `deepseek-ai\u002FDeepSeek-Coder-V2-Lite-Instruct`.\n\n```bash\n# Using the PDF-based JSON format of the paper\ncd scripts\nbash run_llm.sh\n```\n\n```bash\n# Using the LaTeX source of the paper\ncd scripts\nbash run_latex_llm.sh\n```\n\n---\n\n## 📦 Paper2Code Benchmark Datasets\n- Huggingface dataset: [paper2code](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fiaminju\u002Fpaper2code)\n  \n- You can find the description of the Paper2Code benchmark dataset in [data\u002Fpaper2code](https:\u002F\u002Fgithub.com\u002Fgoing-doer\u002FPaper2Code\u002Ftree\u002Fmain\u002Fdata\u002Fpaper2code). \n- For more details, refer to Section 4.1 \"Paper2Code Benchmark\" in the [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17192).\n\n\n---\n\n## 📊 Model-based Evaluation of Repositories Generated by PaperCoder\n\n- We evaluate repository quality using a model-based approach, supporting both reference-based and reference-free settings.  \n  The model critiques key implementation components, assigns severity levels, and generates a 1–5 correctness score averaged over 8 samples using **o3-mini-high**.\n\n- For more details, please refer to Section 4.3.1 (*Paper2Code Benchmark*) of the paper.\n- **Note:** The following examples evaluate the sample repository (**Transformer_repo**).  
\n  Please modify the relevant paths and arguments if you wish to evaluate a different repository.\n\n### 🛠️ Environment Setup\n```bash\npip install tiktoken\nexport OPENAI_API_KEY=\"\u003COPENAI_API_KEY>\"\n```\n\n\n### 📝 Reference-free Evaluation\n- `target_repo_dir` is the generated repository.\n\n```bash\ncd codes\u002F\npython eval.py \\\n    --paper_name Transformer \\\n    --pdf_json_path ..\u002Fexamples\u002FTransformer_cleaned.json \\\n    --data_dir ..\u002Fdata \\\n    --output_dir ..\u002Foutputs\u002FTransformer \\\n    --target_repo_dir ..\u002Foutputs\u002FTransformer_repo \\\n    --eval_result_dir ..\u002Fresults \\\n    --eval_type ref_free \\\n    --generated_n 8 \\\n    --papercoder\n```\n\n### 📝 Reference-based Evaluation\n- `target_repo_dir` is the generated repository.\n- `gold_repo_dir` should point to the official repository (e.g., author-released code).\n\n```bash\ncd codes\u002F\npython eval.py \\\n    --paper_name Transformer \\\n    --pdf_json_path ..\u002Fexamples\u002FTransformer_cleaned.json \\\n    --data_dir ..\u002Fdata \\\n    --output_dir ..\u002Foutputs\u002FTransformer \\\n    --target_repo_dir ..\u002Foutputs\u002FTransformer_repo \\\n    --gold_repo_dir ..\u002Fexamples\u002FTransformer_gold_repo \\\n    --eval_result_dir ..\u002Fresults \\\n    --eval_type ref_based \\\n    --generated_n 8 \\\n    --papercoder\n```\n\n\n### 📄 Example Output\n```bash\n========================================\n🌟 Evaluation Summary 🌟\n📄 Paper name: Transformer\n🧪 Evaluation type: ref_based\n📁 Target repo directory: ..\u002Foutputs\u002FTransformer_repo\n📊 Evaluation result:\n        📈 Score: 4.5000\n        ✅ Valid: 8\u002F8\n========================================\n🌟 Usage Summary 🌟\n[Evaluation] Transformer - ref_based\n🛠️ Model: o3-mini\n📥 Input tokens: 44318 (Cost: $0.04874980)\n📦 Cached input tokens: 0 (Cost: $0.00000000)\n📤 Output tokens: 26310 (Cost: $0.11576400)\n💵 Current total cost: $0.16451380\n🪙 Accumulated total cost so far: 
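$0.16451380\n============================================\n```\n","The Paper2Code project aims to automatically generate code from scientific papers in machine learning. It uses PaperCoder, a multi-agent large language model system that converts a paper into a code repository in three stages: planning, analysis, and code generation, each handled by a specialized agent. The project supports code generation with either the OpenAI API or open-source models (such as DeepSeek-Coder-V2-Lite-Instruct), and provides detailed setup instructions and quick-start scripts. It suits researchers and developers who need to quickly implement algorithms or models from papers, especially when they lack the time or resources to write the code by hand.",2,"2026-06-11 03:40:35","high_star"]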