[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81012":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":11,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":14,"rankGlobal":8,"rankLanguage":8,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":18,"hasPages":16,"topics":19,"createdAt":8,"pushedAt":8,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":13,"starSnapshotCount":13,"syncStatus":23,"lastSyncTime":24,"discoverSource":25},81012,"ELF-pytorch","Ugness\u002FELF-pytorch","Ugness",null,"Python",30,1,29,0,38,"MIT License",false,"main",true,[],"2026-06-11 04:07:22","# ELF: Embedded Language Flows (Unofficial PyTorch Reproduction)\n\n> [!CAUTION]\n>\n> The OpenWebText results are not directly comparable with baselines ([MDLM](https:\u002F\u002Fgithub.com\u002Fkuleshov-group\u002Fmdlm), [Duo](https:\u002F\u002Fgithub.com\u002Fs-sahoo\u002Fduo), [FLM](https:\u002F\u002Fgithub.com\u002Fdavid3684\u002Fflm), ...)\n> due to tokenization and preprocessing differences used in the ELF paper.\n>\n> Specifically, ELF uses a custom preprocessed OpenWebText dataset (see [`openwebtext-t5`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fembedded-language-flows\u002Fopenwebtext-t5)).\n> This is tokenized with the T5 tokenizer, not the GPT-2 tokenizer which is used in the standard setting in the literature. 
In addition, the paper's preprocessing pipeline includes a custom packing scheme with full details not disclosed in the paper.\n\n---\n\n **This is an unofficial PyTorch reproduction (OpenWebText Only)** of *ELF: Embedded Language Flows*.\\\n  The official JAX\u002FTPU implementation is at \u003Chttps:\u002F\u002Fgithub.com\u002Flillian039\u002FELF>, and the official checkpoints are in HuggingFace at [`embedded-language-flows`](https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows).\n\n This repository was developed using [Claude Code](https:\u002F\u002Fclaude.com\u002Fclaude-code).\n\n## Reproduction status\n\nOpenWebText (unconditional), ELF-B (105M), 32-step SDE, γ=1.5, SC-CFG=3:\n\n| Metric | Paper (TPU v5p-64) | Reproduction (8× B200 DDP, Lightning) |\n| --- | --- | --- |\n| Gen. PPL ↓ | 24.1 | **25.61** |\n| Entropy | 5.15 | **5.20** |\n\nPer-epoch results (32-step SDE, 256 samples):\n\n| Epoch | Step | Gen. PPL | Entropy |\n| --- | --- | --- | --- |\n| 1 | 38 034  | 2.73  | 0.70 |\n| 2 | 76 068  | 37.11  | 5.17 |\n| 3 | 114 102 | 28.63  | 5.21 |\n| 4 | 152 136 | 25.00  | 5.16 |\n| 5 | 190 170 | 25.58  | 5.19 |\n| 6 | 228 204 | 26.11  | 5.21 |\n\nAll samples used for the measurements can be found in\n[`reproduction\u002Felf_b-owt\u002Feval1000\u002Fmetrics.jsonl`](reproduction\u002Felf_b-owt\u002Feval1000\u002Fmetrics.jsonl)\nand [`reproduction\u002Felf_b-owt\u002Fper_epoch\u002Fmetrics.jsonl`](reproduction\u002Felf_b-owt\u002Fper_epoch\u002Fmetrics.jsonl).\n\n## TODO\n- [ ] Train ELF and\u002For some of the baselines ([MDLM](https:\u002F\u002Fgithub.com\u002Fkuleshov-group\u002Fmdlm), [Duo](https:\u002F\u002Fgithub.com\u002Fs-sahoo\u002Fduo), [FLM](https:\u002F\u002Fgithub.com\u002Fdavid3684\u002Fflm), ...) 
in a directly comparable setting (https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FSkylion007\u002Fopenwebtext).\n\n## What's in this repo\n\n- [`pytorch_lightning\u002F`](pytorch_lightning\u002F): model, training\n  script (`train_lightning.py`), eval (`eval_lightning.py`), and\n  utilities. 8-GPU CUDA DDP via PyTorch Lightning.\n- [`reproduction\u002Felf_b-owt\u002F`](reproduction\u002Felf_b-owt\u002F): config snapshot, 1000 final\n samples, and per-epoch samples. The\n  checkpoint is hosted separately (see Quickstart).\n\n## Quickstart — evaluate the reproduced checkpoint\n\n```bash\n# 1. Environment (conda)\nconda env create -f environment.yml -n elf-pytorch && conda activate elf-pytorch\n\n# 2. Download the reproduced final EMA checkpoint (1.4 GB)\npip install huggingface_hub\nhuggingface-cli download Ugness\u002Felf-torch last.ckpt \\\n    --local-dir reproduction\u002Felf_b-owt\u002F\n\n# 3. Run the 1000-sample evaluation\ncd pytorch_lightning\u002F\ntorchrun --nproc_per_node=8 --master_port=29510 eval_lightning.py \\\n    --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-B.yml \\\n    --checkpoint_path ..\u002Freproduction\u002Felf_b-owt\u002Flast.ckpt \\\n    --num_samples 1000\n# Expected: Gen. 
PPL ≈ 25.6, sample entropy ≈ 5.20.\n```\n\n### Per-epoch checkpoints\n\nThe per-epoch checkpoints are hosted in this Hugging Face repo:\n[`checkpoints\u002F`](https:\u002F\u002Fhuggingface.co\u002FUgness\u002Felf-torch\u002Ftree\u002Fmain\u002Fcheckpoints).\n```bash\n# Example: download the epoch 4 checkpoint (stored as epoch03, step 152136).\nhuggingface-cli download Ugness\u002Felf-torch \\\n    checkpoints\u002Fcheckpoint_epoch03_step00152136.ckpt \\\n    --local-dir reproduction\u002Felf_b-owt\u002F\n```\n\n## Quickstart — train from scratch\n\n```bash\ncd pytorch_lightning\u002F\ntorchrun --nproc_per_node=8 --master_port=29501 train_lightning.py \\\n    --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-B.yml\n```\n\n## Reproduction details\n\n- **Hardware:** 8× NVIDIA B200 (sm_100), CUDA 12.8.\n  `broadcast_buffers=False`. See `pytorch_lightning\u002Ftrain_lightning.py`.\n- **Wall-clock:** ~3 hours per epoch.\n\n\n### Differences vs the paper run\n\n| Aspect | Paper | This reproduction |\n| --- | --- | --- |\n| Hardware | TPU v5p-64 | 8× B200 DDP |\n| Framework | JAX\u002FFlax | PyTorch Lightning |\n| Epochs | 5 | 6 (one extra to reach entropy ≈ 5.20) |\n| Optimizer \u002F objective | Muon + L2 denoise + CE decode (decoder_prob=0.2) | Unchanged |\n| Schedule, noise scale, time schedule, SC, CFG | Unchanged | Unchanged |\n","ELF-pytorch is an unofficial PyTorch implementation that aims to reproduce the results of the ELF: Embedded Language Flows model on the OpenWebText dataset. The project preprocesses the data with the T5 tokenizer and uses PyTorch Lightning for multi-GPU training and evaluation. Its core features include SDE-based generative model training and the recording of performance metrics at different training stages. It is suited to researchers and developers who study or apply language-model generation techniques, especially those who want to explore the ELF model within the PyTorch ecosystem. Note that, because of differences in preprocessing, the project's results may not be directly comparable with the baselines in the original paper.",2,"2026-06-11 04:03:10","CREATED_QUERY"]