[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75712":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},75712,"ELF","lillian039\u002FELF","lillian039",null,"Python",867,59,7,6,0,27,741,23,9.33,"MIT License",false,"main",[],"2026-06-12 02:03:35","# ELF: Embedded Language Flows\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.10938-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.10938)&nbsp;\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)&nbsp;\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-ELF-blue.svg)](https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows)&nbsp;\n\nThis is the official JAX implementation for the paper *ELF: Embedded Language Flows*. This code is written and tested on TPUs. A PyTorch version is available on the [`pytorch_elf`](https:\u002F\u002Fgithub.com\u002Flillian039\u002FELF\u002Ftree\u002Fpytorch_elf) branch.\n\nELF is a class of continuous diffusion language models based on continuous-time Flow Matching. Unlike existing DLMs, ELF predominantly stays within the continuous embedding space until the final time step, where it maps to discrete tokens using a shared-weight network. 
This formulation makes it straightforward to adapt established techniques from image-domain diffusion models, e.g., classifier-free guidance (CFG).\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.gif\" alt=\"Conceptual illustration of ELF\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"left\">\n  \u003Cem>\u003Cstrong>Conceptual illustration of ELF.\u003C\u002Fstrong> Orange points denote data represented in continuous embedding space, and purple lines show denoising trajectories from Gaussian noise to clean embeddings. Discretization is applied only at the final time step (t=1) using a shared-weight network.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fgeneration.gif\" alt=\"Denoising trajectory of ELF-B\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"left\">\n  \u003Cem>\u003Cstrong>Denoising trajectory\u003C\u002Fstrong> of ELF-B. As t increases from 0 to 1, ungrammatical sentences are progressively refined into fluent and grammatical text.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fsys_compare.jpg\" alt=\"System-level comparison\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"left\">\n  \u003Cem>\u003Cstrong>System-level comparison.\u003C\u002Fstrong> ELF-B outperforms both discrete and continuous DLMs trained under similar settings (a) and distilled variants of other baselines that require additional rounds of training (b), and uses substantially fewer training tokens (c).\u003C\u002Fem>\n\u003C\u002Fp>\n\n## Initialization\n\nInstall the dependencies (JAX+TPUs) and log in to WandB to track your experiments if needed.\n\n```bash\npip install -r requirements.txt\nwandb login YOUR_WANDB_API_KEY\n```\n\n## Inference\n\nYou can quickly verify your setup with our provided checkpoint.\n\n\u003Ctable>\u003Ctbody>\n\u003Ctd valign=\"bottom\">OpenWebText (unconditional)\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\" 
align=\"center\">ELF-B (105M)\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\" align=\"center\">ELF-M (342M)\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\" align=\"center\">ELF-L (652M)\u003C\u002Ftd>\n\u003Ctr>\u003Ctd align=\"left\">pre-trained checkpoint\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows\u002FELF-B-owt\">ELF-B-owt\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows\u002FELF-M-owt\">ELF-M-owt\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows\u002FELF-L-owt\">ELF-L-owt\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"left\">Sampling steps (SDE)\u003C\u002Ftd>\n\u003Ctd align=\"center\">32\u003C\u002Ftd>\n\u003Ctd align=\"center\">64\u003C\u002Ftd>\n\u003Ctd align=\"center\">64\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"left\">Gen. 
PPL ↓ (paper)\u003C\u002Ftd>\n\u003Ctd align=\"center\">24.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">23.3\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"left\">Entropy ↑ (paper)\u003C\u002Ftd>\n\u003Ctd align=\"center\">5.15\u003C\u002Ftd>\n\u003Ctd align=\"center\">5.18\u003C\u002Ftd>\n\u003Ctd align=\"center\">5.28\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n\u003Ctable>\u003Ctbody>\n\u003Ctd valign=\"bottom\">Conditional generation (ELF-B)\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\" align=\"center\">WMT14 De-En\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\" align=\"center\" colspan=\"3\">XSum\u003C\u002Ftd>\n\u003Ctr>\u003Ctd align=\"left\">pre-trained checkpoint\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows\u002FELF-B-de-en\">ELF-B-de-en\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\" colspan=\"3\">\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows\u002FELF-B-xsum\">ELF-B-xsum\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"left\">Metric\u003C\u002Ftd>\n\u003Ctd align=\"center\">BLEU ↑\u003C\u002Ftd>\n\u003Ctd align=\"center\">ROUGE-1 ↑\u003C\u002Ftd>\n\u003Ctd align=\"center\">ROUGE-2 ↑\u003C\u002Ftd>\n\u003Ctd align=\"center\">ROUGE-L ↑\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"left\">Score (paper)\u003C\u002Ftd>\n\u003Ctd align=\"center\">26.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">12.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">27.8\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\nSlight differences in metrics may arise from different compute setups. Our results were computed on TPU v5p-64.\n\n#### Sanity Check\n\n1. 
**Get the checkpoint.** All pre-trained checkpoints are on HuggingFace under [`embedded-language-flows`](https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows) and are pulled automatically via `--checkpoint_path \u003Chf-repo-id>` — no manual download needed. To use a locally trained checkpoint, pass the path to the specific checkpoint file, e.g. `--checkpoint_path outputs\u002Felf_b-owt\u002Fcheckpoint_19000`.\n\n2. **(Optional) Tweak the config.** The provided `configs\u002Ftraining_configs\u002Ftrain_owt_ELF-{B,M,L}.yml` already point at the correct HuggingFace data + T5 encoder, so they run as-is. You may want to edit:\n    - `output_dir` — where samples and logs are written\n    - `wandb_entity` — set to your entity, or set `use_wandb: false` to disable\n    - `sampling_configs_path` — defaults to `configs\u002Fsampling_configs\u002Funcond_sampling_configs.yml` (32-step SDE + 64-step SDE, both with self-conditioning CFG); swap for your preferred schedule if needed\n\n3. **Launch evaluation.**\n\n**Unconditional generation:**\n```bash\ncd src\u002F\n\n  # ELF-B (105M)\n  python eval.py \\\n      --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-B.yml \\\n      --checkpoint_path embedded-language-flows\u002FELF-B-owt\n\n  # ELF-M (342M) — smaller batch to fit the bigger model\n  python eval.py \\\n      --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-M.yml \\\n      --checkpoint_path embedded-language-flows\u002FELF-M-owt \\\n      --config_override global_batch_size=64\n\n  # ELF-L (652M)\n  python eval.py \\\n      --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-L.yml \\\n      --checkpoint_path embedded-language-flows\u002FELF-L-owt \\\n      --config_override global_batch_size=64\n```\nThe evaluator generates 1,000 samples and reports Gen. PPL (under a pretrained GPT-2 Large) and unigram entropy. Expected: Gen. 
PPL ≈ 24 and entropy ≈ 5.15 for ELF-B at 32 SDE steps.\n\n**Conditional generation:**\n```bash\ncd src\u002F\n\n# XSum (summarization)\npython eval.py \\\n    --config configs\u002Ftraining_configs\u002Ftrain_xsum_ELF-B.yml \\\n    --checkpoint_path embedded-language-flows\u002FELF-B-xsum\n\n# WMT14 De-En (translation)\npython eval.py \\\n    --config configs\u002Ftraining_configs\u002Ftrain_de-en_ELF-B.yml \\\n    --checkpoint_path embedded-language-flows\u002FELF-B-de-en\n```\nThe evaluator runs on each task's **validation** set and reports BLEU for WMT14 De-En and ROUGE-1\u002F2\u002FL for XSum. Expected: BLEU ≈ 26.7 on De-En; ROUGE-1\u002F2\u002FL ≈ 36.3 \u002F 12.5 \u002F 28.1 on XSum. Note that the paper numbers are computed on the **test** sets, so validation scores here may differ slightly.\n\n## Data Preparation\n\nThree task settings: unconditional generation on **OpenWebText**, machine translation on **WMT14 De-En**, and summarization on **XSum**. All use a frozen T5 encoder for text-to-embedding mapping.\n\n#### Pre-tokenized splits\n\nWe provide pre-tokenized splits (T5 tokenizer) and the JAX T5-small encoder on HuggingFace under [`embedded-language-flows`](https:\u002F\u002Fhuggingface.co\u002Fembedded-language-flows). They are loaded directly via `datasets.load_dataset` — no manual download needed. 
Defaults wired into the configs:\n\n| Task | `data_path` \u002F `eval_data_path` |\n| --- | --- |\n| OpenWebText | `embedded-language-flows\u002Fopenwebtext-t5` |\n| WMT14 De-En | `embedded-language-flows\u002Fwmt14_de-en_{train,validation}_t5` |\n| XSum | `embedded-language-flows\u002Fxsum_{train,validation}_t5` |\n| T5 encoder | `embedded-language-flows\u002Ft5_small_encoder_jax\u002Ft5_small_encoder_jax.pkl` |\n\nTo use a local copy, point `data_path` at a directory saved with `datasets.save_to_disk` — the loader falls back to `load_from_disk`.\n\n#### Prepare your own data\n\nTo train on a custom dataset, pre-tokenize it with the tokenizer and save it as a HuggingFace `Dataset` (Arrow).\n\n**Unconditional generation** (e.g., OWT): each example needs only `input_ids` — the token ids of the text to be generated.\n\n**Conditional generation** (e.g., translation, summarization): each example needs both `input_ids` (target\u002Foutput text) and `condition_input_ids` (source\u002Finput text, e.g., the German sentence or the article). The collator prepends `condition_input_ids` to `input_ids` and builds the appropriate attention masks automatically.\n\nMinimal recipe:\n\n```python\nfrom datasets import Dataset\nfrom transformers import T5Tokenizer\n\ntok = T5Tokenizer.from_pretrained(\"google-t5\u002Ft5-small\")\n\n# Unconditional\ndef encode_uncond(ex):\n    return {\"input_ids\": tok(ex[\"text\"], add_special_tokens=False)[\"input_ids\"]}\n\n# Conditional (translation \u002F summarization)\ndef encode_cond(ex):\n    return {\n        \"condition_input_ids\": tok(ex[\"source\"], add_special_tokens=False)[\"input_ids\"],\n        \"input_ids\": tok(ex[\"target\"], add_special_tokens=False)[\"input_ids\"],\n    }\n\nds = Dataset.from_list(my_examples).map(encode_uncond, remove_columns=...)  
# or encode_cond\nds.save_to_disk(\"\u002Fpath\u002Fto\u002Fmy_dataset\")\n```\n\nThen point your config at it:\n\n```yaml\ndata_path: \u002Fpath\u002Fto\u002Fmy_dataset\neval_data_path: \u002Fpath\u002Fto\u002Fmy_eval_dataset   # optional\n```\n\nFor evaluation-only JSONL inputs (raw text, tokenized at load time), see `load_jsonl_dataset` in [data_utils.py:110-130](src\u002Futils\u002Fdata_utils.py#L110-L130) — set `eval_data_path` to a `.jsonl` file with one `{\"input\": ..., \"output\": ...}` example per line.\n\n## Training\n\nRun the following command to launch training:\n\n```bash\npython train.py --config configs\u002Ftraining_configs\u002Ftrain_owt_ELF-B.yml\n```\n\nAvailable training configs:\n\n- `configs\u002Ftraining_configs\u002Ftrain_owt_ELF-B.yml` — unconditional generation on OpenWebText, ELF-B (default)\n- `configs\u002Ftraining_configs\u002Ftrain_owt_ELF-M.yml` — unconditional generation on OpenWebText, ELF-M\n- `configs\u002Ftraining_configs\u002Ftrain_owt_ELF-L.yml` — unconditional generation on OpenWebText, ELF-L\n- `configs\u002Ftraining_configs\u002Ftrain_de-en_ELF-B.yml` — WMT14 De-En machine translation\n- `configs\u002Ftraining_configs\u002Ftrain_xsum_ELF-B.yml` — XSum abstractive summarization\n\nDefault ELF-B training uses Muon at blr=0.001 (base learning rate; effective lr = blr × batch_size \u002F 256 = 0.002 at the default batch size of 512), global batch size 512, and runs 5 epochs on OWT (~95K steps) on TPU v5p-64 (~1.5 h per epoch).\n\n#### Config System\n\nThe training system uses two config layers:\n\n- **`configs\u002Fconfig.py`** — base `Config` dataclass with all default hyperparameters\n- **`configs\u002Ftraining_configs\u002F*.yml`** — task-specific overrides loaded by `load_config_from_yaml()`\n\nThe system merges these, allowing you to customize only the parameters you need.\n\n#### Customizing Training\n\nTo create a custom experiment:\n\n1. 
**Create a new config file** (e.g., `configs\u002Ftraining_configs\u002Fmy_exp.yml`)\n2. **Launch with your config:**\n   ```bash\n   python train.py --config configs\u002Ftraining_configs\u002Fmy_exp.yml\n   ```\n\n**Example custom config:**\n\n```yaml\nmodel: ELF-M                # Use ELF-M model (342M)\n\nepochs: 4\nglobal_batch_size: 512\nblr: 0.002\noptimizer: muon\n\ndenoiser_p_mean: -1.5       # Logit-normal time schedule\ndenoiser_p_std: 0.8\ndenoiser_noise_scale: 2.0\nself_cond_prob: 0.5\ndecoder_prob: 0.2           # 20% decoding (CE) \u002F 80% denoising (L2)\n```\n\nFor more details on configuration options, refer to `config.py` and the YAML files under `configs\u002Ftraining_configs\u002F`.\n\n#### Sampling Configuration\n\nSampling is decoupled from training and is controlled by a separate YAML in `configs\u002Fsampling_configs\u002F`, referenced from each training config via `sampling_configs_path`:\n\n- `uncond_sampling_configs.yml` — unconditional generation: two SDE schedules, 32-step (γ=1.5) and 64-step (γ=1.0), both with SC-CFG=3\n- `cond_sampling_configs.yml` — conditional generation (translation \u002F summarization): one 64-step ODE schedule with CFG=2 and SC-CFG=1\n\nEach list entry specifies a sampler (`ode` \u002F `sde`), `num_sampling_steps`, `cfgs`, `self_cond_cfg_scales`, and `time_schedule`. The evaluator iterates through all entries.\n\n## Checkpointing\n\nCheckpoints are saved at the end of each epoch (or at fractional intervals if `save_freq \u003C 1`) to `output_dir\u002Fcheckpoint_\u003Cstep>`, keeping up to 10 recent checkpoints. Only process 0 writes to disk.\n\nIf `hf_repo_id` is set in the config, the entire `output_dir` is uploaded to HuggingFace after each save.\n\n**Auto-resume:** if `--resume` is not specified, training automatically detects and resumes from the latest checkpoint in `output_dir`.\n\n**Loading:** `load_checkpoint` accepts a local path or an HF repo ID (e.g., `embedded-language-flows\u002FELF-B-owt`). 
For a directory, it uses the latest checkpoint inside.\n\nThe T5 encoder weights (`encoder_checkpoint`) are stored separately as a `.pkl` file and loaded once at startup. They can also be specified as an HF path (default: `embedded-language-flows\u002Ft5_small_encoder_jax\u002Ft5_small_encoder_jax.pkl`).\n\n## License\n\nThis repo is under the MIT license. See [LICENSE](LICENSE) for details.\n\n## Citation\n\nIf you find this work useful in your research, please consider citing our paper :)\n\n```bib\n@article{elf2026,\n  title={ELF: Embedded Language Flows},\n  author={Hu, Keya and Qiu, Linlu and Lu, Yiyang and Zhao, Hanhong and Li, Tianhong and Kim, Yoon and Andreas, Jacob and He, Kaiming},\n  journal={arXiv preprint arXiv:2605.10938},\n  year={2026}\n}\n```\n\n## Acknowledgement\n\nWe gratefully acknowledge the Google TPU Research Cloud (TRC) for granting TPU access.\nWe hope this work will serve as a useful resource for the open-source community.\n","ELF is a continuous diffusion language model based on continuous-time flow matching. The project is implemented in JAX and tested on TPUs. Its main feature is that it stays in the continuous embedding space until the final step, where it maps to discrete tokens, which makes it easy to borrow techniques from image-domain diffusion models, such as classifier-free guidance (CFG). The project provides pre-trained models at several scales (105M, 342M, and 652M parameters), suited to scenarios that need high-quality text generation, and performs well even when resources are limited. A PyTorch version is also available on a separate branch for broader compatibility.",2,"2026-06-11 03:53:08","CREATED_QUERY"]