[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80865":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},80865,"mosaic","maxxxzdn\u002Fmosaic","maxxxzdn","(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models [ICML'26]","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.16429",null,"Python",40,1,35,0,3,4,5,9,0.9,false,"main",true,[],"2026-06-12 02:04:07","\u003Cp align=\"center\">\n  \u003Cimg src=\"figures_weather\u002Fmosaic_header.jpg\" alt=\"Mosaic\" width=\"70%\">\n\u003C\u002Fp>\n\n# Mosaic — Block-Sparse Attention for Weather Forecasting\n\n|  📄 [**Paper**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.16429)  |  🌐 [**Live Demo**](https:\u002F\u002Fmaxxxzdn-mosaic.static.hf.space\u002F)  |  🤗 [**Hugging Face**](https:\u002F\u002Fhuggingface.co\u002Fmaxxxzdn\u002Fmosaic)  |  💻 [**GitHub**](https:\u002F\u002Fgithub.com\u002Fmaxxxzdn\u002Fmosaic)  |\n| :---: | :---: | :---: | :---: |\n| ICML 2026 · arXiv:2604.16429 | Interactive forecasts & spectra | Pretrained weights & model card | Source code & issue tracker |\n\n> **(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models** \\\n> Maksim Zhdanov, Ana Lucic, Max Welling, Jan-Willem van de Meent · *ICML 2026*\n\n**Mosaic** is a probabilistic weather forecasting model that operates on native-resolution 
grids via mesh-aligned block-sparse attention. At 1.5° resolution with 214M parameters, Mosaic matches or outperforms models trained on 6× finer resolution on key variables, and individual ensemble members exhibit near-perfect spectral alignment across all resolved frequencies. A 24-member, 10-day forecast takes under 12 s on a single H100 GPU.\n\n![Spectral fidelity and skill–speed Pareto](figures_weather\u002Fresults_spectra_pareto.jpg)\n\n## Interactive Demo\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fmaxxxzdn-mosaic.static.hf.space\u002F\">\n    \u003Cimg src=\"figures_weather\u002Fdemo_preview.jpg\" alt=\"Mosaic interactive demo\" width=\"85%\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n[**▶ Explore forecasts in your browser**](https:\u002F\u002Fmaxxxzdn-mosaic.static.hf.space\u002F) — rotate the globe, step through a 10-day 16-member ensemble forecast, switch variables, and watch each member's kinetic-energy spectrum track ERA5 in real time.\n\n## TL;DR\n\nMosaic addresses two distinct failure modes of spectral degradation in ML-based weather prediction:\n\n1. **Spectral damping** caused by deterministic training against ensemble means. Mosaic addresses this with learned functional perturbations that produce ensemble members preserving realistic spectral variability.\n2. **High-frequency aliasing** caused by compressive encoding onto a coarse latent grid. Mosaic operates at native resolution via block-sparse attention before any coarsening, eliminating the compress-first bottleneck.\n\nThe block-sparse attention captures long-range dependencies at **linear** cost by sharing keys and values across spatially adjacent queries arranged on the HEALPix mesh. Each query block jointly selects which key blocks to attend to.\n\n## Published Variants\n\nThis repository ships **two trained variants**, distinguished primarily by the data they were tuned on. 
They share the same Mosaic architecture and 82-channel variable set, but differ in training data, time cadence, history length, and normalization statistics.\n\n| Variant | Training data | Native step | Input history | k-neighbours | Suggested input zarr |\n|---------|---------------|-------------|---------------|---------------|---------------------|\n| `era5`  | ERA5 reanalysis only       | 24 h | 2 states (48 h) | 24 | WB2 ERA5 1.5° |\n| `hres`  | ERA5 pretrain + HRES finetune | 6 h  | 4 states (24 h) | 20 | WB2 HRES-fc0 1.5° |\n\nChoose `era5` when initializing from reanalysis (matches the training distribution); choose `hres` when initializing from HRES analysis or a similar operational state.\n\n## Architecture\n\n**Inputs.** 82 atmospheric channels at 1.5° equiangular resolution (240 lon × 121 lat = 29 040 points) plus 3 static channels and sinusoidal day\u002Fyear time encodings.\n\n**Backbone.** A U-Net of transformer blocks over the HEALPix mesh, where spatial neighbours occupy contiguous memory and queries can be grouped into hardware-aligned blocks:\n\n| Stage      | Nside | Hidden dim | Heads | Enc \u002F Dec depth |\n|------------|------:|----------:|------:|----------------:|\n| Stage 1    | 64    | 768       | 12    | 4 \u002F 2 |\n| Stage 2    | 32    | 1024      | 16    | 4 \u002F 2 |\n| Bottleneck | 16    | 1280      | 20    | 2     |\n\nGrouped-Query Attention with ratio 4 (3, 4, and 5 KV heads for the 12-, 16-, and 20-head stages respectively), 2D RoPE on (longitude, latitude), and additive noise injection in SwiGLU gates for ensemble generation. 
~214M parameters total.\n\n![Block-sparse attention for weather forecasting](figures_weather\u002Fbsa.jpg)\n\n**Block-sparse attention.** Three branches combined by learned gates: (i) **compression** — block-to-block coarse attention captures broad synoptic patterns at $\\mathcal{O}(N^2\u002Fb^2)$; (ii) **selection** — each query block top-k-selects fine-scale key blocks at $\\mathcal{O}(Nnb)$; (iii) **local** — full attention inside each block at $\\mathcal{O}(Nb)$. Spatially close points occupy contiguous memory on the HEALPix mesh, enabling coalesced GPU reads and hardware-aligned block computation. Implemented as a single Triton kernel; in practice up to **61.8× faster than dense attention** and **9.4× faster than NSA**.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"figures_weather\u002Fhealpix.jpg\" alt=\"HEALPix mesh\" width=\"48%\">\n  \u003Cimg src=\"figures_weather\u002Fbsa_runtime.jpg\" alt=\"Runtime scaling\" width=\"48%\">\n\u003C\u002Fp>\n\n### Variables (82 channels)\n\n- **Surface (4):** `2m_temperature`, `10m_u_component_of_wind`, `10m_v_component_of_wind`, `mean_sea_level_pressure`\n- **Pressure level (6 × 13 = 78):** `geopotential`, `specific_humidity`, `temperature`, `u_component_of_wind`, `v_component_of_wind`, `vertical_velocity` at levels [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000] hPa\n- **Static (3, conditioning only — not in output):** `geopotential_at_surface`, `land_sea_mask`, `soil_type`\n\n## Results\n\nThe headline figure at the top of the page summarizes the main result: individual ensemble members preserve realistic kinetic-energy spectra (left, 1.5°; centre, 0.25°), and Mosaic sits on the favourable end of the skill–speed–memory Pareto (right). You can explore the same forecasts and spectra interactively in the [live demo](https:\u002F\u002Fmaxxxzdn-mosaic.static.hf.space\u002F). 
All metrics computed at 240 h lead time, 720 initial conditions throughout the 2020 test year (1.5° benchmark) and a single 6 h forecast (0.25° benchmark).\n\nOn the 0.25° HRES benchmark, Mosaic competes with state-of-the-art 0.25° models despite operating at 1.5°:\n\n![HRES benchmark results](figures_weather\u002Fresults_hres.jpg)\n\nAnd a case study of **Hurricane Ian (2022)** — Mosaic's ensemble correctly brackets the observed track 7 days ahead, with progressive narrowing of spread as lead time decreases:\n\n![Hurricane Ian ensemble tracks](figures_weather\u002Fhurricane_tracking.jpg)\n\nSee the paper for full benchmark tables, CRPS curves, and spread-to-skill ratios.\n\n## Hardware Requirements\n\n- **GPU:** any CUDA GPU; 16 GB is enough for a 1-member rollout, A100\u002FH100 recommended for multi-member ensembles\n- **Memory:** ~9 GB GPU RAM for a 1-member, 40-step (10-day) rollout in float16\n- **Throughput:** 24-member, 10-day forecast in under 12 s on a single H100\n- **CUDA:** 11.8+ with matching `triton` and `flash-attn` versions\n\n## Installation\n\n```bash\npip install -r requirements.txt\npip install flash-attn --no-build-isolation   # built separately; needs nvcc\n```\n\nFor reading data from Google Cloud Storage (WeatherBench2 zarr stores):\n\n```bash\npip install gcsfs\n```\n\n## Getting the weights\n\nIf you installed via Hugging Face (`huggingface-cli download maxxxzdn\u002Fmosaic --local-dir .`), the checkpoints (`checkpoints\u002Fera5_best.pt`, `checkpoints\u002Fhres_best.pt`) and normalization stats (`data\u002F*.npz`) are already in place and `inference.py` finds them automatically.\n\nIf you cloned this repo from GitHub instead, the weights are not in the git tree (they live on the [Hugging Face mirror](https:\u002F\u002Fhuggingface.co\u002Fmaxxxzdn\u002Fmosaic) as LFS objects). Fetch them with:\n\n```bash\npip install huggingface_hub\nhuggingface-cli download maxxxzdn\u002Fmosaic --local-dir .   
# pulls .pt + .npz assets\n```\n\nor programmatically:\n\n```python\nfrom huggingface_hub import snapshot_download\nsnapshot_download(repo_id=\"maxxxzdn\u002Fmosaic\", local_dir=\".\")\n```\n\n## Quick Start\n\n```bash\n# ERA5 variant — 10-day forecast at 24 h resolution from ERA5 reanalysis\npython inference.py --variant era5 \\\n    --zarr gs:\u002F\u002Fweatherbench2\u002Fdatasets\u002Fera5\u002F1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr \\\n    --init-time \"2020-01-01T00:00\" \\\n    --steps 10 --members 1 \\\n    --output forecast_era5.npz\n\n# HRES variant — 10-day forecast at 6 h resolution from HRES initial conditions\npython inference.py --variant hres \\\n    --zarr gs:\u002F\u002Fweatherbench2\u002Fdatasets\u002Fhres_t0\u002F2016-2022-6h-240x121_equiangular_with_poles_conservative.zarr \\\n    --init-time \"2022-01-01T00:00\" \\\n    --steps 40 --members 1 \\\n    --output forecast_hres.npz\n\n# Ensemble forecast (16 members) — change --members\npython inference.py --variant hres --zarr \u003C...> --init-time \"2020-06-15T12:00\" \\\n    --steps 40 --members 16 --output ensemble.npz\n```\n\n`--variant` selects the checkpoint, normalization statistics, history length, time stride, and neighbour count automatically. 
Pass `--checkpoint` or `--norm-stats` to override the bundled defaults.\n\n## Output Format\n\nThe output `.npz` file contains:\n\n| Array | Shape | Description |\n|-------|-------|-------------|\n| `forecasts` | `(members, steps, 240, 121, 82)` | Predicted states in physical units |\n| `variables` | `(82,)` | Variable names |\n| `lead_time_hours` | `(steps,)` | Lead times (era5: 24, 48, …; hres: 6, 12, …) |\n| `init_time` | scalar | Initialization timestamp |\n| `longitude` | `(240,)` | Longitude values (0 to 358.5°) |\n| `latitude` | `(121,)` | Latitude values, South→North (−90 to 90°) |\n\n### Reading the output\n\n```python\nimport numpy as np\n\ndata = np.load(\"forecast_era5.npz\", allow_pickle=True)\nforecasts = data['forecasts']                          # (members, steps, 240, 121, 82)\nvariables = list(data['variables'])                    # ['2m_temperature', ...]\nlead_hours = data['lead_time_hours']                   # e.g. [24, 48, ..., 240]\n\n# Extract 500 hPa geopotential at 24 h lead time\nz500_idx = variables.index('geopotential_500')\ni_24h = int(np.where(lead_hours == 24)[0][0])\nz500_24h = forecasts[0, i_24h, :, :, z500_idx]         # (240, 121) lon × lat\n```\n\n## Input Data Format\n\nThe model accepts ERA5 or HRES data in zarr format at 1.5° resolution with:\n\n- **Grid:** 240 lon × 121 lat equiangular with poles\n- **Time:** 6-hourly timesteps (integer hours since an arbitrary origin, parsed from the `units` zarr attribute, or as datetime64)\n- **Variables:** all 10 atmospheric variables listed above; per-variable layout is auto-detected from `_ARRAY_DIMENSIONS` (either `(time, latitude, longitude)` or `(time, longitude, latitude)`), and latitude is flipped if stored North→South\n\nCompatible zarr stores from 
[WeatherBench2](https:\u002F\u002Fweatherbench2.readthedocs.io\u002F):\n\n```\ngs:\u002F\u002Fweatherbench2\u002Fdatasets\u002Fera5\u002F1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr\ngs:\u002F\u002Fweatherbench2\u002Fdatasets\u002Fhres_t0\u002F2016-2022-6h-240x121_equiangular_with_poles_conservative.zarr\n```\n\n## Repository Contents\n\n| File | Description |\n|------|-------------|\n| `inference.py` | Main inference script (variant-aware via `--variant {era5,hres}`) |\n| `mosaic.py` | Mosaic U-Net transformer |\n| `primitives.py` | Attention blocks, RoPE, HEALPix sampling, noise generator |\n| `ops.py` | Triton block-sparse attention kernels |\n| `utils.py` | HEALPix grid utilities |\n| `base.py` | `WeatherModel` wrapper |\n| `config.py` | Variable \u002F level definitions |\n| `dataset.py` | Metadata dataclasses |\n| `data\u002Fnorm_stats_era5.npz` | Normalization statistics for the `era5` variant |\n| `data\u002Fnorm_stats_hres.npz` | Normalization statistics for the `hres` variant |\n| `data\u002Fstatic_vars.npz` | Static fields (orography, land–sea mask, soil type) — shared between variants |\n| `checkpoints\u002Fera5_best.pt` | Trained checkpoint, `era5` variant (~1.7 GB) — Hugging Face only |\n| `checkpoints\u002Fhres_best.pt` | Trained checkpoint, `hres` variant (~1.7 GB) — Hugging Face only |\n| `figures_weather\u002F` | Figures from the paper |\n\n## Limitations\n\nMosaic operates at 1.5° (~166 km), which cannot resolve mesoscale phenomena such as tropical-cyclone inner-core structure or individual severe thunderstorms. The block-sparse attention is designed to scale linearly with sequence length, so finer grids (e.g. 
0.25°, ~700k tokens) are a natural next step but are not part of this release.\n\n## Model card metadata\n\n|         |   |\n|---------|---|\n| License | [`cc-by-nc-4.0`](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F) |\n| Library | `pytorch` |\n| Tags    | `weather` · `weather-forecasting` · `climate` · `atmospheric-science` · `sparse-attention` · `transformer` · `probabilistic-forecasting` |\n\n## License\n\nReleased under [CC-BY-NC-4.0](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F). Free for non-commercial research and educational use with attribution; commercial use requires a separate license. Underlying training data (ERA5, HRES) is subject to its own licensing terms set by ECMWF.\n\n## Acknowledgements\n\nMZ acknowledges support from Microsoft Research AI4Science. JWvdM acknowledges support from the European Union Horizon Framework Programme (Grant agreement ID: 101120237). This work used the Dutch national e-infrastructure with the support of the SURF Cooperative using grant no. EINF-16923. Computations were partially performed using the UvA\u002FFNWI HPC Facility.\n\n## Citation\n\nIf you use Mosaic, please cite:\n\n```bibtex\n@inproceedings{zhdanov2026mosaic,\n  title     = {(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models},\n  author    = {Zhdanov, Maksim and Lucic, Ana and Welling, Max and van de Meent, Jan-Willem},\n  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},\n  year      = {2026},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.16429}\n}\n```\n","Mosaic is a machine-learning weather forecasting model that operates on native-resolution grids via mesh-aligned block-sparse attention. Its core capabilities include efficient forecasting at 1.5° resolution with 214 million parameters while preserving spectral fidelity; a 24-member, 10-day forecast takes under 12 seconds on a single H100 GPU. On the technical side, Mosaic addresses spectral damping with learned functional perturbations and uses block-sparse attention to eliminate high-frequency aliasing. The model suits weather forecasting scenarios that require high accuracy and fast response, and it can markedly improve forecast accuracy and computational efficiency, particularly when processing large-scale meteorological datasets.",2,"2026-06-11 04:02:37","CREATED_QUERY"]