[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82222":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":14,"starSnapshotCount":14,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},82222,"machine-learning-library","ATOM00blue\u002Fmachine-learning-library","ATOM00blue","A hand-curated, topic-organized library of the best ML education — 923 docs (391 arXiv papers, 474 Stanford\u002FMIT\u002FKarpathy\u002Ffast.ai lectures, 58 explainer articles), normalized to Markdown with full provenance. Open it in Obsidian or point your agent at it. 
A clean ML corpus for learning, RAG & fine-tuning.",null,"Python",126,13,1,0,7,20,6,3.44,"Other",false,"main",true,[24,25,26,27,28,29,30,31,32,33],"arxiv","corpus","dataset","deep-learning","education","llm","machine-learning","nlp","study-resources","transformers","2026-06-12 02:04:24","# Machine Learning Library\n\n**A hand-curated, machine-readable library (a curated ML corpus \u002F dataset) of the best machine-learning education on the internet — top university courses, canonical research papers, and the most-cited explainer blogs — normalized into one consistent Markdown format with full provenance.**\n\n923 documents · ~11 million tokens · beginner to frontier (2026) research · every source credited.\n\n![docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocuments-923-blue)\n![tokens](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftokens-~11M-green)\n![papers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv_papers-391-orange)\n![lectures](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flectures-474-red)\n![articles](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farticles-58-purple)\n![topics](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftopics-17-blueviolet)\n\n> **🆕 Now topic-organized, Obsidian-ready, and agent-ready.** Every doc is tagged into a 17-topic map ([`atlas\u002F`](atlas\u002F)); open the folder as a turnkey **Obsidian vault** (bundled config + graph), or point **Claude Code \u002F Cursor \u002F any agent** at it and it answers ML questions citing real papers and lectures. 
→ [**Open in Obsidian \u002F connect your agent**](#open-in-obsidian--connect-your-agent)\n\n---\n\n## Why this exists\n\nThe best material for learning machine learning is scattered across dozens of course pages, YouTube channels, arXiv PDFs, and personal blogs — each in a different format, none of it easy to search, embed, or feed to a model.\n\nThis repo pulls the highest-signal sources into **one place**, in **one consistent format**, with **clean metadata on every file**. The curation is the point: instead of an undifferentiated dump of arXiv or a noisy web scrape, this is a deliberately chosen reading list spanning the whole field — from \"what is a neural network\" all the way to sparse-attention and reasoning-model papers from 2025.\n\nIt's designed to be **used by both humans and machines**: read it directly to learn, or drop it into a vector database to build a retrieval-augmented tutor, fine-tune a domain model, or benchmark embeddings.\n\n---\n\n## At a glance\n\n| | |\n|---|---|\n| **Total documents** | 923 |\n| **Total size** | ~42M characters (~11M tokens) |\n| **arXiv papers** | 391 (78 full-text + 313 recent abstract+metadata) |\n| **Lecture transcripts** | 474 (across 14 courses\u002Fchannels) |\n| **Web articles** | 58 (canonical explainers) |\n| **Topic tags** | 17-topic controlled vocabulary (+ level \u002F medium \u002F task \u002F technique facets) |\n| **Format** | Markdown + YAML frontmatter |\n| **Coverage** | Intro fundamentals → frontier 2025–2026 research |\n\nEvery file begins with structured frontmatter (title, source, URL, authors, date, topics, **controlled `tags`**, `aliases`) so the whole corpus is trivially filterable and parseable — and navigable by topic (see [`atlas\u002F`](atlas\u002F) and [`atlas\u002FTAGS.md`](atlas\u002FTAGS.md)).\n\n---\n\n## What's inside\n\n### Research papers (`corpus\u002Fpapers\u002F` — 391)\n\nThe 78 canonical papers in **full text**, plus **313 recent (2024H2–2026)** papers added as the **verbatim 
abstract + metadata** (with a link to the full paper). Foundational to frontier:\n\n- **Foundations** — Dropout, word2vec, Seq2Seq, Adam, Batch\u002FLayer Norm, VAE, GANs\n- **Vision** — VGG, GoogLeNet, ResNet, DenseNet, Faster R-CNN, YOLOv3, ViT, MAE, DETR\n- **The Transformer era** — *Attention Is All You Need*, BERT, RoBERTa, T5, GPT-3, Chinchilla, scaling laws\n- **Efficient attention & serving** — FlashAttention 1\u002F2\u002F3, Linformer, Longformer, Performer, Reformer, PagedAttention\u002FvLLM, MQA\u002FGQA, RoPE\u002FALiBi\n- **Generative models** — DDPM, DDIM, Latent Diffusion (Stable Diffusion), DALL·E 2, DiT, VQ-VAE, StyleGAN, CLIP\n- **LLMs & alignment** — LLaMA 1\u002F2, Mistral, Mixtral, InstructGPT\u002FRLHF, DPO, LoRA\u002FQLoRA\u002FDoRA, GPTQ\u002FAWQ\u002FLLM.int8()\n- **Reasoning & agents** — Chain-of-Thought, Self-Consistency, Tree of Thoughts, ReAct, Toolformer\n- **Frontier (2024–2025)** — Mamba, State-Space Duality, Mixture-of-Depths, DeepSeek V2\u002FV3\u002FR1, Native Sparse Attention\n- **Newly added (2025–2026)** — Titans, RWKV-7, Gated DeltaNet, Mamba-3 & hybrid linear attention; reasoning & test-time compute (RLVR, GRPO-line); BitNet \u002F FP4 quantization; diffusion language models; video\u002Fimage generation; VLMs; LLM agents & RAG; SAE \u002F attribution-graph interpretability; world models; vision-language-action robotics; frontier model reports; AI-for-science\n\n### Lecture transcripts (`corpus\u002Fyoutube\u002F` — 474)\n\nFull transcripts from the most respected ML courses and educators:\n\n| Course \u002F Channel | Lectures |\n|---|---:|\n| MIT 6.S191 — Introduction to Deep Learning | 86 |\n| Yannic Kilcher — paper walkthroughs | 99 |\n| DeepLearning.AI | 49 |\n| fast.ai — Practical Deep Learning (Jeremy Howard) | 48 |\n| Stanford CS224n — NLP with Deep Learning | 46 |\n| Stanford CS25 — Transformers United | 39 |\n| Stanford CS229 — Machine Learning (Andrew Ng) | 20 |\n| Stanford CS336 — Language Modeling from Scratch | 15 
|\n| Stanford CS236 — Deep Generative Models | 15 |\n| Stanford CS231n — CNNs for Visual Recognition | 14 |\n| Stanford CS230 — Deep Learning (Andrew Ng) | 9 |\n| Andrej Karpathy — channel + Neural Networks: Zero to Hero | 25 |\n| 3Blue1Brown — Neural Networks series | 9 |\n\n### Web articles (`corpus\u002Fweb\u002F` — 58)\n\nThe explainers practitioners actually link to: Jay Alammar's *Illustrated* series, Lilian Weng's deep-dives, Sebastian Raschka, the Stanford CS231n notes, *Dive into Deep Learning*, Distill.pub, Anthropic's Transformer Circuits, and Karpathy's blog.\n\n---\n\n## Repository structure\n\n```\nmachine-learning-library\u002F\n├── README.md                  ← you are here\n├── SOURCES.md                 ← full attribution: every source, credited\n├── NOTICE.md                  ← licensing & usage notes\n├── AGENTS.md \u002F CLAUDE.md      ← how an AI agent should navigate & cite this corpus\n├── corpus\u002F\n│   ├── INDEX.md               ← machine-generated index of all 923 files\n│   ├── papers\u002F                ← 391 arXiv papers (78 full-text + 313 abstract+metadata)\n│   ├── youtube\u002F               ← 474 lecture transcripts, grouped by channel\n│   └── web\u002F                   ← 58 articles, grouped by domain\n├── atlas\u002F                     ← topic navigation layer (Maps of Content + learning paths)\n│   ├── Home.md                ← start here when browsing in Obsidian\n│   ├── TAGS.md                ← the controlled tag vocabulary\n│   ├── topics\u002F                ← one hub per topic (auto-lists every matching doc)\n│   └── paths\u002F                 ← curated reading paths (Zero to Transformer, …)\n├── .obsidian\u002F                 ← bundled vault config — open the folder in Obsidian and it just works\n├── tools\u002F                     ← scripts that clean, tag, and index the corpus\n└── examples\u002F\n    ├── self-attention-study-note.md   ← a synthesized, fully-cited study note\n    └── rag_quickstart.py            
  ← minimal semantic search \u002F RAG over the corpus\n```\n\n### Frontmatter format\n\nEvery document looks like this:\n\n```markdown\n---\ntitle: \"Attention Is All You Need\"\nsource: \"arxiv\"\narxiv_id: \"1706.03762\"\nurl: \"http:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762v7\"\nauthors: [\"Ashish Vaswani\", \"Noam Shazeer\", ...]\npublished: \"2017-06-12\"\ntopics: [\"transformer\", \"attention\"]          # original free-form tags\naliases: [\"Attention Is All You Need\"]         # readable Obsidian wikilink targets\ntags: [topic\u002Ftransformers-attention, level\u002Fadvanced, medium\u002Fpaper, task\u002Flanguage, technique\u002Fattention]\n---\n\n## Abstract\n...\n## Full Text\n...\n```\n\n`tags:` is a controlled, queryable vocabulary (topic \u002F level \u002F medium \u002F task \u002F\ntechnique) layered on top of the original `topics:` — see [`atlas\u002FTAGS.md`](atlas\u002FTAGS.md).\n\n---\n\n## Use cases\n\nThis corpus is a building block. Some of the things it's good for:\n\n1. **Retrieval-augmented ML tutor.** Embed the corpus, drop it in a vector DB, and build a Q&A assistant that answers ML questions grounded in real sources — and can cite the exact lecture or paper it drew from. No hallucinated references.\n\n2. **Fine-tuning a domain model.** ~11M tokens of clean, on-topic ML text is a realistic dataset for continued-pretraining or instruction-tuning a small (1–7B) \"ML explainer\" model.\n\n3. **Embeddings \u002F retrieval benchmark.** A coherent, single-domain corpus is ideal for evaluating embedding models and retrieval pipelines on technical content.\n\n4. **Synthesized study notes.** Use an LLM to compress multiple sources on one topic into a single cited note. `examples\u002Fself-attention-study-note.md` shows the output: a self-attention explainer assembled from the original paper, two blog posts, and a Karpathy lecture — every claim traced back to its source.\n\n5. 
**Concept \u002F citation graphs.** The frontmatter + cross-references make it straightforward to extract a graph of which papers and lectures explain which concepts.\n\n6. **Personalized reading paths.** Filter by topic and source to generate an ordered learning path (e.g. \"everything on diffusion models, easiest first\").\n\n7. **Offline reference library.** It's just Markdown — grep it, open it in Obsidian, read it on a plane.\n\n---\n\n## Quick start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FATOM00blue\u002Fmachine-learning-library.git\ncd machine-learning-library\n\n# Browse the full index\nless corpus\u002FINDEX.md\n\n# Everything is plain Markdown — search it however you like\n# (case-insensitive, so both \"FlashAttention\" and \"flash attention\" match)\ngrep -rliE \"flash ?attention\" corpus\u002F\n\n# Filter by frontmatter, e.g. all 2024+ papers (with yq)\nfor f in corpus\u002Fpapers\u002F*.md; do\n  yq -f extract '.published' \"$f\" | grep -q '^202[4-9]' && echo \"$f\"\ndone\n```\n\nMinimal RAG sketch (Python):\n\n```python\nimport glob, frontmatter\nfrom sentence_transformers import SentenceTransformer\n\n# Parse every Markdown file; frontmatter.load splits the YAML header from the body\ndocs = [frontmatter.load(p) for p in glob.glob(\"corpus\u002F**\u002F*.md\", recursive=True)]\nmodel = SentenceTransformer(\"BAAI\u002Fbge-small-en-v1.5\")\n# Embed the document bodies (inputs longer than the model's max sequence length are truncated)\nembeddings = model.encode([d.content for d in docs])\n# ... 
store in your vector DB of choice and query\n```\n\nA ready-to-run version of this lives in [`examples\u002Frag_quickstart.py`](examples\u002Frag_quickstart.py)\n(`python examples\u002Frag_quickstart.py --build`, then ask it questions).\n\n---\n\n## Open in Obsidian \u002F connect your agent\n\nThis repo is also a ready-to-use **Obsidian vault** and an **agent-friendly\nknowledge base** — pick whichever fits how you work:\n\n**📓 Browse it in [Obsidian](https:\u002F\u002Fobsidian.md).** Open the cloned folder as a\nvault — a bundled `.obsidian\u002F` config sets up a topic-colored graph and sensible\ndefaults out of the box.\n- **`atlas\u002FHome.md`** is your start page; **`atlas\u002Ftopics\u002F`** has a hub per topic\n  with a curated reading list and cross-links — all working with no plugins.\n- Two **optional** community plugins make it shine; Obsidian will offer to enable\n  them, or install from *Settings → Community plugins*:\n  [Dataview](https:\u002F\u002Fgithub.com\u002Fblacksmithgu\u002Fobsidian-dataview) (live auto-listed\n  doc tables in each hub) and\n  [Front Matter Title](https:\u002F\u002Fgithub.com\u002Fsnezhig\u002Fobsidian-front-matter-title)\n  (graph\u002Fexplorer nodes show titles instead of arXiv\u002Fvideo IDs).\n- Everything degrades gracefully — the hubs, links, tags, and graph also render\n  fine on GitHub and as plain Markdown with no plugins at all.\n\n**🤖 Point your AI agent at it.** Choose one:\n\n| You use… | Do this |\n|---|---|\n| Cursor \u002F Codex \u002F Copilot \u002F Gemini CLI \u002F Aider \u002F Zed | Open the folder — they read [`AGENTS.md`](AGENTS.md) automatically. |\n| Claude Code | Open the folder — it reads [`CLAUDE.md`](CLAUDE.md); a `\u002Fml-library` skill is bundled. |\n| Claude Desktop | Add a [Filesystem MCP server](https:\u002F\u002Fmodelcontextprotocol.io) pointed at this folder. |\n| Obsidian + live read\u002Fwrite | Install the **Local REST API** plugin (built-in MCP server) and `claude mcp add` it. 
|\n| Semantic search \u002F RAG | Run [`examples\u002Frag_quickstart.py`](examples\u002Frag_quickstart.py). |\n\nYour agent then answers ML questions grounded in these sources and **cites the\nexact paper or lecture** — no hallucinated references.\n\n---\n\n## Attribution\n\n**All credit belongs to the original creators.** This repository is a *curation and reformatting* of publicly available educational material — it contains no original research or teaching content of its own. Every document retains its source URL and authorship in the frontmatter, and **[SOURCES.md](SOURCES.md)** lists every course, channel, publication, and paper included.\n\nIf you find this useful, please support the people who actually made the material: subscribe to the channels, take the courses, read the papers, and cite the authors.\n\nSee **[NOTICE.md](NOTICE.md)** for licensing details and the takedown\u002Fremoval policy.\n\n---\n\n## License\n\n- The **structure, index, scripts, and organization** of this repository are released under the MIT License.\n- The **content of each document** remains under the rights of its original author\u002Fpublisher and is included here for research and educational purposes. See [NOTICE.md](NOTICE.md).\n","This project is a carefully curated library of machine-learning educational resources containing 923 documents (391 arXiv papers, 474 Stanford\u002FMIT\u002FKarpathy\u002Ffast.ai lectures, and 58 explainer articles), all converted to a uniform Markdown format with full provenance. Its core purpose is to provide a high-quality, structured collection of machine-learning material that is convenient for both human reading and machine processing. Technical features: every document uses a consistent Markdown format with YAML frontmatter, and the library can be opened directly in Obsidian or used as a data source for agents. It suits researchers, students, and developers who want to study machine learning systematically from the basics to the frontier, and it is also suitable for building retrieval-augmented learning tools, fine-tuning domain models, or benchmarking embeddings.",2,"2026-06-11 04:08:05","CREATED_QUERY"]