[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1688":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":35,"discoverSource":36},1688,"databricks-code-practice","jrlasak\u002Fdatabricks-code-practice","jrlasak","Practice Databricks coding skills with hands-on exercises. Import into Databricks Free Edition, write code, run assertions, check pass\u002Ffail. Covers Delta Lake, Spark SQL, PySpark, Auto Loader, medallion architecture, window functions, and more. ","https:\u002F\u002Fdataengineer.wiki",null,"Python",207,115,4,0,2,6,31,6.19,false,"main",[23,24,25,26,27,28,29,30,31],"auto-loader","coding-practice","data-engineering","databricks","databricks-certification","delta-lake","medallion-architecture","pyspark","spark-sql","2026-06-12 02:00:31","# Databricks Code Practice\n\n### Get fluent in Databricks by typing, not watching.\n\n**104 exercises + 5 production-grade pipeline labs. All on Databricks Free Edition.**\n\nClone once, import into Databricks, pick a folder. Exercises fail loud until your code is right; labs ship with synthetic data so you build production-style pipelines, not toy ones.\n\n> **New (18 April 2026):** 5 full-scale pipeline labs + 1 benchmark deep-dive just landed. If you starred this repo for the exercises, they're still here - now alongside end-to-end project work.\n\n---\n\n## Author\n\n**Jakub Lasak** - Databricks Data Engineer. 
Helping you interview like seniors, execute like seniors, and think like seniors.\n\n- [LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjrlasak\u002F) (13.5K followers) - Databricks projects and tips\n- [Substack](https:\u002F\u002Fdataengineer.wiki\u002Fsubstack) - Newsletter for data engineers\n- [DataEngineer.wiki](https:\u002F\u002Fdataengineer.wiki) - Cheat sheets, learning paths, cert guides\n\n> **Prepping for interviews?** Writing code is one half of the battle - knowing the questions that actually come up is the other. I maintain [Databricks Interview Cheat Sheets](https:\u002F\u002Fdataengineer.wiki\u002Fproducts) by seniority level (junior \u002F mid \u002F senior \u002F bundle).\n\n## What's Inside\n\nFluency comes from reps, not reading. Three structured paths:\n\n- **`exercises\u002F`** - focused reps on a single concept. LeetCode-style, 5-30 min each.\n- **`pipeline-labs\u002F`** - end-to-end medallion pipelines on a business scenario. 2-3 hours each.\n- **`deep-dives\u002F`** - measure the impact of a technique with numbers. 1-2 hours each.\n\n|               | Exercises                                  | Pipeline Labs                                                    | Deep-Dives                                              |\n| ------------- | ------------------------------------------ | ---------------------------------------------------------------- | ------------------------------------------------------- |\n| **Format**    | Single notebook, one TODO per exercise     | Multi-notebook guided project                                    | Single-topic deep investigation                         |\n| **Time**      | 5-30 min per exercise                      | 2-3 hours per lab                                                | 1-2 hours                                               |\n| **Scope**     | One concept (MERGE, window functions, ...) 
| End-to-end project (ingestion -> bronze -> silver -> gold)       | One topic measured in depth                             |\n| **Narrative** | None. \"Given table X, write...\"            | Business scenario. \"You're building a streaming pipeline for...\" | Benchmark-driven. \"Apply technique, measure the delta.\" |\n| **Order**     | Pick any, skip around                      | Sequential notebooks that build on each other                    | Sequential; each step layers on the last                |\n| **Goal**      | Drill a skill until it's automatic         | See how concepts fit in a real project                           | Prove what a technique actually buys you                |\n\n## Catalog\n\n### Exercises (`exercises\u002F`)\n\n\u003C!-- TOPICS-START -->\n\n| Topic                               | Notebooks | Exercises | Description                                                                                                                          |\n| ----------------------------------- | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------ |\n| [Delta Lake](exercises\u002Fdelta-lake\u002F) | 6         | 51        | MERGE operations, time travel, schema enforcement, OPTIMIZE, liquid clustering, change data feed                                     |\n| [ELT](exercises\u002Felt\u002F)               | 7         | 53        | Spark SQL joins, window functions, PySpark transformations, Auto Loader, batch ingestion, medallion architecture, complex data types |\n\n**Total: 13 notebooks, 104 exercises**\n\n\u003C!-- TOPICS-END -->\n\nMore exercise topics coming - next up: Streaming, Unity Catalog, Performance, and DLT.\n\n### Pipeline Labs (`pipeline-labs\u002F`)\n\nMulti-notebook, end-to-end medallion pipelines with a business scenario. 
Each runs 2-3 hours and ships with a synthetic data generator.\n\n| Lab                                                                      | What You Build                                                                                       | Focus                                                                                         |\n| ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |\n| [Apparel Retail 360 (DLT)](pipeline-labs\u002Fapparel-streaming\u002F)             | End-to-end retail analytics pipeline on Delta Live Tables with a full medallion architecture.        | DLT, Medallion, SCD Type 2, Streaming, Data Quality Expectations                              |\n| [Fintech Transaction Monitoring](pipeline-labs\u002Ffintech-monitoring\u002F)      | Real-time fraud-monitoring pipeline for a payment processor handling 500K+ transactions\u002Fday.         | Structured Streaming, Rescued Data, Watermarked Dedup, Stream-Static Joins, Liquid Clustering |\n| [DE Associate Certification Prep](pipeline-labs\u002Fde-associate-cert-prep\u002F) | Production-grade pipeline covering every exam domain of the Databricks Data Engineer Associate cert. | Auto Loader, COPY INTO, Medallion, SCD2, Jobs, Unity Catalog                                  |\n| [PySpark Developer Cert Prep](pipeline-labs\u002Fpyspark-cert-zenith\u002F)        | E-commerce analytics pipeline covering every domain of the Spark Developer Associate cert.           
| DataFrame API, Structured Streaming, Data Skew, Performance Tuning                            |\n\n### Deep-Dives (`deep-dives\u002F`)\n\nSingle-topic labs that measure the impact of a technique with numbers, not intuition.\n\n| Lab                                                                    | What You Build                                                                              | Focus                                                                     |\n| ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |\n| [6 Delta Optimization Techniques](deep-dives\u002Foptimization-techniques\u002F) | Iteratively apply and measure core Delta performance levers on a synthetic 50M-row dataset. | Partitioning, Z-Order, OPTIMIZE, Auto Optimize, Liquid Clustering, VACUUM |\n\n## How to Use\n\n1. Sign up for [Databricks Free Edition](https:\u002F\u002Fwww.databricks.com\u002Flearn\u002Ffree-edition) (free, no credit card)\n2. Clone or import this repo into Databricks (Workspace -> Create -> Git folder)\n3. Navigate to the folder you want, open its README, follow the instructions\n\nEverything runs on Free Edition: serverless compute, Unity Catalog, Delta Lake. 
No cloud account, no cluster config.\n\n## Which Should I Start With?\n\n- **New to Databricks?** Start with [DE Associate Cert Prep](pipeline-labs\u002Fde-associate-cert-prep\u002F) - broadest fundamentals.\n- **Want quick reps on a specific concept?** [Delta Lake exercises](exercises\u002Fdelta-lake\u002F) or [ELT exercises](exercises\u002Felt\u002F) - drill one concept at a time.\n- **Comfortable with batch, new to streaming?** [Apparel DLT](pipeline-labs\u002Fapparel-streaming\u002F), then [Fintech Monitoring](pipeline-labs\u002Ffintech-monitoring\u002F).\n- **Preparing for a cert?** [DE Associate](pipeline-labs\u002Fde-associate-cert-prep\u002F) or [Spark Developer Associate](pipeline-labs\u002Fpyspark-cert-zenith\u002F).\n- **Already shipping pipelines, want to go deeper on performance?** [Delta Optimization Techniques](deep-dives\u002Foptimization-techniques\u002F).\n\n## Stay in the Loop\n\nNew exercises and labs ship regularly. Follow on [LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjrlasak\u002F) or subscribe to the [Substack newsletter](https:\u002F\u002Fdataengineer.wiki\u002Fsubstack) to be notified when new content drops.\n\n## Feedback\n\nFound a bug? Have a suggestion? [Open an issue](..\u002F..\u002Fissues).\n\n---\n\n> **Disclaimer**: This is an independent educational resource created by Jakub Lasak. Not affiliated with, endorsed by, or sponsored by Databricks, Inc. \"Databricks\" and \"Delta Lake\" are trademarks of their respective owners.\n","This project aims to build Databricks coding skills through hands-on exercises, covering Delta Lake, Spark SQL, PySpark, Auto Loader, medallion architecture, and more. Its core content is 104 focused exercises and 5 production-grade pipeline labs, all of which run on Databricks Free Edition. Each exercise comes with assertions that verify the code is correct, while the labs provide synthetic data for building production-like pipelines. It is especially suited to data engineers who want to become more fluent in Databricks, whether they are preparing for interviews or solving specific problems in day-to-day work.","2026-06-11 02:45:26","CREATED_QUERY"]