[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-70737":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},70737,"Cookbook","andkret\u002FCookbook","andkret","The Data Engineering Cookbook","https:\u002F\u002Flearndataengineering.com\u002F",null,"Python",15137,2715,550,116,0,6,24,60,18,100,"Apache License 2.0",false,"master",true,[27,28,29,30,31],"best-practices","big-data","cookbook","data-engineer","data-engineering","2026-06-12 04:00:57","\u003C!--- # The Data Engineering Cookbook -->\n\n\u003Cdiv align=\"center\">\n\t\u003Cimg width=\"341\" height=\"426\" src=\"images\u002FCookbookCover.jpg\" alt=\"Data Engineering Cookbook\">\n\t\u003Cbr>\n\t\u003Cbr>\n\t\u003Cbr>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\t\u003Ca href=\"sections\u002F01-Introduction.md\">What is this Book?\u003C\u002Fa>&nbsp;&nbsp;&nbsp;\n  \u003Ca href=\"#how-to-contribute\">How to Contribute\u003C\u002Fa>&nbsp;&nbsp;&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCY8mzqqGwl5_bTpBY9qLMAA\">YouTube\u003C\u002Fa>&nbsp;&nbsp;&nbsp;\n\t\u003Ca\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fandreaskayy\">Twitter\u003C\u002Fa>&nbsp;&nbsp;&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002Fshop\u002Fplumbersofdatascience\">Amazon Shop\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cbr>\n\n## If You Like This Book & Need More Help\nCheck 
out my Data Engineering Academy at LearnDataEngineering.com, trusted by almost 2,000 students!\n\n**Visit learndataengineering.com:** [Click Here](https:\u002F\u002Flearndataengineering.com)\n\n- Learn Data Engineering with our online Academy\n- Perfect for becoming a Data Engineer or adding Data Engineering to your skillset\n- Proven process based on years of experience and hundreds of hours of personal coaching\n- Over 30 prepared courses on the most important techniques, fundamental tools and platforms, plus our Associate Data Engineer Certification\n- Academy Discord server with over 1,000 members\n\n\n\n## Support This Book For Free!\n- **Amazon:** [Click Here](https:\u002F\u002Fwww.amazon.com\u002Fshop\u002Fplumbersofdatascience) buy whatever you like from Amazon using this link* (Also check out my complete podcast gear and books)\n\n\u003C!---\nI get asked super often how to become a Data Engineer.\nThat's why I decided to start this cookbook with all the topics you need to look into.\n\nIt's not only useful for beginners, professionals will definitely like the case study section.\n\nIf you look for the old PDF version it's [here](https:\u002F\u002Fgithub.com\u002Fandkret\u002FCookbook\u002Fraw\u002FLaTex-Version-Deprecated\u002FData%20Engineering%20Cookbook.pdf)\n\n-->\n\n## Here's what's new:\nFind the change log with all recent updates here: [SEE UPDATES](sections\u002F10-Updates.md)\n\n# Contents:\n- [Introduction](sections\u002F01-Introduction.md)\n- [Basic Engineering Skills](sections\u002F02-BasicSkills.md)\n- [Advanced Engineering Skills](sections\u002F03-AdvancedSkills.md)\n- [Free Hands On Courses \u002F Tutorials](sections\u002F04-HandsOnCourse.md)\n- [Case Studies](sections\u002F05-CaseStudies.md)\n- [Best Practices Cloud Platforms](sections\u002F06-BestPracticesCloud.md)\n- [130+ Data Sources Data Science](sections\u002F07-DataSources.md)\n- [1001 Interview Questions](sections\u002F08-InterviewQuestions.md)\n- [Recommended Books, Courses, and 
Podcasts](sections\u002F09-BooksAndCourses.md)\n- [Updates](sections\u002F10-Updates.md)\n\u003C!--  test -->\n\n- [How To Contribute](#how-to-contribute)\n- [Support What You Like](#support)\n- [Important Links](#important-links)\n\n# Full Table Of Contents:\n##  Introduction\n- [What is this Cookbook](sections\u002F01-Introduction.md#what-is-this-cookbook)\n- [Data Engineers](sections\u002F01-Introduction.md#data-engineers)\n- [My Data Science Platform Blueprint](sections\u002F01-Introduction.md#my-data-science-platform-blueprint)\n  - [Connect](sections\u002F01-Introduction.md#connect)\n  - [Buffer](sections\u002F01-Introduction.md#buffer)\n  - [Processing Framework](sections\u002F01-Introduction.md#processing-framework)\n  - [Store](sections\u002F01-Introduction.md#store)\n  - [Visualize](sections\u002F01-Introduction.md#visualize)\n- [Who Companies Need](sections\u002F01-Introduction.md#who-companies-need)\n- [How to Learn Data Engineering](sections\u002F01-Introduction.md#how-to-learn-data-engineering)\n\t- [Andreas on the Super Data Science Podcast](sections\u002F01-Introduction.md#Interview-with-Andreas-on-the-Super-Data-Science-Podcast)\n\t- [Building Blocks to Learn Data Engineering](sections\u002F01-Introduction.md#building-blocks-to-learn-data-engineering)\n  - [Roadmap for Beginners](sections\u002F01-Introduction.md#roadmap-for-beginners)\n\t- [Roadmap for Data  Analysts](sections\u002F01-Introduction.md#roadmap-for-data-analysts)\n\t- [Roadmap for Data Scientists](sections\u002F01-Introduction.md#roadmap-for-data-scientists)\n\t- [Roadmap for Software Engineers](sections\u002F01-Introduction.md#roadmap-for-software-engineers)\n- [Data Engineers Skills Matrix](sections\u002F01-Introduction.md#data-engineers-skills-matrix)\n- [How to Become a Senior Data Engineer](sections\u002F01-Introduction.md#how-to-become-a-senior-data-engineer)\n\n## Basic Engineering Skills\n- [Learn To Code](sections\u002F02-BasicSkills.md#learn-to-code)\n- [Get Familiar With 
Git](sections\u002F02-BasicSkills.md#get-familiar-with-git)\n- [Agile Development](sections\u002F02-BasicSkills.md#agile-development)\n  - [Why is agile so important?](sections\u002F02-BasicSkills.md#Why-is-agile-so-important)\n  - [Agile rules I learned over the years](sections\u002F02-BasicSkills.md#agile-rules-i-learned-over-the-years)\n  - [Agile Frameworks](sections\u002F02-BasicSkills.md#agile-frameworks)\n    - [Scrum](sections\u002F02-BasicSkills.md#scrum)\n    - [OKR](sections\u002F02-BasicSkills.md#okr)\n- [Software Engineering Culture](sections\u002F02-BasicSkills.md#software-engineering-culture)\n- [Learn how a Computer Works](sections\u002F02-BasicSkills.md#learn-how-a-computer-works)\n- [Data Network Transmission](sections\u002F02-BasicSkills.md#data-network-transmission)\n- [Security and Privacy](sections\u002F02-BasicSkills.md#security-and-privacy)\n  - [SSL Public and Private Key Certificates](sections\u002F02-BasicSkills.md#ssl-public-and-private-key-Certificates)\n  - [JSON Web Tokens](sections\u002F02-BasicSkills.md#json-web-tokens)\n  - [GDPR regulations](sections\u002F02-BasicSkills.md#gdpr-regulations)\n- [Linux](sections\u002F02-BasicSkills.md#linux)\n  - [OS Basics](sections\u002F02-BasicSkills.md#os-basics)\n  - [Shell scripting](sections\u002F02-BasicSkills.md#shell-scripting)\n  - [Cron Jobs](sections\u002F02-BasicSkills.md#cron-jobs)\n  - [Packet Management](sections\u002F02-BasicSkills.md#packet-management)\n- [Docker](sections\u002F02-BasicSkills.md#docker)\n  - [What is Docker and How it Works](sections\u002F02-BasicSkills.md#what-is-docker-and-what-do-you-use-it-for)\n    -  [Don't Mess Up Your System](sections\u002F02-BasicSkills.md#dont-mess-up-your-system)\n    - [Preconfigured Images](sections\u002F02-BasicSkills.md#preconfigured-images)\n    - [Take it With You](sections\u002F02-BasicSkills.md#take-it-with-you)\n    - [Kubernetes Container Deployment](sections\u002F02-BasicSkills.md#kubernetes-container-deployment)\n    - [How 
to Create Start and Stop a Container](sections\u002F02-BasicSkills.md#how-to-create-start-stop-a-container)\n    - [Docker Micro Services](sections\u002F02-BasicSkills.md#docker-micro-services)\n    - [Kubernetes](sections\u002F02-BasicSkills.md#kubernetes)\n    - [Why and How To Do Docker Container Orchestration](sections\u002F02-BasicSkills.md#why-and-how-to-do-docker-container-orchestration)\n    - [Useful Docker Commands](sections\u002F02-BasicSkills.md#useful-docker-commands)\n- [The Cloud](sections\u002F02-BasicSkills.md#the-cloud)\n  - [IaaS vs PaaS vs SaaS](sections\u002F02-BasicSkills.md#iaas-vs-paas-vs-saas)\n  - [AWS Azure IBM Google](sections\u002F02-BasicSkills.md#aws-azure-ibm-google)\n  - [Cloud vs On-Premises](sections\u002F02-BasicSkills.md#cloud-vs-on-premises)\n  - [Security](sections\u002F02-BasicSkills.md#security)\n  - [Hybrid Clouds](sections\u002F02-BasicSkills.md#hybrid-clouds)\n- [Security Zone Design](sections\u002F02-BasicSkills.md#security-zone-design)\n  - [How to secure a multi layered application](sections\u002F02-BasicSkills.md#how-to-secure-a-multi-layered-application)\n  - [Cluster security with Kerberos](sections\u002F02-BasicSkills.md#cluster-security-with-kerberos)\n\n## Advanced Engineering Skills\n- [Data Science Platform](sections\u002F03-AdvancedSkills.md#data-science-platform)\n  - [Why a Good Data Platform Is Important](sections\u002F03-AdvancedSkills.md#why-a-good-data-platform-is-important)\n  - [Big Data vs Data Science and Analytics](sections\u002F03-AdvancedSkills.md#Big-Data-vs-Data-Science-and-Analytics)\n  - [The 4 Vs of Big Data](sections\u002F03-AdvancedSkills.md#the-4-vs-of-big-data)\n  - [Why Big Data](sections\u002F03-AdvancedSkills.md#why-big-data)\n    - [Planning is Everything](sections\u002F03-AdvancedSkills.md#planning-is-everything)\n    - [The Problem with ETL](sections\u002F03-AdvancedSkills.md#the-problem-with-etl)\n    - [Scaling Up](sections\u002F03-AdvancedSkills.md#scaling-up)\n    - 
[Scaling Out](sections\u002F03-AdvancedSkills.md#scaling-out)\n    - [When not to Do Big Data](sections\u002F03-AdvancedSkills.md#please-dont-go-big-data)\n- [81 Platform & Pipeline Design Questions](sections\u002F03-AdvancedSkills.md#81-platform-and-pipeline-design-questions)\n  - [Data Source Questions](sections\u002F03-AdvancedSkills.md#data-source-questions)\n  - [Goals and Destination Questions](sections\u002F03-AdvancedSkills.md#goals-and-destination-questions)\n- [Connect](sections\u002F03-AdvancedSkills.md#connect)\n  - [REST APIs](sections\u002F03-AdvancedSkills.md#rest-apis)\n    - [API Design](sections\u002F03-AdvancedSkills.md#api-design)\n    - [Implementation Frameworks](sections\u002F03-AdvancedSkills.md#implementation-frameworks)\n    - [Security](sections\u002F03-AdvancedSkills.md#security)\n  - [Apache Nifi](sections\u002F03-AdvancedSkills.md#apache-nifi)\n  - [Logstash](sections\u002F03-AdvancedSkills.md#logstash)\n- [Buffer](sections\u002F03-AdvancedSkills.md#buffer)\n  - [Apache Kafka](sections\u002F03-AdvancedSkills.md#apache-kafka)\n    - [Why a Message Queue Tool?](sections\u002F03-AdvancedSkills.md#why-a-message-queue-tool)\n    - [Kafka Architecture](sections\u002F03-AdvancedSkills.md#kafka-architecture)\n    - [Kafka Topics](sections\u002F03-AdvancedSkills.md#what-are-topics)\n    - [Kafka and Zookeeper](sections\u002F03-AdvancedSkills.md#what-does-zookeeper-have-to-do-with-kafka)\n    - [How to Produce and Consume Messages](sections\u002F03-AdvancedSkills.md#how-to-produce-and-consume-messages)\n    - [Kafka Commands](sections\u002F03-AdvancedSkills.md#kafka-commands)\n  - [Apache Redis Pub-Sub](sections\u002F03-AdvancedSkills.md#redis-pub-sub)\n  - [AWS Kinesis](sections\u002F03-AdvancedSkills.md#apache-kafka)\n  - [Google Cloud PubSub](sections\u002F03-AdvancedSkills.md#google-cloud-pubsub)\n- [Processing Frameworks](sections\u002F03-AdvancedSkills.md#processing-frameworks)\n\t- [Lambda and Kappa 
Architecture](sections\u002F03-AdvancedSkills.md#lambda-and-kappa-architecture)\n\t- [Batch Processing](sections\u002F03-AdvancedSkills.md#batch-processing)\n\t- [Stream Processing](sections\u002F03-AdvancedSkills.md#stream-processing)\n\t\t- [Three Methods of Streaming](sections\u002F03-AdvancedSkills.md#three-methods-of-streaming)\n\t\t- [At Least Once](sections\u002F03-AdvancedSkills.md#at-least-once)\n\t\t- [At Most Once](sections\u002F03-AdvancedSkills.md#at-most-once)\n\t\t- [Exactly Once](sections\u002F03-AdvancedSkills.md#exactly-once)\n\t\t- [Check The Tools](sections\u002F03-AdvancedSkills.md#check-the-tools)\n\t- [Should You do Stream or Batch Processing](sections\u002F03-AdvancedSkills.md#should-you-do-stream-or-batch-processing)\n\t- [Is ETL still relevant for Analytics?](sections\u002F03-AdvancedSkills.md#is-etl-still-relevant-for-analytics)\n  - [MapReduce](sections\u002F03-AdvancedSkills.md#mapreduce)\n    - [How Does MapReduce Work](sections\u002F03-AdvancedSkills.md#How-does-mapreduce-work)\n    - [MapReduce](sections\u002F03-AdvancedSkills.md#mapreduce)\n    - [MapReduce Example](sections\u002F03-AdvancedSkills.md#example)\n    - [MapReduce Limitations](sections\u002F03-AdvancedSkills.md#What-is-the-limitation-of-mapreduce)\n  - [Apache Spark](sections\u002F03-AdvancedSkills.md#apache-spark)\n    - [What is the Difference to MapReduce?](sections\u002F03-AdvancedSkills.md#what-is-the-difference-to-MapReduce)\n    - [How Spark Fits to Hadoop](sections\u002F03-AdvancedSkills.md#how-does-spark-fit-to-hadoop)\n    - [Spark vs Hadoop](sections\u002F03-AdvancedSkills.md#wheres-the-difference)\n    - [Spark and Hadoop a Perfect Fit](sections\u002F03-AdvancedSkills.md#spark-and-hadoop-is-a-perfect-fit)\n    - [Spark on YARN](sections\u002F03-AdvancedSkills.md#spark-on-yarn)\n    - [My Simple Rule of Thumb](sections\u002F03-AdvancedSkills.md#my-simple-rule-of-thumb)\n    - [Available Languages](sections\u002F03-AdvancedSkills.md#available-languages)\n    - 
[Spark Driver Executor and SparkContext](sections\u002F03-AdvancedSkills.md#how-spark-works-driver-executor-sparkcontext)\n    - [Spark Batch vs Stream processing](sections\u002F03-AdvancedSkills.md#spark-batch-vs-stream-processing)\n    - [How Spark uses Data From Hadoop](sections\u002F03-AdvancedSkills.md#How-does-spark-use-data-from-hadoop)\n    - [What are RDDs and How to Use Them](sections\u002F03-AdvancedSkills.md#what-are-rdds-and-how-to-use-them)\n    - [SparkSQL How and Why to Use It](sections\u002F03-AdvancedSkills.md#available-languages)\n    - [What are Dataframes and How to Use Them](sections\u002F03-AdvancedSkills.md#what-are-dataframes-how-to-use-them)\n    - [Machine Learning on Spark (TensorFlow)](sections\u002F03-AdvancedSkills.md#machine-learning-on-spark-tensor-flow)\n    - [MLlib](sections\u002F03-AdvancedSkills.md#mllib)\n    - [Spark Setup](sections\u002F03-AdvancedSkills.md#spark-setup)\n    - [Spark Resource Management](sections\u002F03-AdvancedSkills.md#spark-resource-management)\n  - [AWS Lambda](sections\u002F03-AdvancedSkills.md#apache-flink)  \n  - [Apache Flink](sections\u002F03-AdvancedSkills.md#apache-flink)\n  - [Elasticsearch](sections\u002F03-AdvancedSkills.md#elasticsearch)\n  - [Apache Drill](sections\u002F03-AdvancedSkills.md#apache-drill)\n  - [StreamSets](sections\u002F03-AdvancedSkills.md#streamsets)\n- [Store](sections\u002F03-AdvancedSkills.md#store)\n\t- [Analytical Data Stores](sections\u002F03-AdvancedSkills.md#analytical-data-stores)\n\t\t- [Data Warehouse vs Data Lake](sections\u002F03-AdvancedSkills.md#data-warehouse-vs-data-lake)\n\t\t- [Snowflake and dbt](sections\u002F03-AdvancedSkills.md#snowflake-and-dbt)\n\t- [Transactional Data Stores](sections\u002F03-AdvancedSkills.md#transactional-data-stores)\n\t\t- [SQL Databases](sections\u002F03-AdvancedSkills.md#sql-databases)\n\t    - [PostgreSQL DB](sections\u002F03-AdvancedSkills.md#postgresql-db)\n\t    - [Database Design](sections\u002F03-AdvancedSkills.md#database-design)\n\t 
   - [SQL Queries](sections\u002F03-AdvancedSkills.md#sql-queries)\n\t    - [Stored Procedures](sections\u002F03-AdvancedSkills.md#stored-procedures)\n\t    - [ODBC\u002FJDBC Server Connections](sections\u002F03-AdvancedSkills.md#odbc-jdbc-server-connections)\n\t  - [NoSQL Stores](sections\u002F03-AdvancedSkills.md#nosql-stores)\n\t    - [HBase KeyValue Store](sections\u002F03-AdvancedSkills.md#keyvalue-stores-hbase)\n\t    - [HDFS Document Store](sections\u002F03-AdvancedSkills.md#document-stores-hdfs)\n\t    - [MongoDB Document Store](sections\u002F03-AdvancedSkills.md#document-stores-mongodb)\n\t    - [Elasticsearch Document Store](sections\u002F03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)\n\t    - [Hive Warehouse](sections\u002F03-AdvancedSkills.md#hive-warehouse)\n\t    - [Impala](sections\u002F03-AdvancedSkills.md#impala)\n\t    - [Kudu](sections\u002F03-AdvancedSkills.md#kudu)\n\t    - [Apache Druid](sections\u002F03-AdvancedSkills.md#apache-druid)\n\t    - [InfluxDB Time Series Database](sections\u002F03-AdvancedSkills.md#influxdb-time-series-database)\n\t    - [Greenplum MPP Database](sections\u002F03-AdvancedSkills.md#mpp-databases-greenplum)\n- [Visualize](sections\u002F03-AdvancedSkills.md#visualize)\n  - [Android and IOS](sections\u002F03-AdvancedSkills.md#android-and-ios)\n  - [API Design for Mobile Apps](sections\u002F03-AdvancedSkills.md#how-to-design-apis-for-mobile-apps)\n  - [Dashboards](sections\u002F03-AdvancedSkills.md#dashboards)\n    - [Grafana](sections\u002F03-AdvancedSkills.md#grafana)\n    - [Kibana](sections\u002F03-AdvancedSkills.md#kibana)\n  - [Webservers](sections\u002F03-AdvancedSkills.md#how-to-use-webservers-to-display-content)\n    - [Tomcat](sections\u002F03-AdvancedSkills.md#tomcat)\n    - [Jetty](sections\u002F03-AdvancedSkills.md#jetty)\n    - [NodeRED](sections\u002F03-AdvancedSkills.md#nodered)\n    - [React](sections\u002F03-AdvancedSkills.md#react)\n  - [Business Intelligence 
Tools](sections\u002F03-AdvancedSkills.md#business-intelligence-tools)\n    - [Tableau](sections\u002F03-AdvancedSkills.md#tableau)\n    - [Power BI](sections\u002F03-AdvancedSkills.md#power-bi)\n    - [Qlik Sense](sections\u002F03-AdvancedSkills.md#quliksense)\n  - [Identity & Device Management](sections\u002F03-AdvancedSkills.md#Identity-and-device-management)\n    - [What Is A Digital Twin](sections\u002F03-AdvancedSkills.md#what-is-a-digital-twin)\n    - [Active Directory](sections\u002F03-AdvancedSkills.md#active-directory)\n- [Machine Learning](sections\u002F03-AdvancedSkills.md#machine-learning)\n  - [How to do Machine Learning in production](sections\u002F03-AdvancedSkills.md#how-to-domachine-learning-in-production)\n  - [Why machine learning in production is harder than you think](sections\u002F03-AdvancedSkills.md#why-machine-learning-in-production-is-harder-then-you-think)\n  - [Models Do Not Work Forever](sections\u002F03-AdvancedSkills.md#models-do-not-work-forever)\n  - [Where are The Platforms That Support Machine Learning](sections\u002F03-AdvancedSkills.md#where-are-the-platforms-that-support-this)\n  - [Training Parameter Management](sections\u002F03-AdvancedSkills.md#training-parameter-management)\n  - [How to Convince People That Machine Learning Works](sections\u002F03-AdvancedSkills.md#how-to-convince-people-machine-learning-works)\n  - [No Rules No Physical Models](sections\u002F03-AdvancedSkills.md#no-rules-no-physical-models)\n  - [You Have The Data. 
Use It!](sections\u002F03-AdvancedSkills.md#you-have-the-data-use-it)\n  - [Data is Stronger Than Opinions](sections\u002F03-AdvancedSkills.md#data-is-stronger-than-opinions)\n  - [AWS Sagemaker](sections\u002F03-AdvancedSkills.md#aws-sagemaker)\n\n\n## Hands On Course\n\n- [Free Data Engineering Course with AWS, TDengine, Docker and Grafana](sections\u002F04-HandsOnCourse.md#free-data-engineering-course-with-aws-tdengine-docker-and-grafana)\n- [Monitor your data in dbt & detect quality issues with Elementary](sections\u002F04-HandsOnCourse.md#monitor-your-data-in-dbt-and-detect-quality-issues-with-elementary)\n- [Solving Engineers 4 Biggest Airflow Problems](sections\u002F04-HandsOnCourse.md#solving-engineers-4-biggest-airflow-problems)\n- [The best alternative to Airflow? Mage.ai](sections\u002F04-HandsOnCourse.md#the-best-alternative-to-airlfow?-mage.ai)\n\n## Case Studies\n\n- [Data Science @Airbnb](sections\u002F05-CaseStudies.md#data-science-at-Airbnb)\n- [Data Science @Amazon](sections\u002F05-CaseStudies.md#data-science-at-Amazon)\n- [Data Science @Baidu](sections\u002F05-CaseStudies.md#data-science-at-Baidu)\n- [Data Science @Blackrock](sections\u002F05-CaseStudies.md#data-science-at-Blackrock)\n- [Data Science @BMW](sections\u002F05-CaseStudies.md#data-science-at-BMW)\n- [Data Science @Booking.com](sections\u002F05-CaseStudies.md#data-science-at-Booking.com)\n- [Data Science @CERN](sections\u002F05-CaseStudies.md#data-science-at-CERN)\n- [Data Science @Disney](sections\u002F05-CaseStudies.md#data-science-at-Disney)\n- [Data Science @DLR](sections\u002F05-CaseStudies.md#data-science-at-DLR)\n- [Data Science @Drivetribe](sections\u002F05-CaseStudies.md#data-science-at-Drivetribe)\n- [Data Science @Dropbox](sections\u002F05-CaseStudies.md#data-science-at-Dropbox)\n- [Data Science @Ebay](sections\u002F05-CaseStudies.md#data-science-at-Ebay)\n- [Data Science @Expedia](sections\u002F05-CaseStudies.md#data-science-at-Expedia)\n- [Data Science 
@Facebook](sections\u002F05-CaseStudies.md#data-science-at-Facebook)\n- [Data Science @Google](sections\u002F05-CaseStudies.md#data-science-at-Google)\n- [Data Science @Grammarly](sections\u002F05-CaseStudies.md#data-science-at-Grammarly)\n- [Data Science @ING Fraud](sections\u002F05-CaseStudies.md#data-science-at-ING-Fraud)\n- [Data Science @Instagram](sections\u002F05-CaseStudies.md#data-science-at-Instagram)\n- [Data Science @LinkedIn](sections\u002F05-CaseStudies.md#data-science-at-LinkedIn)\n- [Data Science @Lyft](sections\u002F05-CaseStudies.md#data-science-at-Lyft)\n- [Data Science @NASA](sections\u002F05-CaseStudies.md#data-science-at-NASA)\n- [Data Science @Netflix](sections\u002F05-CaseStudies.md#data-science-at-Netflix)\n- [Data Science @OLX](sections\u002F05-CaseStudies.md#data-science-at-OLX)\n- [Data Science @OTTO](sections\u002F05-CaseStudies.md#data-science-at-OTTO)\n- [Data Science @Paypal](sections\u002F05-CaseStudies.md#data-science-at-Paypal)\n- [Data Science @Pinterest](sections\u002F05-CaseStudies.md#data-science-at-Pinterest)\n- [Data Science @Salesforce](sections\u002F05-CaseStudies.md#data-science-at-Salesforce)\n- [Data Science @Siemens Mindsphere](sections\u002F05-CaseStudies.md#data-science-at-Siemens-Mindsphere)\n- [Data Science @Slack](sections\u002F05-CaseStudies.md#data-science-at-Slack)\n- [Data Science @Spotify](sections\u002F05-CaseStudies.md#data-science-at-Spotify)\n- [Data Science @Symantec](sections\u002F05-CaseStudies.md#data-science-at-Symantec)\n- [Data Science @Tinder](sections\u002F05-CaseStudies.md#data-science-at-Tinder)\n- [Data Science @Twitter](sections\u002F05-CaseStudies.md#data-science-at-Twitter)\n- [Data Science @Uber](sections\u002F05-CaseStudies.md#data-science-at-Uber)\n- [Data Science @Upwork](sections\u002F05-CaseStudies.md#data-science-at-Upwork)\n- [Data Science @Woot](sections\u002F05-CaseStudies.md#data-science-at-Woot)\n- [Data Science 
@Zalando](sections\u002F05-CaseStudies.md#data-science-at-Zalando)\n\n## Best Practices Cloud Platforms\n\n- [Amazon Web Services (AWS)](sections\u002F06-BestPracticesCloud.md#aws)\n  - [Connect](sections\u002F06-BestPracticesCloud.md#Connect)\n  - [Buffer](sections\u002F06-BestPracticesCloud.md#Buffer)\n  - [Processing](sections\u002F06-BestPracticesCloud.md#Processing)\n  - [Store](sections\u002F06-BestPracticesCloud.md#Store)\n  - [Visualize](sections\u002F06-BestPracticesCloud.md#Visualize)\n  - [Containerization](sections\u002F06-BestPracticesCloud.md#Containerization)\n  - [Best Practices](sections\u002F06-BestPracticesCloud.md#Best-Practices)\n  - [More Details](sections\u002F06-BestPracticesCloud.md#More-Details)\n- [Microsoft Azure](sections\u002F06-BestPracticesCloud.md#azure)\n  - [Connect](sections\u002F06-BestPracticesCloud.md#Connect-1)\n  - [Buffer](sections\u002F06-BestPracticesCloud.md#Buffer-1)\n  - [Processing](sections\u002F06-BestPracticesCloud.md#Processing-1)\n  - [Store](sections\u002F06-BestPracticesCloud.md#Store-1)\n  - [Visualize](sections\u002F06-BestPracticesCloud.md#Visualize-1)\n  - [Containerization](sections\u002F06-BestPracticesCloud.md#Containerization-1)\n  - [Best Practices](sections\u002F06-BestPracticesCloud.md#Best-Practices-1)\n- [Google Cloud Platform (GCP)](sections\u002F06-BestPracticesCloud.md#gcp)\n  - [Connect](sections\u002F06-BestPracticesCloud.md#Connect-2)\n  - [Buffer](sections\u002F06-BestPracticesCloud.md#Buffer-2)\n  - [Processing](sections\u002F06-BestPracticesCloud.md#Processing-2)\n  - [Store](sections\u002F06-BestPracticesCloud.md#Store-2)\n  - [Visualize](sections\u002F06-BestPracticesCloud.md#Visualize-2)\n  - [Containerization](sections\u002F06-BestPracticesCloud.md#Containerization-2)\n  - [Best Practices](sections\u002F06-BestPracticesCloud.md#Best-Practices-2)\n\n## 130+ Free Data Sources For Data Science\n\n- [Student Favorites](sections\u002F07-DataSources.md#Student-Favorites)\n- [General And 
Academic](sections\u002F07-DataSources.md#General-And-Academic)\n- [Content Marketing](sections\u002F07-DataSources.md#Content-Marketing)\n- [Crime](sections\u002F07-DataSources.md#Crime)\n- [Drugs](sections\u002F07-DataSources.md#Drugs)\n- [Education](sections\u002F07-DataSources.md#Education)\n- [Entertainment](sections\u002F07-DataSources.md#Entertainment)\n- [Environmental And Weather Data](sections\u002F07-DataSources.md#Environmental-And-Weather-Data)\n- [Financial And Economic Data](sections\u002F07-DataSources.md#Financial-And-Economic-Data)\n- [Government And World](sections\u002F07-DataSources.md#Government-And-World)\n- [Health](sections\u002F07-DataSources.md#Health)\n- [Human Rights](sections\u002F07-DataSources.md#Human-Rights)\n- [Labor And Employment Data](sections\u002F07-DataSources.md#Labor-And-Employment-Data)\n- [Politics](sections\u002F07-DataSources.md#Politics)\n- [Retail](sections\u002F07-DataSources.md#Retail)\n- [Social](sections\u002F07-DataSources.md#Social)\n- [Travel And Transportation](sections\u002F07-DataSources.md#Travel-And-Transportation)\n- [Various Portals](sections\u002F07-DataSources.md#Various-Portals)\n- [Source Articles and Blog Posts](sections\u002F07-DataSources.md#Source-Articles-and-Blog-Posts)\n- [Free Data Sources Data Science](sections\u002F07-DataSources.md)\n\n## 1001 Interview Questions\n\n- [Interview Questions](sections\u002F08-InterviewQuestions.md)\n\n## Recommended Books, Courses, and Podcasts\n\n- [About Books and Courses](sections\u002F09-BooksAndCourses.md#about-books-and-courses)\n- [Books](sections\u002F09-BooksAndCourses.md#books)\n  - [Languages](sections\u002F09-BooksAndCourses.md#books-languages)\n  - [Data Tools & Platforms](sections\u002F09-BooksAndCourses.md#books-data-science-tools)\n  - [Business](sections\u002F09-BooksAndCourses.md#Books-Business)\n  - [Community Recommendations](sections\u002F09-BooksAndCourses.md#Community-Recommendations)\n- [Online 
Courses](sections\u002F09-BooksAndCourses.md#online-courses)\n  - [Preparation courses](sections\u002F09-BooksAndCourses.md#Preparation-courses)\n  - [Data engineering courses](sections\u002F09-BooksAndCourses.md#Data-engineering-courses)\n- [Certifications](sections\u002F09-BooksAndCourses.md#Certifications)\n- [Podcasts](sections\u002F09-BooksAndCourses.md#Podcasts)\n  - [Super Data Science](sections\u002F09-BooksAndCourses.md#Super-Data-Science)\n  - [Data Skeptic](sections\u002F09-BooksAndCourses.md#Data-Skeptic)\n  - [Data Engineering Podcast](sections\u002F09-BooksAndCourses.md#Data-Engineering-Podcast)\n  - [Roaring Elephant BiteSized Big Tech](sections\u002F09-BooksAndCourses.md#Roaring-Elephant-BiteSized-Big-Tech)\n  - [SQL Data Partners Podcast](sections\u002F09-BooksAndCourses.md#SQL-Data-Partners-Podcast)\n\n\n## How To Contribute\nIf you have some cool links or topics for the cookbook, please become a contributor.\n\nSimply pull the repo, add your ideas and create a pull request.\nYou can also open an issue and put your thoughts there.\n\nPlease use the \"Issues\" function for comments.\n\n\n## Important Links\n\nSubscribe to my YouTube channel for regular updates:\n[Link to YouTube](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUCY8mzqqGwl5_bTpBY9qLMAA)\n\nI have a Medium publication where you can publish your data engineer articles to reach more people:\n[Medium publication](https:\u002F\u002Flink.medium.com\u002F9oi1VDrhPW)\n\n\u003Cbr>\n*(As an Amazon Associate I earn from qualifying purchases from Amazon.\nThis is free of charge for you, but super helpful for supporting this channel)\n","The Data Engineering Cookbook is a project that provides practical guides and best practices for data engineers. It covers data engineering skills from basic to advanced, including case studies, cloud platform best practices, data sources, and interview questions, and aims to help readers build a comprehensive body of data engineering knowledge. The project is written in Python and is released under the open-source Apache License 2.0. It is suited as a reference for learners and practitioners who want to become data engineers or add data engineering to their existing skills.",2,"2026-06-11 03:33:54","high_star"]