[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9813":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":20,"defaultBranch":21,"hasWiki":19,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},9813,"Data-science","CodeCutTech\u002FData-science","CodeCutTech","Collection of useful data science topics along with articles, videos, and code","https:\u002F\u002Fcodecut.ai\u002Fblog",null,"Jupyter Notebook",4214,1065,1,4,0,24,31.08,true,false,"master",[23,24,25,26,27,28,29,30,31,32],"articles","artificial-intelligence","data-analysis","data-science","data-visualization","machine-learning","natural-language-processing","python","scraping","time-series","2026-06-12 02:02:13","> 📦 **We’ve Relocated**  \n> The contents of this repository are now hosted at **[github.com\u002Fkhuyentran1401\u002Fcodecut-blog](https:\u002F\u002Fgithub.com\u002Fkhuyentran1401\u002Fcodecut-blog\u002F)**. 
Please follow the new repo to stay updated.\n\n# Data Science Articles from CodeCut\n\n## About CodeCut\n\n[CodeCut](https:\u002F\u002Fcodecut.ai\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=introduction) is the platform that helps data scientists stay productive and current by delivering short, practical code examples that highlight modern tools in action.\n\nIt's the resource you wish you had when learning a new library—clean, concise, and instantly applicable.\n\n## Article Collection\n\nThis repository is a curated collection of data science articles from CodeCut, covering topics like MLOps, data management, testing, visualization, and more. Each article comes with practical examples, code repositories, and video tutorials to help you quickly implement these tools and practices in your own projects.\n\n| Category | Title | Article | Repository | Video |\n|----------|-------|---------|------------|-------|\n| MLOps | Goodbye Pip and Poetry. Why UV Might Be All You Need | [🔗](https:\u002F\u002Fcodecut.ai\u002Fwhy-uv-might-all-you-need\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | |\n| MLOps | Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | [🔗](https:\u002F\u002Fcodecut.ai\u002Fstop-hard-coding-in-a-data-science-project-use-configuration-files-instead\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002Fhydra-demo) | [🔗](https:\u002F\u002Fyoutu.be\u002FjaX9zrC7y4Y) |\n| MLOps | Poetry: A Better Way to Manage Python Dependencies | [🔗](https:\u002F\u002Fcodecut.ai\u002Fpoetry-a-better-way-to-manage-python-dependencies\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https:\u002F\u002Fyoutu.be\u002F-QSUyDvHQGY) |\n| MLOps | Git for Data Scientists: Learn Git through Practical Examples | 
[🔗](https:\u002F\u002Fcodecut.ai\u002Fgit-deep-dive-for-data-scientists\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | | [🔗](https:\u002F\u002Fyoutu.be\u002FUKCTvrJSoL0) |\n| MLOps | 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | [🔗](https:\u002F\u002Fcodecut.ai\u002F4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fproductive_tools\u002Fprecommit_examples) | [🔗](https:\u002F\u002Fyoutube.com\u002Fplaylist?list=PLnK6m_JBRVNqskWiXLxx1QRDDng9O8Fsf) |\n| MLOps | How to Structure a Data Science Project for Maintainability | [🔗](https:\u002F\u002Fcodecut.ai\u002Fhow-to-structure-a-data-science-project-for-readability-and-transparency-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002Fdata-science-template) | [🔗](https:\u002F\u002Fyoutu.be\u002FTzvcPi3nsdw) |\n| MLOps | Build Reliable Machine Learning Pipelines with Continuous Integration | [🔗](https:\u002F\u002Fcodecut.ai\u002Fbuild-reliable-machine-learning-pipelines-with-continuous-integration-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fkhuyentran1401\u002Fcicd-mlops-demo) | [🔗](https:\u002F\u002Fyoutu.be\u002Frkg09nNMAhs) |\n| MLOps | Automate Machine Learning Deployment with GitHub Actions | [🔗](https:\u002F\u002Fcodecut.ai\u002Fautomate-machine-learning-deployment-with-github-actions-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fkhuyentran1401\u002Fcicd-mlops-demo) | [🔗](https:\u002F\u002Fyoutu.be\u002F728M0yhI0_M) |\n| MLOps | How to Build a Fully Automated Data Drift Detection Pipeline | 
[🔗](https:\u002F\u002Fcodecut.ai\u002Fbuild-a-fully-automated-data-drift-detection-pipeline\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fkhuyentran1401\u002Fdetect-data-drift-pipeline) | [🔗](https:\u002F\u002Fyoutu.be\u002F4w2ly3WuL40) |\n| Data Management Tools | Version Control for Data and Models Using DVC | [🔗](https:\u002F\u002Fcodecut.ai\u002Fintroduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002Fdvc-demo) | [🔗](https:\u002F\u002Fyoutu.be\u002F80s_dbfiqLM) |\n| Data Management Tools | What is dbt (data build tool) and When should you use it? | [🔗](https:\u002F\u002Fcodecut.ai\u002Fbuild-an-efficient-data-pipeline-is-dbt-the-key\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002Fdbt-demo) | [🔗](https:\u002F\u002Fyoutu.be\u002FmM5zWBP3G_U) |\n| Data Management Tools | Streamline dbt Model Development with Notebook-Style Workspace | [🔗](https:\u002F\u002Fcodecut.ai\u002Fdbt-mage-interactively-build-and-orchestrate-data-models\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fkhuyentran1401\u002Fdbt-mage) | [🔗](https:\u002F\u002Fyoutu.be\u002FvQFg1Mp60-s) |\n| Testing | Pytest for Data Scientists | [🔗](https:\u002F\u002Fcodecut.ai\u002Fpytest-for-data-scientists-3\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fdata_science_tools\u002Fpytest) | [🔗](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLnK6m_JBRVNoYEer9hBmTNwkYB3gmbOPO) |\n| Python Helper Tools | Write Clean Python Code Using Pipes | 
[🔗](https:\u002F\u002Fcodecut.ai\u002Fwrite-clean-python-code-using-pipes-3\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002FCodeCutTech\u002FData-science\u002Fblob\u002Fmaster\u002Fproductive_tools\u002Fpipe.ipynb) | [🔗](https:\u002F\u002Fyoutu.be\u002FK20_eZZGqsc) |\n| Python Helper Tools | Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames | [🔗](https:\u002F\u002Fcodecut.ai\u002Fintroducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002FfugueSQL.ipynb) | |\n| Python Helper Tools | Fugue and DuckDB: Fast SQL Code in Python | [🔗](https:\u002F\u002Fcodecut.ai\u002Ffugue-and-duckdb-fast-sql-code-in-python-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fproductive_tools\u002FFugue_and_Duckdb\u002FFugue_and_Duckdb.ipynb) | |\n| Python Helper Tools | Marimo: A Modern Notebook for Reproducible Data Science | [🔗](https:\u002F\u002Fcodecut.ai\u002Fmarimo-a-modern-notebook-for-reproducible-data-science\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fdata_science_tools\u002Fmarimo_examples) | |\n| Feature Engineering | Polars vs. 
Pandas: A Fast, Multi-Core Alternative for DataFrames | [🔗](https:\u002F\u002Fcodecut.ai\u002Fpolars-vs-pandas-a-fast-multi-core-alternative-for-dataframes\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002Fpolars_vs_pandas.ipynb) | |\n| Visualization | Top 6 Python Libraries for Visualization: Which one to Use? | [🔗](https:\u002F\u002Fcodecut.ai\u002Ftop-6-python-libraries-for-visualization-which-one-to-use\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fvisualization\u002Ftop_visualization.ipynb) | |\n| Python | Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | [🔗](https:\u002F\u002Fcodecut.ai\u002Fpython-clean-code-6-best-practices-to-make-your-python-functions-more-readable-2\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fpython\u002Fgood_functions) | [🔗](https:\u002F\u002Fyoutu.be\u002FIDHD8JYBl5M) |\n| Logging and Debugging | Loguru: Simple as Print, Flexible as Logging | [🔗](https:\u002F\u002Fcodecut.ai\u002Fsimplify-your-python-logging-with-loguru\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Ftree\u002Fmaster\u002Fproductive_tools\u002Flogging_tools) | [🔗](https:\u002F\u002Fyoutu.be\u002FXY_OrUoR-HU) |\n| LLM | Enforce Structured Outputs from LLMs with PydanticAI | [🔗](https:\u002F\u002Fcodecut.ai\u002Fenforce-structured-outputs-from-llms-with-pydanticai\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | 
[🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fllm\u002Fpydantic_ai_examples.ipynb) | |\n| LLM | Run Private AI Workflows with LangChain and Ollama | [🔗](https:\u002F\u002Fcodecut.ai\u002Fprivate-ai-workflows-langchain-ollama\u002F?utm_source=github&utm_medium=data_science_repo&utm_campaign=blog) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fllm\u002Flangchain_ollama.ipynb) | |\n| Speed-up Tools | Writing Safer PySpark Queries with Parameters | [🔗](https:\u002F\u002Fcodecut.ai\u002Fpyspark-sql-enhancing-reusability-with-parameterized-queries\u002F) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002Fpyspark_parametrize.ipynb) | |\n| Speed-up Tools | Narwhals: Unified DataFrame Functions for pandas, Polars, and PySpark | [🔗](https:\u002F\u002Fcodecut.ai\u002Funified-dataframe-functions-pandas-polars-pyspark\u002F) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002Fnarwhals.ipynb) | |\n| Speed-up Tools | Eager to Lazy DataFrames with Narwhals | [🔗](https:\u002F\u002Fcodecut.ai\u002Feager-to-lazy-dataframes-with-narwhals\u002F) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002Fnarwhals_row_ordering.ipynb) | |\n| Speed-up Tools | Scaling Pandas Workflows with PySpark's Pandas API | [🔗](https:\u002F\u002Fcodecut.ai\u002Fscaling-pandas-workflows-with-pysparks-pandas-api\u002F) | [🔗](https:\u002F\u002Fgithub.com\u002Fcodecuttech\u002FData-science\u002Fblob\u002Fmaster\u002Fdata_science_tools\u002Fpandas_api_on_spark.ipynb) | |\n\n## Contributing\n\nIf you're passionate about data science and want to share your knowledge about open-source tools for data processing and LLM applications in Python, we'd love to have you contribute!\n\nTo contribute:\n\n1. 
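Create a GitHub issue:\n   - Click on the \"Issues\" tab\n   - Click \"New issue\"\n   - Select the \"Article Topic Suggestion\" template\n   - Fill in the template with your article proposal\n2. Read our [contribution guidelines](contribution.md)\n","CodeCutTech\u002FData-science is a repository that collects data science articles, videos, and code examples. The project is presented mainly as Jupyter Notebooks and covers practical tutorials and technical write-ups across areas ranging from artificial intelligence and data analysis to natural language processing. It offers many MLOps practice cases, such as replacing hard-coded values with configuration files and using Git for version control, and each topic comes with links to code repositories and tutorial videos, making it a good reference for data scientists and related practitioners who want to quickly pick up the latest tools and best practices.",2,"2026-06-11 03:24:51","top_topic"]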