[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9801":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":10,"languages":10,"totalLinesOfCode":10,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":14,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":37,"readmeContent":38,"aiSummary":39,"trendingCount":15,"starSnapshotCount":15,"syncStatus":40,"lastSyncTime":41,"discoverSource":42},9801,"datascience","r0f1\u002Fdatascience","r0f1","Curated list of Python resources for data science.","",null,4631,711,138,1,0,3,15,63.56,"Creative Commons Zero v1.0 Universal",false,"master",true,[24,25,26,27,28,29,30,31,5,32,33,34,35,36],"artificial-intelligence","awesome","awesome-list","bayes","data-analysis","data-mining","data-science","data-visualization","deep-learning","deeplearning","machine-learning","python","statistics","2026-06-12 04:00:46","# Awesome Data Science with Python\n\n> A curated list of awesome resources for practicing data science using Python, including not only libraries, but also links to tutorials, code snippets, blog posts and talks.  \n\n#### Core\n[pandas](https:\u002F\u002Fpandas.pydata.org\u002F) - Data structures built on top of [numpy](https:\u002F\u002Fwww.numpy.org\u002F).  \n[scikit-learn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002F) - Core ML library, [intelex](https:\u002F\u002Fgithub.com\u002Fintel\u002Fscikit-learn-intelex).  \n[matplotlib](https:\u002F\u002Fmatplotlib.org\u002F) - Plotting library.  \n[seaborn](https:\u002F\u002Fseaborn.pydata.org\u002F) - Data visualization library based on matplotlib.  
\n[ydata-profiling](https:\u002F\u002Fgithub.com\u002Fydataai\u002Fydata-profiling) - Descriptive statistics using `ProfileReport`.  \n[sklearn_pandas](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fsklearn-pandas) - Helpful `DataFrameMapper` class.  \n[missingno](https:\u002F\u002Fgithub.com\u002FResidentMario\u002Fmissingno) - Missing data visualization.  \n[rainbow-csv](https:\u002F\u002Fmarketplace.visualstudio.com\u002Fitems?itemName=mechatroner.rainbow-csv) - VSCode plugin to display .csv files with nice colors.  \n\n#### General Python Programming\n[Advanced Python Features](https:\u002F\u002Fblog.edward-li.com\u002Ftech\u002Fadvanced-python-features\u002F) - Generics, Protocols, Structural Pattern Matching and more.  \n[uv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv) - Dependency management.  \n[pdm](https:\u002F\u002Fpdm-project.org\u002Fen\u002Flatest\u002F) - For large binary distributions, works with uv.  \n[just](https:\u002F\u002Fgithub.com\u002Fcasey\u002Fjust) - Command runner. Replacement for make.  \n[python-dotenv](https:\u002F\u002Fgithub.com\u002Ftheskumar\u002Fpython-dotenv) - Manage environment variables.  \n[structlog](https:\u002F\u002Fgithub.com\u002Fhynek\u002Fstructlog) - Python logging.  \n[more_itertools](https:\u002F\u002Fmore-itertools.readthedocs.io\u002Fen\u002Flatest\u002F) - Extension of itertools.  \n[tqdm](https:\u002F\u002Fgithub.com\u002Ftqdm\u002Ftqdm) - Progress bars for for-loops. Also supports [pandas apply()](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F34365537\u002F1820480).  \n[hydra](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhydra) - Configuration management.  
\n\n#### Pandas Tricks, Alternatives and Additions\n[duckdb](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fduckdb) - Efficiently run SQL queries on pandas DataFrame, [duckplyr](https:\u002F\u002Fgithub.com\u002Ftidyverse\u002Fduckplyr\u002F) for R, [Great Intro](https:\u002F\u002Fcodecut.ai\u002Fdeep-dive-into-duckdb-data-scientists\u002F).  \n[ducklake](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fducklake) - DuckDB extension for storing data in a data lake.  \n[fireducks](https:\u002F\u002Fgithub.com\u002Ffireducks-dev\u002Ffireducks) - Speedier alternative to pandas with similar API.  \n[pandasvault](https:\u002F\u002Fgithub.com\u002Ffirmai\u002Fpandasvault) - Large collection of pandas tricks.  \n[polars](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars) - Multi-threaded alternative to pandas.  \n[xarray](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fxarray\u002F) - Extends pandas to n-dimensional arrays.  \n[mlx](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx) - An array framework for Apple silicon.  \n[pandas_flavor](https:\u002F\u002Fgithub.com\u002FZsailer\u002Fpandas_flavor) - Write custom accessors like `.str` and `.dt`.   \n[daft](https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft) - Distributed DataFrame.  \n[vaex](https:\u002F\u002Fgithub.com\u002Fvaexio\u002Fvaex) - Out-of-Core DataFrames.  \n[modin](https:\u002F\u002Fgithub.com\u002Fmodin-project\u002Fmodin) - Parallelization library for faster pandas `DataFrame`.  \n[swifter](https:\u002F\u002Fgithub.com\u002Fjmcarpenter2\u002Fswifter) - Apply any function to a pandas DataFrame faster (works with modin). \n\n#### Tables\n[great-tables](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fgreat-tables) - Display tabular data nicely.  \n\n#### Interactive Dataframe Visualization\n[pygwalker](https:\u002F\u002Fgithub.com\u002FKanaries\u002Fpygwalker) - Interactive dataframe.  
\n[marimo](https:\u002F\u002Fgithub.com\u002Fmarimo-team\u002Fmarimo) - Visualization and reproducible environment.  \n[lux](https:\u002F\u002Fgithub.com\u002Flux-org\u002Flux) - DataFrame visualization within Jupyter.  \n[dtale](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fdtale) - View and analyze Pandas data structures, integrating with Jupyter.  \n[pandasgui](https:\u002F\u002Fgithub.com\u002Fadamerose\u002Fpandasgui) - GUI for viewing, plotting and analyzing Pandas DataFrames.  \n[quak](https:\u002F\u002Fgithub.com\u002Fmanzt\u002Fquak) - Scalable, interactive data table, [twitter](https:\u002F\u002Fx.com\u002Ftrevmanz\u002Fstatus\u002F1816760923949809982).  \n[data-formulator](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fdata-formulator) - Data visualization tool.  \n\n\n#### Environment and Jupyter\n[Jupyter Tricks](https:\u002F\u002Fwww.dataquest.io\u002Fblog\u002Fjupyter-notebook-tips-tricks-shortcuts\u002F)  \n[nteract](https:\u002F\u002Fnteract.io\u002F) - Open Jupyter notebooks with a double-click.  \n[papermill](https:\u002F\u002Fgithub.com\u002Fnteract\u002Fpapermill) - Parameterize and execute Jupyter notebooks, [tutorial](https:\u002F\u002Fpbpython.com\u002Fpapermil-rclone-report-1.html).  \n[nbdime](https:\u002F\u002Fgithub.com\u002Fjupyter\u002Fnbdime) - Diff two notebook files, Alternative GitHub App: [ReviewNB](https:\u002F\u002Fwww.reviewnb.com\u002F).  \n[RISE](https:\u002F\u002Fgithub.com\u002Fdamianavila\u002FRISE) - Turn Jupyter notebooks into presentations.  \n[handcalcs](https:\u002F\u002Fgithub.com\u002Fconnorferster\u002Fhandcalcs) - More convenient way of writing mathematical equations in Jupyter.  \n[notebooker](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fnotebooker) - Productionize and schedule Jupyter Notebooks.  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - Turn Jupyter notebooks into standalone web applications. 
[Voila grid layout](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack).  \n\n#### Jupyter Alternatives\n[positron](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpositron) - Data Science IDE.  \n[Deepnote](https:\u002F\u002Fdeepnote.com) - Data Science platform with real-time collaboration, environment management.  \n\n#### Extraction + OCR\n[textract](https:\u002F\u002Fgithub.com\u002Fdeanmalmgren\u002Ftextract) - Extract text from any document.  \n[docling](https:\u002F\u002Fgithub.com\u002Fdocling-project\u002Fdocling) - Text extraction.  \n[DeepSeek-OCR](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR) - OCR.  \n[chandra](https:\u002F\u002Fgithub.com\u002Fdatalab-to\u002Fchandra) - OCR.  \n\n#### Big Data\n[spark](https:\u002F\u002Fdocs.databricks.com\u002Fspark\u002Flatest\u002Fdataframes-datasets\u002Fintroduction-to-dataframes-python.html#work-with-dataframes) - `DataFrame` for big data, [cheatsheet](https:\u002F\u002Fgist.github.com\u002Fcrawles\u002Fb47e23da8218af0b9bd9d47f5242d189), [tutorial](https:\u002F\u002Fgithub.com\u002Fericxiao251\u002Fspark-syntax).  \n[dask](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask), [dask-ml](http:\u002F\u002Fml.dask.org\u002F) - Pandas `DataFrame` for big data and machine learning library, [resources](https:\u002F\u002Fmatthewrocklin.com\u002Fblog\u002F\u002Fwork\u002F2018\u002F07\u002F17\u002Fdask-dev), [talk1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ccfsbuqsjgI), [talk2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=RA_2qdipVng), [notebooks](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask-ec2\u002Ftree\u002Fmaster\u002Fnotebooks), [videos](https:\u002F\u002Fwww.youtube.com\u002Fuser\u002Fmdrocklin).  \n[h2o](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) - Helpful `H2OFrame` class for out-of-memory dataframes.  
\n[cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf) - GPU DataFrame Library, [Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6XzS5XcpicM&t=2m50s).  \n[cupy](https:\u002F\u002Fgithub.com\u002Fcupy\u002Fcupy) - NumPy-like API accelerated with CUDA.  \n[ray](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray\u002F) - Flexible, high-performance distributed execution framework.  \n[bottleneck](https:\u002F\u002Fgithub.com\u002Fkwgoodman\u002Fbottleneck) - Fast NumPy array functions written in C.   \n[petastorm](https:\u002F\u002Fgithub.com\u002Fuber\u002Fpetastorm) - Data access library for parquet files by Uber.  \n[zarr](https:\u002F\u002Fgithub.com\u002Fzarr-developers\u002Fzarr-python) - Distributed NumPy arrays.  \n[NVTabular](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNVTabular) - Feature engineering and preprocessing library for tabular data by Nvidia.  \n[tensorstore](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftensorstore) - Reading and writing large multi-dimensional arrays (Google).  \n\n#### Command line tools, CSV\n[csvkit](https:\u002F\u002Fgithub.com\u002Fwireservice\u002Fcsvkit) - Command line tool for CSV files.  \n[csvsort](https:\u002F\u002Fpypi.org\u002Fproject\u002Fcsvsort\u002F) - Sort large csv files.  \n\n#### Classical Statistics\n\n##### Books\n[Lakens - Improving Your Statistical Inferences](https:\u002F\u002Flakens.github.io\u002Fstatistical_inferences\u002F) - Testing, Effect Sizes, Confidence Intervals, Sample Size, Equivalence Testing, Sequential Analysis, [Github](https:\u002F\u002Fgithub.com\u002FLakens\u002Fstatistical_inferences)  \n[Models Demystified](https:\u002F\u002Fm-clark.github.io\u002Fbook-of-models\u002F) - From Linear Regression to Deep Learning. [Github](https:\u002F\u002Fgithub.com\u002Fm-clark\u002Fbook-of-models).  
\n[The Math Behind Artificial Intelligence](https:\u002F\u002Fwww.freecodecamp.org\u002Fnews\u002Fthe-math-behind-artificial-intelligence-book) - Engineering-focused book covering linear algebra, calculus, probability & statistics, and optimization theory with Python examples.  \n\n##### Datasets\n[Rdatasets](https:\u002F\u002Fvincentarelbundock.github.io\u002FRdatasets\u002Farticles\u002Fdata.html) - Collection of more than 2000 datasets, stored as csv files (R package).  \n[crimedatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fcrimedatasets\u002F) - Datasets focused on crimes, criminal activities (R package).  \n[educationr](https:\u002F\u002Flightbluetitan.github.io\u002Feducationr\u002F) - Datasets related to education (performance, learning methods, test scores, absenteeism) (R package).  \n[MedDataSets](https:\u002F\u002Flightbluetitan.github.io\u002Fmeddatasets\u002Findex.html) - Datasets related to medicine, diseases, treatments, drugs, and public health (R package).  \n[oncodatasets](https:\u002F\u002Flightbluetitan.github.io\u002Foncodatasets\u002F) - Datasets focused on cancer research, survival rates, genetic studies, biomarkers, epidemiology (R package).  \n[timeseriesdatasets_R](https:\u002F\u002Flightbluetitan.github.io\u002Ftimeseriesdatasets_R\u002F) - Time series datasets (R package).  \n[usdatasets](https:\u002F\u002Flightbluetitan.github.io\u002Fusdatasets\u002F) - US-exclusive datasets (crime, economics, education, finance, energy, healthcare) (R package).  \n[economic datasets](https:\u002F\u002Fcaptgouda24.github.io\u002Fnicholas-decker.github.io\u002Fdatasets.html) - Economic datasets.  
\n\n##### p-values\n[The ASA Statement on p-Values: Context, Process, and Purpose](https:\u002F\u002Famstat.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2016.1154108#.Vt2XIOaE2MN)  \n[Greenland - Statistical tests, P-values, confidence intervals, and power: a guide to misinterpretations](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC4877414\u002F)  \n[Rubin - Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS2590260124000067?via%3Dihub)  \n[Gigerenzer - Mindless Statistics](https:\u002F\u002Flibrary.mpib-berlin.mpg.de\u002Fft\u002Fgg\u002FGG_Mindless_2004.pdf)  \n[Rubin - That's not a two-sided test! It's two one-sided tests! (TOST)](https:\u002F\u002Frss.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1111\u002F1740-9713.01405)  \n[Lakens - How were we supposed to move beyond  p \u003C .05, and why didn’t we?](https:\u002F\u002Ferrorstatistics.com\u002F2024\u002F07\u002F01\u002Fguest-post-daniel-lakens-how-were-we-supposed-to-move-beyond-p-05-and-why-didnt-we-thoughts-on-abandon-statistical-significance-5-years-on\u002F)  \n[McShane et al. - Abandon Statistical Significance](https:\u002F\u002Fwww.tandfonline.com\u002Fdoi\u002Ffull\u002F10.1080\u002F00031305.2018.1527253)  \n[Ho et al. - Moving beyond P values data analysis with estimation graphics](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F333884529_Moving_beyond_P_values_data_analysis_with_estimation_graphics)  \n[Lakens - The probability of p-values as a function of the statistical power of a test](https:\u002F\u002Fdaniellakens.blogspot.com\u002F2014\u002F05\u002Fthe-probability-of-p-values-as-function.html) - p-value distribution is right-skewed and becomes even more skewed the higher the power of the test.  
\n\n##### Correlation\n[Guess the Correlation](https:\u002F\u002Fwww.guessthecorrelation.com\u002F) - Correlation guessing game.  \n[phik](https:\u002F\u002Fgithub.com\u002Fkaveio\u002Fphik) - Correlation between categorical, ordinal and interval variables.  \n[hoeffd](https:\u002F\u002Fsearch.r-project.org\u002FCRAN\u002Frefmans\u002FHmisc\u002Fhtml\u002Fhoeffd.html) - Hoeffding's D Statistics, measure of dependence (R package).  \n\n##### Confidence Intervals\n[Morey - The fallacy of placing confidence in confidence intervals](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.3758\u002Fs13423-015-0947-8)  \n\n##### Packages\n[statsmodels](https:\u002F\u002Fwww.statsmodels.org\u002Fstable\u002Findex.html) - Statistical tests.  \n[linearmodels](https:\u002F\u002Fgithub.com\u002Fbashtage\u002Flinearmodels) - Instrumental variable and panel data models.  \n[nomograms](https:\u002F\u002Fhbiostat.org\u002Fbbr\u002Frmsintro.html#nomograms-overall-depiction-of-fitted-models) - Visualization for linear models, [explanation](https:\u002F\u002Fstats.stackexchange.com\u002Fa\u002F155433\u002F285504) (Part of rms R package)  \n[pingouin](https:\u002F\u002Fgithub.com\u002Fraphaelvallat\u002Fpingouin) - Statistical tests. [Pairwise correlation between columns of pandas DataFrame](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.pairwise_corr.html)   \n[scipy.stats](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fstats.html#statistical-tests) - Statistical tests.  \n[scikit-posthocs](https:\u002F\u002Fgithub.com\u002Fmaximtrp\u002Fscikit-posthocs) - Statistical post-hoc tests for pairwise multiple comparisons.   \nBland-Altman Plot [1](https:\u002F\u002Fpingouin-stats.org\u002Fgenerated\u002Fpingouin.plot_blandaltman.html), [2](http:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.agreement.mean_diff_plot.html) - Plot for agreement between two methods of measurement.  
\n[ANOVA](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.f_oneway.html)  \n[StatCheck](https:\u002F\u002Fstatcheck.steveharoz.com\u002F) - Extract statistics from articles and recompute p-values (R package).  \n[tost](https:\u002F\u002Fpingouin-stats.org\u002Fbuild\u002Fhtml\u002Fgenerated\u002Fpingouin.tost.html) - Two One-Sided Test (TOST) for equivalence.  \n[DABEST-python](https:\u002F\u002Fgithub.com\u002FACCLAB\u002FDABEST-python) - Mean difference plots.    \n[Durga](https:\u002F\u002Fgithub.com\u002FKhanKawsar\u002FEstimationPlot) - Mean difference plots (R package).  \n\n##### Effect Size\n[MOTE Effect Size Calculator](https:\u002F\u002Fwww.aggieerin.com\u002Fshiny-server\u002F) - [Shiny App](https:\u002F\u002Fdoomlab.shinyapps.io\u002Fmote\u002F), [R package](https:\u002F\u002Fgithub.com\u002Fdoomlab\u002FMOTE)  \n[Estimating Effect Sizes From Pretest-Posttest-Control Group Designs](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Fepdf\u002F10.1177\u002F1094428106291059) - Scott B. Morris, [Twitter](https:\u002F\u002Ftwitter.com\u002FMatthewBJane\u002Fstatus\u002F1742588609025200557)    \n\n##### Statistical Tests\n[test_proportions_2indep](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.stats.proportion.test_proportions_2indep.html) - Proportion test.  \n[G-Test](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FG-test) - Alternative to chi-square test, [power_divergence](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.power_divergence.html).  \n\n##### Comparing Two Populations\n[torch-two-sample](https:\u002F\u002Fgithub.com\u002Fjosipd\u002Ftorch-two-sample) - Friedman-Rafsky Test: Compare two populations based on a multivariate generalization of the runs test. 
[Explanation](https:\u002F\u002Fwww.real-statistics.com\u002Fmultivariate-statistics\u002Fmultivariate-normal-distribution\u002Ffriedman-rafsky-test\u002F), [Application](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC5014134\u002F)  \n\n##### Power and Sample Size Calculations\n[pwrss](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fpwrss\u002Findex.html) - Statistical Power and Sample Size Calculation Tools (R package), [Tutorial with t-test](https:\u002F\u002Frpubs.com\u002Fmetinbulus\u002Fwelch)  \n\n##### Interim Analyses \u002F Sequential Analysis \u002F Stopping\n[Stop Early Stopping](https:\u002F\u002Fstop-early-stopping.osc.garden\u002F) - Nice visualization.  \n[Sequential Analysis](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FSequential_analysis) - Wikipedia.  \n[sequential](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FSequential\u002FSequential.pdf) - Exact Sequential Analysis for Poisson and Binomial Data (R package).  \n[confseq](https:\u002F\u002Fgithub.com\u002Fgostevehoward\u002Fconfseq) - Uniform boundaries, confidence sequences, and always-valid p-values.  \n\n##### Visualizations\n[Friends don't let friends make certain types of data visualization](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends)  \n[Great Overview over Visualizations](https:\u002F\u002Ftextvis.lnu.se\u002F)  \n[1 dataset, 100 visualizations](https:\u002F\u002F100.datavizproject.com\u002F)  \n[Dependent Probabilities](https:\u002F\u002Fstatic.laszlokorte.de\u002Fstochastic\u002F)  \n[Null Hypothesis Significance Testing (NHST) and Sample Size Calculation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FNHST\u002F)  \n[estimationstats](https:\u002F\u002Fwww.estimationstats.com\u002F) - Online Tool for visualizing mean differences, effect sizes (Cohen's d) and others.  
\n[Sample Size \u002F Duration Calculator](https:\u002F\u002Fcalculator.osc.garden\u002F)  \n[Correlation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcorrelation\u002F)  \n[Cohen's d](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fcohend\u002F)  \n[Confidence Interval](https:\u002F\u002Frpsychologist.com\u002Fd3\u002FCI\u002F)  \n[Equivalence, non-inferiority and superiority testing](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fequivalence\u002F)  \n[Bayesian two-sample t test](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fbayes\u002F)  \n[Distribution of p-values when comparing two groups](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Fpdist\u002F)  \n[Understanding the t-distribution and its normal approximation](https:\u002F\u002Frpsychologist.com\u002Fd3\u002Ftdist\u002F)  \n[Statistical Power and Sample Size Calculation Tools](https:\u002F\u002Fpwrss.shinyapps.io\u002Findex\u002F)  \n\n##### Tidy Tuesday\n[The Art of Data Visualization with ggplot2, The TidyTuesday Cookbook](https:\u002F\u002Fnrennie.rbind.io\u002Fart-of-viz\u002F)  \n[Best Practices for Data Visualization](https:\u002F\u002Froyal-statistical-society.github.io\u002Fdatavisguide\u002F)  \n[tidytuesday](https:\u002F\u002Fgithub.com\u002Frfordatascience\u002Ftidytuesday) - Weekly challenge for visualization and lots of publicly available datasets for practice.  \n[z3tt\u002FTidyTuesday](https:\u002F\u002Fgithub.com\u002Fz3tt\u002FTidyTuesday) - Nice charts (R).  \n[nrennie\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fnrennie\u002Ftidytuesday) - Nice charts (R).  \n[poncest\u002Ftidytuesday](https:\u002F\u002Fgithub.com\u002Fponcest\u002Ftidytuesday) - Nice charts (R).  
\n\n##### Talks\n[Inverse Propensity Weighting](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=SUq0shKLPPs)  \n[Dealing with Selection Bias By Propensity Based Feature Selection](https:\u002F\u002Fwww.youtube.com\u002Fwatch?reload=9&v=3ZWCKr0vDtc)  \n\n##### Texts\n[Modes, Medians and Means: A Unifying Perspective](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fmodes-medians-and-means-an-unifying-perspective\u002F)   \n[Using Norms to Understand Linear Regression](https:\u002F\u002Fwww.johnmyleswhite.com\u002Fnotebook\u002F2013\u002F03\u002F22\u002Fusing-norms-to-understand-linear-regression\u002F)   \n[Verifying the Assumptions of Linear Models](https:\u002F\u002Fgithub.com\u002Ferykml\u002Fmedium_articles\u002Fblob\u002Fmaster\u002FStatistics\u002Flinear_regression_assumptions.ipynb)  \n[Mediation and Moderation Intro](https:\u002F\u002Fademos.people.uic.edu\u002FChapter14.html)  \n[Montgomery et al. - How conditioning on post-treatment variables can ruin your experiment and what to do about it](https:\u002F\u002Fcpb-us-e1.wpmucdn.com\u002Fsites.dartmouth.edu\u002Fdist\u002F5\u002F2293\u002Ffiles\u002F2021\u002F03\u002Fpost-treatment-bias.pdf)  \n[Lindeløv - Common statistical tests are linear models](https:\u002F\u002Flindeloev.github.io\u002Ftests-as-linear\u002F)    \n[Chatruc - The Central Limit Theorem and its misuse](https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20191229234155\u002Fhttps:\u002F\u002Flambdaclass.com\u002Fdata_etudes\u002Fcentral_limit_theorem_misuse\u002F)  \n[Al-Saleh - Properties of the Standard Deviation that are Rarely Mentioned in Classrooms](http:\u002F\u002Fwww.stat.tugraz.at\u002FAJS\u002Fausg093\u002F093Al-Saleh.pdf)   \n[Wainer - The Most Dangerous Equation](http:\u002F\u002Fnsmn1.uh.edu\u002Fdgraur\u002Fniv\u002Fthemostdangerousequation.pdf)  \n[Gigerenzer - The Bias Bias in Behavioral Economics](https:\u002F\u002Fwww.nowpublishers.com\u002Farticle\u002FDetails\u002FRBE-0092)  \n[Cook - 
Estimating the chances of something that hasn’t happened yet](https:\u002F\u002Fwww.johndcook.com\u002Fblog\u002F2010\u002F03\u002F30\u002Fstatistical-rule-of-three\u002F)  \n[Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F316652618_Same_Stats_Different_Graphs_Generating_Datasets_with_Varied_Appearance_and_Identical_Statistics_through_Simulated_Annealing), [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DbJyPELmhJc)  \n[How large is that number in the Law of Large Numbers?](https:\u002F\u002Fthepalindrome.org\u002Fp\u002Fhow-large-that-number-in-the-law)  \n[The Prosecutor's Fallacy](https:\u002F\u002Fwww.cebm.ox.ac.uk\u002Fnews\u002Fviews\u002Fthe-prosecutors-fallacy)  \n[The Dunning-Kruger Effect is Autocorrelation](https:\u002F\u002Feconomicsfromthetopdown.com\u002F2022\u002F04\u002F08\u002Fthe-dunning-kruger-effect-is-autocorrelation\u002F)  \n[Rafi, Greenland - Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise](https:\u002F\u002Fbmcmedresmethodol.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs12874-020-01105-9)   \n[Carlin et al. - On the uses and abuses of regression models: a call for reform of statistical practice and teaching](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.06668)  \n[Chen, Roth - Logs with zeros? Some problems and solutions](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.06080)  \n[Wigboldus et al. - Encourage Playing with Data and Discourage Questionable Reporting Practices](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs11336-015-9445-1)  \n[Simmons et al. 
- False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F0956797611417632?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)  \n[Zhang - An illusion of predictability in scientific results: Even experts confuse inferential uncertainty and outcome variability](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2302491120) - Figure 1 shows difference between inferential uncertainty and outcome variability.  \n\n#### Evaluation\n[Collins et al. - Evaluation of clinical prediction models (part 1): from development to external validation](https:\u002F\u002Fwww.bmj.com\u002Fcontent\u002F384\u002Fbmj-2023-074819.full) - [Twitter](https:\u002F\u002Ftwitter.com\u002FGSCollins\u002Fstatus\u002F1744309712995098624)    \n\n#### Epidemiology\n[Lesko et al. - A Framework for Descriptive Epidemiology](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10144679\u002F)  \n[R Epidemics Consortium](https:\u002F\u002Fwww.repidemicsconsortium.org\u002Fprojects\u002F) - Large tool suite for working with epidemiological data (R packages). [Github](https:\u002F\u002Fgithub.com\u002Freconhub)   \n[incidence2](https:\u002F\u002Fgithub.com\u002Freconhub\u002Fincidence2) - Computation, handling, visualisation and simple modelling of incidence (R package).  \n[EpiEstim](https:\u002F\u002Fgithub.com\u002Fmrc-ide\u002FEpiEstim) - Estimate time varying instantaneous reproduction number R during epidemics (R package) [paper](https:\u002F\u002Facademic.oup.com\u002Faje\u002Farticle\u002F178\u002F9\u002F1505\u002F89262).  \n[researchpy](https:\u002F\u002Fgithub.com\u002Fresearchpy\u002Fresearchpy) - Helpful `summary_cont()` function for summary statistics (Table 1).  
\n[zEpid](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FzEpid) - Epidemiology analysis package, [Tutorial](https:\u002F\u002Fgithub.com\u002Fpzivich\u002FPython-for-Epidemiologists).  \n[tipr](https:\u002F\u002Fgithub.com\u002FLucyMcGowan\u002Ftipr) - Sensitivity analyses for unmeasured confounders (R package).  \n[quartets](https:\u002F\u002Fgithub.com\u002Fr-causal\u002Fquartets) - Anscombe’s Quartet, Causal Quartet, [Datasaurus Dozen](https:\u002F\u002Fgithub.com\u002Fjumpingrivers\u002FdatasauRus) and others (R package).    \n[episensr](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fepisensr\u002Fvignettes\u002Fepisensr.html) - Quantitative Bias Analysis for Epidemiologic Data (=simulation of possible effects of different sources of bias) (R package).  \n\n#### Machine Learning Tutorials\n[Statistical Inference and Regression](https:\u002F\u002Fmattblackwell.github.io\u002Fgov2002-book\u002F)  \n[Applied Machine Learning in Python](https:\u002F\u002Fgeostatsguy.github.io\u002FMachineLearningDemos_Book\u002Fintro.html)  \n[Convolutional Neural Networks for Visual Recognition](https:\u002F\u002Fcs231n.github.io\u002F) - Stanford CS class.  \n[Intuition for the Algorithms in Machine Learning](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=7o9TMQAHgkQ&list=PLNeXFnYrCJneoY_rKtWJy833YiMrCRi5f&index=1) - Lecture Series.  \n\n#### Exploration and Cleaning\n[Checklist](https:\u002F\u002Fgithub.com\u002Fr0f1\u002Fml_checklist).  \n[pyjanitor](https:\u002F\u002Fgithub.com\u002Fpyjanitor-devs\u002Fpyjanitor) - Clean messy column names.  \n[skimpy](https:\u002F\u002Fgithub.com\u002Faeturrell\u002Fskimpy) - Create summary statistics of dataframes. Helpful `clean_columns()` function.  \n[pandera](https:\u002F\u002Fgithub.com\u002Funionai-oss\u002Fpandera) - Data \u002F Schema validation.  \n[dataframely](https:\u002F\u002Fgithub.com\u002FQuantco\u002Fdataframely) - Data \u002F Schema validation.  
\n[pointblank](https:\u002F\u002Fgithub.com\u002Fposit-dev\u002Fpointblank) - Data \u002F Schema validation.  \n[impyute](https:\u002F\u002Fgithub.com\u002Feltonlaw\u002Fimpyute) - Imputations.  \n[fancyimpute](https:\u002F\u002Fgithub.com\u002Fiskandr\u002Ffancyimpute) - Matrix completion and imputation algorithms.  \n[imbalanced-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fimbalanced-learn) - Resampling for imbalanced datasets.  \n[tspreprocess](https:\u002F\u002Fgithub.com\u002FMaxBenChrist\u002Ftspreprocess) - Time series preprocessing: Denoising, Compression, Resampling.  \n[Kaggler](https:\u002F\u002Fgithub.com\u002Fjeongyoonlee\u002FKaggler) - Utility functions (`OneHotEncoder(min_obs=100)`)  \n[skrub](https:\u002F\u002Fgithub.com\u002Fskrub-data\u002Fskrub) - Bridge the gap between tabular data sources and machine-learning models.  \n\n#### Noisy Labels\n[cleanlab](https:\u002F\u002Fgithub.com\u002Fcleanlab\u002Fcleanlab) - Machine learning with noisy labels, finding mislabelled data, and uncertainty quantification. Also see awesome list below.  \n[doubtlab](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fdoubtlab) - Find bad or noisy labels.\n\n#### Train \u002F Test Split\n[iterative-stratification](https:\u002F\u002Fgithub.com\u002Ftrent-b\u002Fiterative-stratification) - Stratification of multilabel data.  \n\n#### Feature Engineering\n[Vincent Warmerdam: Untitled12.ipynb](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=yXGCKqo5cEY) - Using df.pipe()  \n[Vincent Warmerdam: Winning with Simple, even Linear, Models](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=68ABAU_V8qI)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.pipeline.Pipeline.html) - Pipeline, [examples](https:\u002F\u002Fgithub.com\u002Fjem1031\u002Fpandas-pipelines-custom-transformers).  \n[pdpipe](https:\u002F\u002Fgithub.com\u002Fshaypal5\u002Fpdpipe) - Pipelines for DataFrames.  
\n[scikit-lego](https:\u002F\u002Fgithub.com\u002Fkoaning\u002Fscikit-lego) - Custom transformers for pipelines.  \n[categorical-encoding](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fcategorical-encoding) - Categorical encoding of variables, [vtreat (R package)](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fvtreat\u002Fvignettes\u002Fvtreat.html).  \n[patsy](https:\u002F\u002Fgithub.com\u002Fpydata\u002Fpatsy\u002F) - R-like syntax for statistical models.  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_extraction\u002FLinearDiscriminantAnalysis\u002F) - LDA.  \n[featuretools](https:\u002F\u002Fgithub.com\u002FFeaturetools\u002Ffeaturetools) - Automated feature engineering, [example](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002Fautomated-feature-engineering\u002Fblob\u002Fmaster\u002Fwalk_through\u002FAutomated_Feature_Engineering.ipynb).  \n[tsfresh](https:\u002F\u002Fgithub.com\u002Fblue-yonder\u002Ftsfresh) - Time series feature engineering.  \n[temporian](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftemporian) - Time series feature engineering by Google.  \n[pypeln](https:\u002F\u002Fgithub.com\u002Fcgarciae\u002Fpypeln) - Concurrent data pipelines.  \n[feature-engine](https:\u002F\u002Fgithub.com\u002Ffeature-engine\u002Ffeature_engine) - Encoders, transformers, etc.  
\n\n#### Feature Selection\n[Overview Paper](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS016794731930194X), [Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=JsArBz46_3s), [Repo](https:\u002F\u002Fgithub.com\u002FYimeng-Zhang\u002Ffeature-engineering-and-feature-selection)    \nBlog post series - [1](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-i-univariate-selection\u002F), [2](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-ii-linear-models-and-regularization\u002F), [3](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iii-random-forests\u002F), [4](http:\u002F\u002Fblog.datadive.net\u002Fselecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side\u002F)  \nTutorials - [1](https:\u002F\u002Fwww.kaggle.com\u002Fresidentmario\u002Fautomated-feature-selection-with-sklearn), [2](https:\u002F\u002Fmachinelearningmastery.com\u002Ffeature-selection-machine-learning-python\u002F)  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.feature_selection) - Feature selection.  \n[eli5](https:\u002F\u002Feli5.readthedocs.io\u002Fen\u002Flatest\u002Fblackbox\u002Fpermutation_importance.html#feature-selection) - Feature selection using permutation importance.  \n[scikit-feature](https:\u002F\u002Fgithub.com\u002Fjundongl\u002Fscikit-feature) - Feature selection algorithms.  \n[stability-selection](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fstability-selection) - Stability selection.  \n[scikit-rebate](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-rebate) - Relief-based feature selection algorithms.  \n[scikit-genetic](https:\u002F\u002Fgithub.com\u002Fmanuel-calzolari\u002Fsklearn-genetic) - Genetic feature selection.  
\n[boruta_py](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fboruta_py) - Feature selection, [explanation](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F264360\u002Fboruta-all-relevant-feature-selection-vs-random-forest-variables-of-importanc\u002F264467), [example](https:\u002F\u002Fwww.kaggle.com\u002Ftilii7\u002Fboruta-feature-elimination).  \n[Boruta-Shap](https:\u002F\u002Fgithub.com\u002FEkeany\u002FBoruta-Shap) - Boruta feature selection algorithm + Shapley values.  \n[linselect](https:\u002F\u002Fgithub.com\u002Fefavdb\u002Flinselect) - Feature selection package.  \n[mlxtend](https:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Ffeature_selection\u002FExhaustiveFeatureSelector\u002F) - Exhaustive feature selection.     \n[BoostARoota](https:\u002F\u002Fgithub.com\u002Fchasedehan\u002FBoostARoota) - XGBoost feature selection algorithm.  \n[INVASE](https:\u002F\u002Fgithub.com\u002Fjsyoon0823\u002FINVASE) - Instance-wise Variable Selection using Neural Networks.  \n[SubTab](https:\u002F\u002Fgithub.com\u002FAstraZeneca\u002FSubTab) - Subsetting Features of Tabular Data for Self-Supervised Representation Learning, AstraZeneca.  \n[mrmr](https:\u002F\u002Fgithub.com\u002Fsmazzanti\u002Fmrmr) - Maximum Relevance and Minimum Redundancy Feature Selection, [Website](http:\u002F\u002Fhome.penglab.com\u002Fproj\u002FmRMR\u002F).  \n[arfs](https:\u002F\u002Fgithub.com\u002FThomasBury\u002Farfs) - All Relevant Feature Selection.  \n[VSURF](https:\u002F\u002Fgithub.com\u002Frobingenuer\u002FVSURF) - Variable Selection Using Random Forests (R package), [doc](https:\u002F\u002Fwww.rdocumentation.org\u002Fpackages\u002FVSURF\u002Fversions\u002F1.1.0\u002Ftopics\u002FVSURF).  \n[FeatureSelectionGA](https:\u002F\u002Fgithub.com\u002Fkaushalshetty\u002FFeatureSelectionGA) - Feature Selection using Genetic Algorithm.  
\n\n#### Subset Selection\n[apricot](https:\u002F\u002Fgithub.com\u002Fjmschrei\u002Fapricot) - Selecting subsets of data sets to train machine learning models quickly.  \n[ducks](https:\u002F\u002Fgithub.com\u002Fmanimino\u002Fducks) - Index data for fast lookup by any combination of fields.  \n\n#### Dimensionality Reduction \u002F Representation Learning\n\n##### Selection\nCheck also the Clustering section and self-supervised learning section for ideas!  \n[Review](https:\u002F\u002Fmembers.loria.fr\u002Fmoberger\u002FEnseignement\u002FAVR\u002FExposes\u002FTR_Dimensiereductie.pdf)  \n  \nPCA - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.decomposition.PCA.html)    \nAutoencoder - [link](https:\u002F\u002Fblog.keras.io\u002Fbuilding-autoencoders-in-keras.html)  \nIsomaps - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.Isomap.html#sklearn.manifold.Isomap)    \nLLE - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.LocallyLinearEmbedding.html)  \nForce-directed graph drawing - [link](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph)    \nMDS - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.MDS.html)  \nDiffusion Maps - [link](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html)  \nt-SNE - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.manifold.TSNE.html#sklearn.manifold.TSNE)    \nNeRV - [link](https:\u002F\u002Fgithub.com\u002Fziyuang\u002Fpynerv), [paper](https:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume11\u002Fvenna10a\u002Fvenna10a.pdf)  \nMDR - [link](https:\u002F\u002Fgithub.com\u002FEpistasisLab\u002Fscikit-mdr)  \nUMAP - 
[link](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap)  \nRandom Projection - [link](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html)  \nIvis - [link](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis)   \nSimCLR - [link](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly)  \npymde - Minimum-distortion embedding with PyTorch, [link](https:\u002F\u002Fgithub.com\u002Fcvxgrp\u002Fpymde)\n\n##### Neural-network based\n[esvit](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fesvit) - Vision Transformers for Representation Learning (Microsoft).  \n[MCML](https:\u002F\u002Fgithub.com\u002Fpachterlab\u002FMCML) - Semi-supervised dimensionality reduction of Multi-Class, Multi-Label data (sequencing data) [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.08.25.457696v1).  \n\n##### Packages\n[Dangers of PCA (paper)](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41598-022-14395-4).  \n[Phantom oscillations in PCA](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.06.20.545619v1.full).  \n[What to use instead of PCA](https:\u002F\u002Fwww.pnas.org\u002Fdoi\u002F10.1073\u002Fpnas.2319169120).  \n[Talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=9iol3Lk6kyU), [tsne intro](https:\u002F\u002Fdistill.pub\u002F2016\u002Fmisread-tsne\u002F).  \n[sklearn.manifold](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.manifold) and [sklearn.decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fclasses.html#module-sklearn.decomposition) - PCA, t-SNE, MDS, Isomaps and others.  
\nAdditional plots for PCA - Factor Loadings, Cumulative Variance Explained, [Correlation Circle Plot](http:\u002F\u002Frasbt.github.io\u002Fmlxtend\u002Fuser_guide\u002Fplotting\u002Fplot_pca_correlation_graph\u002F), [Tweet](https:\u002F\u002Ftwitter.com\u002Frasbt\u002Fstatus\u002F1555999903398219777\u002Fphoto\u002F1)  \n[sklearn.random_projection](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Frandom_projection.html) - Johnson-Lindenstrauss lemma, Gaussian random projection, Sparse random projection.  \n[sklearn.cross_decomposition](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fcross_decomposition.html#cross-decomposition) - Partial least squares, supervised estimators for dimensionality reduction and regression.  \n[prince](https:\u002F\u002Fgithub.com\u002FMaxHalford\u002Fprince) - Dimensionality reduction, factor analysis (PCA, MCA, CA, FAMD).  \nFaster t-SNE implementations: [tsne-cuda](https:\u002F\u002Fgithub.com\u002FCannyLab\u002Ftsne-cuda), [MulticoreTSNE](https:\u002F\u002Fgithub.com\u002FDmitryUlyanov\u002FMulticore-TSNE), [lvdmaaten](https:\u002F\u002Flvdmaaten.github.io\u002Ftsne\u002F)  \n[umap](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fumap) - Uniform Manifold Approximation and Projection, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nq6iPZVUxZU), [explorer](https:\u002F\u002Fgithub.com\u002FGrantCuster\u002Fumap-explorer), [explanation](https:\u002F\u002Fpair-code.github.io\u002Funderstanding-umap\u002F), [parallel version](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fcuml\u002Fstable\u002Fapi.html).  \n[humap](https:\u002F\u002Fgithub.com\u002Fwilsonjr\u002Fhumap) - Hierarchical UMAP.  \n[sleepwalk](https:\u002F\u002Fgithub.com\u002Fanders-biostat\u002Fsleepwalk\u002F) - Explore embeddings, interactive visualization (R package).  \n[somoclu](https:\u002F\u002Fgithub.com\u002Fpeterwittek\u002Fsomoclu) - Self-organizing map.  
\n[scikit-tda](https:\u002F\u002Fgithub.com\u002Fscikit-tda\u002Fscikit-tda) - Topological Data Analysis, [paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fsrep01236), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=F2t_ytTLrQ4), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=AWoeBzJd7uQ), [paper](https:\u002F\u002Fwww.uncg.edu\u002Fmat\u002Ffaculty\u002Fcdsmyth\u002Ftopological-approaches-skin.pdf).  \n[giotto-tda](https:\u002F\u002Fgithub.com\u002Fgiotto-ai\u002Fgiotto-tda) - Topological Data Analysis.  \n[ivis](https:\u002F\u002Fgithub.com\u002Fberingresearch\u002Fivis) - Dimensionality reduction using Siamese Networks.  \n[trimap](https:\u002F\u002Fgithub.com\u002Feamid\u002Ftrimap) - Dimensionality reduction using triplets.  \n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - [Force-directed graph drawing](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.draw_graph.html#scanpy.tl.draw_graph), [Diffusion Maps](https:\u002F\u002Fscanpy.readthedocs.io\u002Fen\u002Fstable\u002Fapi\u002Fscanpy.tl.diffmap.html).  \n[direpack](https:\u002F\u002Fgithub.com\u002FSvenSerneels\u002Fdirepack) - Projection pursuit, Sufficient dimension reduction, Robust M-estimators.  \n[DBS](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002FDatabionicSwarm\u002Fvignettes\u002FDatabionicSwarm.html) - DatabionicSwarm (R package).  \n[contrastive](https:\u002F\u002Fgithub.com\u002Fabidlabs\u002Fcontrastive) - Contrastive PCA.  \n[scPCA](https:\u002F\u002Fgithub.com\u002FPhilBoileau\u002FscPCA) - Sparse contrastive PCA (R package).  \n[generalized_contrastive_PCA](https:\u002F\u002Fgithub.com\u002FSjulsonLab\u002Fgeneralized_contrastive_PCA) - Generalized contrastive PCA.  \n[tmap](https:\u002F\u002Fgithub.com\u002Freymond-group\u002Ftmap) - Visualization library for large, high-dimensional data sets.  
\n[lollipop](https:\u002F\u002Fgithub.com\u002Fneurodata\u002Flollipop) - Linear Optimal Low Rank Projection.  \n[linearsdr](https:\u002F\u002Fgithub.com\u002FHarrisQ\u002Flinearsdr) - Linear Sufficient Dimension Reduction (R package).  \n[PHATE](https:\u002F\u002Fgithub.com\u002FKrishnaswamyLab\u002FPHATE) - Tool for visualizing high dimensional data.  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - Tool for visualizing high dimensional data.  \n\n#### Visualization\n[All charts](https:\u002F\u002Fdatavizproject.com\u002F)  \n[physt](https:\u002F\u002Fgithub.com\u002Fjanpipek\u002Fphyst) - Better histograms, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZG-wH3-Up9Y), [notebook](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fjanpipek\u002Fpydata2018-berlin\u002Fblob\u002Fmaster\u002Fnotebooks\u002Ftalk.ipynb).  \n[fast-histogram](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Ffast-histogram) - Fast histograms.  \n[matplotlib_venn](https:\u002F\u002Fgithub.com\u002Fkonstantint\u002Fmatplotlib-venn) - Venn diagrams.  \n[penrose](https:\u002F\u002Fgithub.com\u002Fpenrose\u002Fpenrose) - Venn diagrams.  \n[ridgeplot](https:\u002F\u002Fgithub.com\u002Ftpvasconcelos\u002Fridgeplot) - Ridge plots.  \n[mosaic plots](https:\u002F\u002Fwww.statsmodels.org\u002Fdev\u002Fgenerated\u002Fstatsmodels.graphics.mosaicplot.mosaic.html) - Categorical variable visualization, [example](https:\u002F\u002Fsukhbinder.wordpress.com\u002F2018\u002F09\u002F18\u002Fmosaic-plot-in-python\u002F).  \n[yellowbrick](https:\u002F\u002Fgithub.com\u002FDistrictDataLabs\u002Fyellowbrick) - Visualizations for ML models (similar to scikit-plot).  \n[bokeh](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fbokeh) - Interactive visualization library, [Examples](https:\u002F\u002Fbokeh.pydata.org\u002Fen\u002Flatest\u002Fdocs\u002Fuser_guide\u002Fserver.html), [Examples](https:\u002F\u002Fgithub.com\u002FWillKoehrsen\u002FBokeh-Python-Visualization).  
\n[lets-plot](https:\u002F\u002Fgithub.com\u002FJetBrains\u002Flets-plot) - Plotting library.  \n[plotnine](https:\u002F\u002Fgithub.com\u002Fhas2k1\u002Fplotnine) - ggplot for Python.  \n[altair](https:\u002F\u002Fgithub.com\u002Fvega\u002Faltair) - Declarative statistical visualization library.  \n[hvplot](https:\u002F\u002Fgithub.com\u002Fpyviz\u002Fhvplot) - High-level plotting library built on top of [holoviews](http:\u002F\u002Fholoviews.org\u002F).  \n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - Decision tree visualization and model interpretation.  \n[mpl-scatter-density](https:\u002F\u002Fgithub.com\u002Fastrofrog\u002Fmpl-scatter-density) - Scatter density plots. Alternative to 2d-histograms.   \n[ComplexHeatmap](https:\u002F\u002Fgithub.com\u002Fjokergoo\u002FComplexHeatmap) - Complex heatmaps for multidimensional genomic data (R package).  \n[morpheus](https:\u002F\u002Fsoftware.broadinstitute.org\u002Fmorpheus\u002F) - Broad Institute tool matrix visualization and analysis software. [Source](https:\u002F\u002Fgithub.com\u002Fcmap\u002Fmorpheus.js), Tutorial: [1](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0nkYDeekhtQ), [2](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=r9mN6MsxUb0), [Code](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002FBBBC021_Morpheus_Exercise).  \n[jupyter-scatter](https:\u002F\u002Fgithub.com\u002Fflekschas\u002Fjupyter-scatter) - Interactive 2D scatter plot widget for Jupyter.  \n[fastplotlib](https:\u002F\u002Fgithub.com\u002Ffastplotlib\u002Ffastplotlib) - Fast plotting library using pygfx.  \n[datamapplot](https:\u002F\u002Fgithub.com\u002FTutteInstitute\u002Fdatamapplot) - Interactive 2D scatter plot.  \n[SandDance](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSandDance) - Interactive visualization tool from Microsoft.  
\n\n#### Colors\n[palettable](https:\u002F\u002Fgithub.com\u002Fjiffyclub\u002Fpalettable) - Color palettes from [colorbrewer2](https:\u002F\u002Fcolorbrewer2.org\u002F#type=sequential&scheme=BuGn&n=3).  \n[colorcet](https:\u002F\u002Fgithub.com\u002Fholoviz\u002Fcolorcet) - Collection of perceptually uniform colormaps.  \n[Named Colors Wheel](https:\u002F\u002Farantius.github.io\u002Fweb-color-wheel\u002F) - Color wheel for all named HTML colors.  \n\n#### Dashboards\n[py-shiny](https:\u002F\u002Fgithub.com\u002Frstudio\u002Fpy-shiny) - Shiny for Python, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ijRBbtT2tgc).  \n[superset](https:\u002F\u002Fgithub.com\u002Fapache\u002Fsuperset) - Dashboarding solution by Apache.  \n[streamlit](https:\u002F\u002Fgithub.com\u002Fstreamlit\u002Fstreamlit) - Dashboarding solution. [Resources](https:\u002F\u002Fgithub.com\u002Fmarcskovmadsen\u002Fawesome-streamlit), [Gallery](http:\u002F\u002Fawesome-streamlit.org\u002F) [Components](https:\u002F\u002Fwww.streamlit.io\u002Fcomponents), [bokeh-events](https:\u002F\u002Fgithub.com\u002Fash2shukla\u002Fstreamlit-bokeh-events).  \n[mercury](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fmercury) - Convert Python notebook to web app, [Example](https:\u002F\u002Fgithub.com\u002Fpplonski\u002Fdashboard-python-jupyter-notebook).  \n[dash](https:\u002F\u002Fdash.plot.ly\u002Fgallery) - Dashboarding solution by plot.ly. [Resources](https:\u002F\u002Fgithub.com\u002Fucg8j\u002Fawesome-dash).  \n[visdom](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvisdom) - Dashboarding library by Facebook.  \n[panel](https:\u002F\u002Fpanel.pyviz.org\u002Findex.html) - Dashboarding solution.  \n[altair example](https:\u002F\u002Fgithub.com\u002Fxhochy\u002Faltair-vue-vega-example) - [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=4L568emKOvs).  \n[voila](https:\u002F\u002Fgithub.com\u002FQuantStack\u002Fvoila) - Turn Jupyter notebooks into standalone web applications.  
\n[voila-gridstack](https:\u002F\u002Fgithub.com\u002Fvoila-dashboards\u002Fvoila-gridstack) - Voila grid layout.  \n\n#### UI\n[gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio) - Create UIs for your machine learning model.  \n\n#### Survey Tools\n[samplics](https:\u002F\u002Fgithub.com\u002Fsamplics-org\u002Fsamplics) - Sampling techniques for complex survey designs.  \n\n#### Geographical Tools\n[folium](https:\u002F\u002Fgithub.com\u002Fpython-visualization\u002Ffolium) - Plot geographical maps using the Leaflet.js library, [jupyter plugin](https:\u002F\u002Fgithub.com\u002Fjupyter-widgets\u002Fipyleaflet).  \n[gmaps](https:\u002F\u002Fgithub.com\u002Fpbugnion\u002Fgmaps) - Google Maps for Jupyter notebooks.  \n[stadiamaps](https:\u002F\u002Fstadiamaps.com\u002F) - Plot geographical maps.  \n[datashader](https:\u002F\u002Fgithub.com\u002Fbokeh\u002Fdatashader) - Draw millions of points on a map.  \n[sklearn](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.neighbors.BallTree.html) - BallTree.  \n[pynndescent](https:\u002F\u002Fgithub.com\u002Flmcinnes\u002Fpynndescent) - Nearest neighbor descent for approximate nearest neighbors.  \n[geocoder](https:\u002F\u002Fgithub.com\u002FDenisCarriere\u002Fgeocoder) - Geocoding of addresses, IP addresses.  \nConversion of different geo formats: [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=eHRggqAvczE), [repo](https:\u002F\u002Fgithub.com\u002Fdillongardner\u002FPyDataSpatialAnalysis)  \n[geopandas](https:\u002F\u002Fgithub.com\u002Fgeopandas\u002Fgeopandas) - Tools for geographic data  \nLow Level Geospatial Tools (GEOS, GDAL\u002FOGR, PROJ.4)  \nVector Data (Shapely, Fiona, Pyproj)  \nRaster Data (Rasterio)  \nPlotting (Descartes, Cartopy)  \n[Predict economic indicators from Open Street Map](https:\u002F\u002Fjanakiev.com\u002Fblog\u002Fosm-predict-economic-indicators\u002F).   
\n[PySal](https:\u002F\u002Fgithub.com\u002Fpysal\u002Fpysal) - Python Spatial Analysis Library.  \n[geography](https:\u002F\u002Fgithub.com\u002Fushahidi\u002Fgeograpy) - Extract countries, regions and cities from a URL or text.  \n[cartogram](https:\u002F\u002Fgo-cart.io\u002Fcartogram) - Distorted maps based on population.  \n\n#### Recommender Systems\nExamples: [1](https:\u002F\u002Flazyprogrammer.me\u002Ftutorial-on-collaborative-filtering-and-matrix-factorization-in-python\u002F), [2](https:\u002F\u002Fmedium.com\u002F@james_aka_yale\u002Fthe-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223), [2-ipynb](https:\u002F\u002Fgithub.com\u002Fkhanhnamle1994\u002Fmovielens\u002Fblob\u002Fmaster\u002FContent_Based_and_Collaborative_Filtering_Models.ipynb), [3](https:\u002F\u002Fwww.kaggle.com\u002Fmorrisb\u002Fhow-to-recommend-anything-deep-recommender).  \n[surprise](https:\u002F\u002Fgithub.com\u002FNicolasHug\u002FSurprise) - Recommender, [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d7iIb_XVkZs).  \n[implicit](https:\u002F\u002Fgithub.com\u002Fbenfred\u002Fimplicit) - Fast Collaborative Filtering for Implicit Feedback Datasets.  \n[spotlight](https:\u002F\u002Fgithub.com\u002Fmaciejkula\u002Fspotlight) - Deep recommender models using PyTorch.  \n[lightfm](https:\u002F\u002Fgithub.com\u002Flyst\u002Flightfm) - Recommendation algorithms for both implicit and explicit feedback.  \n[funk-svd](https:\u002F\u002Fgithub.com\u002Fgbolmier\u002Ffunk-svd) - Fast SVD.  
\n\n#### Decision Tree Models\n[Intro to Decision Trees and Random Forests](https:\u002F\u002Fvictorzhou.com\u002Fblog\u002Fintro-to-random-forests\u002F), [Another good visualization](https:\u002F\u002Fmlu-explain.github.io\u002Fdecision-tree\u002F), Intro to Gradient Boosting [1](https:\u002F\u002Fexplained.ai\u002Fgradient-boosting\u002F), [2](https:\u002F\u002Fwww.gormanalysis.com\u002Fblog\u002Fgradient-boosting-explained\u002F), [Decision Tree Visualization](https:\u002F\u002Fexplained.ai\u002Fdecision-tree-viz\u002Findex.html)    \n[lightgbm](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FLightGBM) - Gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, [doc](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters).  \n[xgboost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost) - Gradient boosting (GBDT, GBRT or GBM) library, [doc](https:\u002F\u002Fsites.google.com\u002Fview\u002Flauraepp\u002Fparameters), Methods for CIs: [link1](https:\u002F\u002Fstats.stackexchange.com\u002Fquestions\u002F255783\u002Fconfidence-interval-for-xgb-forecast), [link2](https:\u002F\u002Ftowardsdatascience.com\u002Fregression-prediction-intervals-with-xgboost-428e0a018b).  \n[catboost](https:\u002F\u002Fgithub.com\u002Fcatboost\u002Fcatboost) - Gradient boosting.  \n[h2o](https:\u002F\u002Fgithub.com\u002Fh2oai\u002Fh2o-3) -  Gradient boosting and general machine learning framework.  \n[pycaret](https:\u002F\u002Fgithub.com\u002Fpycaret\u002Fpycaret) - Wrapper for xgboost, lightgbm, catboost etc.  \n[forestci](https:\u002F\u002Fgithub.com\u002Fscikit-learn-contrib\u002Fforest-confidence-interval) - Confidence intervals for random forests.  \n[grf](https:\u002F\u002Fgithub.com\u002Fgrf-labs\u002Fgrf) - Generalized random forest.  \n[dtreeviz](https:\u002F\u002Fgithub.com\u002Fparrt\u002Fdtreeviz) - Decision tree visualization and model interpretation.  
\n[Nuance](https:\u002F\u002Fgithub.com\u002FSauceCat\u002FNuance) - Decision tree visualization.  \n[rfpimp](https:\u002F\u002Fgithub.com\u002Fparrt\u002Frandom-forest-importances) - Feature Importance for RandomForests using Permutation Importance.  \nWhy the default feature importance for random forests is wrong: [link](http:\u002F\u002Fexplained.ai\u002Frf-importance\u002Findex.html)  \n[bartpy](https:\u002F\u002Fgithub.com\u002FJakeColtman\u002Fbartpy) - Bayesian Additive Regression Trees.  \n[merf](https:\u002F\u002Fgithub.com\u002Fmanifoldai\u002Fmerf) - Mixed Effects Random Forest for Clustering, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=gWj4ZwB7f3o)  \n[groot](https:\u002F\u002Fgithub.com\u002Ftudelft-cda-lab\u002FGROOT) - Robust decision trees.  \n[linear-tree](https:\u002F\u002Fgithub.com\u002Fcerlymarco\u002Flinear-tree) - Trees with linear models at the leaves.  \n[supertree](https:\u002F\u002Fgithub.com\u002Fmljar\u002Fsupertree) - Decision tree visualization.  \n\n#### Natural Language Processing (NLP) \u002F Text Processing\n[talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6zm9NC9uRkk)-[nb](https:\u002F\u002Fnbviewer.jupyter.org\u002Fgithub\u002Fskipgram\u002Fmodern-nlp-in-python\u002Fblob\u002Fmaster\u002Fexecutable\u002FModern_NLP_in_Python.ipynb), [nb2](https:\u002F\u002Fahmedbesbes.com\u002Fhow-to-mine-newsfeed-data-and-extract-interactive-insights-in-python.html), [talk](https:\u002F\u002Fwww.youtube.com\u002Fwatch?time_continue=2&v=sI7VpFNiy_I).  \n[Text classification Intro](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2018\u002F12\u002F17\u002Ftext_classification\u002F), [Preprocessing blog post](https:\u002F\u002Fmlwhiz.com\u002Fblog\u002F2019\u002F01\u002F17\u002Fdeeplearning_nlp_preprocess\u002F).  
\n[gensim](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002F) - NLP, doc2vec, word2vec, text processing, topic modelling (LSA, LDA), [Example](https:\u002F\u002Fmarkroxor.github.io\u002Fgensim\u002Fstatic\u002Fnotebooks\u002Fgensim_news_classification.html), [Coherence Model](https:\u002F\u002Fradimrehurek.com\u002Fgensim\u002Fmodels\u002Fcoherencemodel.html) for evaluation.  \nEmbeddings - [GloVe](https:\u002F\u002Fnlp.stanford.edu\u002Fprojects\u002Fglove\u002F) ([[1](https:\u002F\u002Fwww.kaggle.com\u002Fjhoward\u002Fimproved-lstm-baseline-glove-dropout)], [[2](https:\u002F\u002Fwww.kaggle.com\u002Fsbongo\u002Fdo-pretrained-embeddings-give-you-the-extra-edge)]), [StarSpace](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FStarSpace), [wikipedia2vec](https:\u002F\u002Fwikipedia2vec.github.io\u002Fwikipedia2vec\u002Fpretrained\u002F), [visualization](https:\u002F\u002Fprojector.tensorflow.org\u002F).  \n[magnitude](https:\u002F\u002Fgithub.com\u002Fplasticityai\u002Fmagnitude) - Vector embedding utility package.  \n[pyldavis](https:\u002F\u002Fgithub.com\u002Fbmabey\u002FpyLDAvis) - Visualization for topic modelling.  \n[spaCy](https:\u002F\u002Fspacy.io\u002F) - NLP.  \n[NLTK](https:\u002F\u002Fwww.nltk.org\u002F) - NLP, helpful `KMeansClusterer` with `cosine_distance`.  \n[pytext](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FPyText) - NLP from Facebook.  \n[fastText](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FfastText) - Efficient text classification and representation learning.  \n[annoy](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fannoy) - Approximate nearest neighbor search.  \n[faiss](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss) - Approximate nearest neighbor search.  \n[infomap](https:\u002F\u002Fgithub.com\u002Fmapequation\u002Finfomap) - Cluster (word-)vectors to find topics.  
\n[datasketch](https:\u002F\u002Fgithub.com\u002Fekzhu\u002Fdatasketch) - Probabilistic data structures for large data (MinHash, HyperLogLog).  \n[flair](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Fflair) - NLP Framework by Zalando.  \n[stanza](https:\u002F\u002Fgithub.com\u002Fstanfordnlp\u002Fstanza) - NLP Library.  \n[Chatistics](https:\u002F\u002Fgithub.com\u002FMasterScrat\u002FChatistics) - Turn Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.  \n[textdistance](https:\u002F\u002Fgithub.com\u002Flife4\u002Ftextdistance) - Collection for comparing distances between two or more sequences.  \n\n#### Bio Image Analysis\n[Lee et al. - A beginner's guide to rigor and reproducibility in fluorescence imaging experiments](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC6080651\u002F)  \n[Awesome Cytodata](https:\u002F\u002Fgithub.com\u002Fcytodata\u002Fawesome-cytodata)  \n\n##### Tutorials\n[MIT 7.016 Introductory Biology, Fall 2018](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63) - Videos 27, 28, and 29 talk about staining and imaging.  
\n[Bio-image Analysis Notebooks](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002Fintro.html) - Large collection of image processing workflows, including [point-spread-function estimation](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fextract_psf.html) and [deconvolution](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F18a_deconvolution\u002Fintroduction_deconvolution.html), [3D cell segmentation](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F20_image_segmentation\u002FSegmentation_3D.html), [feature extraction](https:\u002F\u002Fhaesleinhuepf.github.io\u002FBioImageAnalysisNotebooks\u002F22_feature_extraction\u002Fstatistics_with_pyclesperanto.html) using [pyclesperanto](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype) and others.  \n[python_for_microscopists](https:\u002F\u002Fgithub.com\u002Fbnsreenu\u002Fpython_for_microscopists) - Notebooks and associated [youtube channel](https:\u002F\u002Fwww.youtube.com\u002Fchannel\u002FUC34rW-HtPJulxr5wp2Xa04w\u002Fvideos) for a variety of image processing tasks.  \n\n##### Datasets\n[jump-cellpainting](https:\u002F\u002Fgithub.com\u002Fjump-cellpainting\u002Fdatasets) - Cellpainting dataset.  \n[MedMNIST](https:\u002F\u002Fgithub.com\u002FMedMNIST\u002FMedMNIST) - Datasets for 2D and 3D Biomedical Image Classification.  \n[CytoImageNet](https:\u002F\u002Fgithub.com\u002Fstan-hua\u002FCytoImageNet) - Huge diverse dataset like ImageNet but for cell images.  \n[Haghighi](https:\u002F\u002Fgithub.com\u002Fcarpenterlab\u002F2021_Haghighi_NatureMethods) - Gene Expression and Morphology Profiles.  \n[broadinstitute\u002Flincs-profiling-complementarity](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002Flincs-profiling-complementarity) - Cellpainting vs. L1000 assay.  
\n\n#### Biostatistics \u002F Robust statistics\n[MinCovDet](https:\u002F\u002Fscikit-learn.org\u002Fstable\u002Fmodules\u002Fgenerated\u002Fsklearn.covariance.MinCovDet.html) - Robust estimator of covariance, RMPV, [Paper](https:\u002F\u002Fwires.onlinelibrary.wiley.com\u002Fdoi\u002Ffull\u002F10.1002\u002Fwics.1421), [App1](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002F10.1177\u002F1087057112469257?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed&), [App2](https:\u002F\u002Fwww.cell.com\u002Fcell-reports\u002Fpdf\u002FS2211-1247(21)00694-X.pdf).  \n[moderated z-score](https:\u002F\u002Fclue.io\u002Fconnectopedia\u002Freplicate_collapse) - Weighted average of z-scores based on Spearman correlation.  \n[winsorize](https:\u002F\u002Fdocs.scipy.org\u002Fdoc\u002Fscipy\u002Freference\u002Fgenerated\u002Fscipy.stats.mstats.winsorize.html#scipy.stats.mstats.winsorize) - Simple adjustment of outliers.  \n\n#### High-Content Screening Assay Design\n[Zhang XHD (2008) - Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens](https:\u002F\u002Fslas-discovery.org\u002Farticle\u002FS2472-5552(22)08204-1\u002Fpdf)  \n[Iversen - A Comparison of Assay Performance Measures in Screening Assays, Signal Window, Z′ Factor, and Assay Variability Ratio](https:\u002F\u002Fwww.slas-discovery.org\u002Farticle\u002FS2472-5552(22)08460-X\u002Fpdf)  \n[Z-factor](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FZ-factor) - Measure of statistical effect size.  \n[Z'-factor](https:\u002F\u002Flink.springer.com\u002Freferenceworkentry\u002F10.1007\u002F978-3-540-47648-1_6298) - Measure of statistical effect size.  \n[CV](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FCoefficient_of_variation) - Coefficient of variation.  \n[SSMD](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FStrictly_standardized_mean_difference) - Strictly standardized mean difference.  
\n[Signal Window](https:\u002F\u002Fwww.intechopen.com\u002Fchapters\u002F48130) - Assay quality measurement.  \n\n#### Microscopy + Assay\n[BD Spectrum Viewer](https:\u002F\u002Fwww.bdbiosciences.com\u002Fen-us\u002Fresources\u002Fbd-spectrum-viewer) - Calculate spectral overlap, bleed through for fluorescence microscopy dyes.  \n[SpectraViewer](https:\u002F\u002Fwww.perkinelmer.com\u002Flab-products-and-services\u002Fspectraviewer) - Visualize the spectral compatibility of fluorophores (PerkinElmer).  \n[Thermofisher Spectrum Viewer](https:\u002F\u002Fwww.thermofisher.com\u002Forder\u002Fstain-it) - Thermofisher Spectrum Viewer.  \n[Microscopy Resolution Calculator](https:\u002F\u002Fwww.microscope.healthcare.nikon.com\u002Fmicrotools\u002Fresolution-calculator) - Calculate resolution of images (Nikon).  \n[PlateEditor](https:\u002F\u002Fgithub.com\u002Fvindelorme\u002FPlateEditor) - Drug Layout for plates, [app](https:\u002F\u002Fplateeditor.sourceforge.io\u002F), [zip](https:\u002F\u002Fsourceforge.net\u002Fprojects\u002Fplateeditor\u002F), [paper](https:\u002F\u002Fjournals.plos.org\u002Fplosone\u002Farticle?id=10.1371\u002Fjournal.pone.0252488).  \n\n##### Image Formats and Converters\nOME-Zarr - [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.17.528834v1.full), [standard](https:\u002F\u002Fngff.openmicroscopy.org\u002Flatest\u002F)  \n[bioformats2raw](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fbioformats2raw) - Various formats to zarr.  \n[raw2ometiff](https:\u002F\u002Fgithub.com\u002Fglencoesoftware\u002Fraw2ometiff) - Zarr to tiff.  \n[BatchConvert](https:\u002F\u002Fgithub.com\u002FEuro-BioImaging\u002FBatchConvert) - Wrapper for bioformats2raw to parallelize conversions with nextflow, [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DeCWV274l0c).  
\nREMBI model - Recommended Metadata for Biological Images, BioImage Archive: [Study Component Guidance](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Frembi-help-examples\u002F), [File List Guide](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbioimage-archive\u002Fhelp-file-list\u002F), [paper](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC8606015\u002F), [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GVmfOpuP2_c), [spreadsheet](https:\u002F\u002Fdocs.google.com\u002Fspreadsheets\u002Fd\u002F1Ck1NeLp-ZN4eMGdNYo2nV6KLEdSfN6oQBKnnWU6Npeo\u002Fedit#gid=1023506919)  \n\n##### Matrix Formats\n[anndata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fanndata) - annotated data matrices in memory and on disk, [Docs](https:\u002F\u002Fanndata.readthedocs.io\u002Fen\u002Flatest\u002Findex.html).  \n[muon](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmuon) - Multimodal omics framework.  \n[mudata](https:\u002F\u002Fgithub.com\u002Fscverse\u002Fmudata) - Multimodal Data (.h5mu) implementation.  \n[bdz](https:\u002F\u002Fgithub.com\u002Fopenssbd\u002Fbdz) - Zarr-based format for storing quantitative biological dynamics data.  \n\n#### Image Viewers\n[napari](https:\u002F\u002Fgithub.com\u002Fnapari\u002Fnapari) - Image viewer and image processing tool.    \n[Fiji](https:\u002F\u002Ffiji.sc\u002F) - General purpose tool. Image viewer and image processing tool.  \n[vizarr](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fvizarr) - Browser-based image viewer for zarr format.  \n[avivator](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002Fviv) - Browser-based image viewer for tiff files.  \n[OMERO](https:\u002F\u002Fwww.openmicroscopy.org\u002Fomero\u002F) - Image viewer for high-content screening. [IDR](https:\u002F\u002Fidr.openmicroscopy.org\u002F) uses OMERO. 
[Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=nSCrMO_c-5s)   \n[fiftyone](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone) - Viewer and tool for building high-quality datasets and computer vision models.  \nImage Data Explorer - Microscopy Image Viewer, [Shiny App](https:\u002F\u002Fshiny-portal.embl.de\u002Fshinyapps\u002Fapp\u002F01_image-data-explorer), [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=H8zIZvOt1MA).  \n[ImSwitch](https:\u002F\u002Fgithub.com\u002FImSwitch\u002FImSwitch) - Microscopy Image Viewer, [Doc](https:\u002F\u002Fimswitch.readthedocs.io\u002Fen\u002Fstable\u002Fgui.html), [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=XsbnMkGSPQQ).  \n[piximi](https:\u002F\u002Fgithub.com\u002Fpiximi\u002Fpiximi) - Web-based image annotation and classification tool, [App](https:\u002F\u002Fwww.piximi.app\u002F).  \n[DeepCell Label](https:\u002F\u002Flabel.deepcell.org\u002F) - Data labeling tool to segment images, [Video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=zfsvUBkEeow).  \n[lightly-studio](https:\u002F\u002Fgithub.com\u002Flightly-ai\u002Flightly-studio) - Image annotation.  \n\n#### Napari Plugins\n[napari-sam](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002Fnapari-sam) - Segment Anything Plugin.  \n[napari-chatgpt](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Fnapari-chatgpt) - ChatGPT Plugin.  \n\n##### Image Restoration and Denoising\n[aydin](https:\u002F\u002Fgithub.com\u002Froyerlab\u002Faydin) - Image denoising.  \n[DivNoising](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FDivNoising) - Unsupervised denoising method.  \n[CSBDeep](https:\u002F\u002Fgithub.com\u002FCSBDeep\u002FCSBDeep) - Content-aware image restoration, [Project page](https:\u002F\u002Fcsbdeep.bioimagecomputing.com\u002Ftools\u002F).  \n[gibbs-diffusion](https:\u002F\u002Fgithub.com\u002Frubenohana\u002Fgibbs-diffusion) - Image denoising.  
\n\n##### Illumination correction\n[skimage](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.exposure.html#skimage.exposure.equalize_adapthist) - Illumination correction (CLAHE).  \n[cidre](https:\u002F\u002Fgithub.com\u002Fsmithk\u002Fcidre) - Illumination correction method for optical microscopy.  \n[BaSiCPy](https:\u002F\u002Fgithub.com\u002Fpeng-lab\u002FBaSiCPy) - Background and Shading Correction of Optical Microscopy Images, [BaSiC](https:\u002F\u002Fgithub.com\u002Fmarrlab\u002FBaSiC).  \n\n##### Bleedthrough correction \u002F Spectral Unmixing\n[PICASSO](https:\u002F\u002Fgithub.com\u002Fnygctech\u002FPICASSO) - Blind unmixing without reference spectra measurement, [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2021.01.27.428247v1.full)  \n[cytoflow](https:\u002F\u002Fgithub.com\u002Fcytoflow\u002Fcytoflow) - Flow cytometry. Includes Bleedthrough correction methods.  \nLinear unmixing in Fiji for Bleedthrough Correction - [Youtube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=W90qs0J29v8).  \nBleedthrough Correction using Lumos and Fiji - [Link](https:\u002F\u002Fimagej.net\u002Fplugins\u002Flumos-spectral-unmixing).  \nAutoUnmix - [Link](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.05.30.542836v1.full).  \n\n##### Platforms and Pipelines\n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler), [CellProfilerAnalyst](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler-Analyst) - Create image analysis pipelines.  \n[fractal](https:\u002F\u002Ffractal-analytics-platform.github.io\u002F) - Framework to process high-content imaging data from UZH, [Github](https:\u002F\u002Fgithub.com\u002Ffractal-analytics-platform).  \n[atomai](https:\u002F\u002Fgithub.com\u002Fpycroscopy\u002Fatomai) - Deep and Machine Learning for Microscopy.  
\n[py-clesperanto](https:\u002F\u002Fgithub.com\u002Fclesperanto\u002Fpyclesperanto_prototype\u002F) - Tools for 3D microscopy analysis, [deskewing](https:\u002F\u002Fgithub.com\u002FclEsperanto\u002Fpyclesperanto_prototype\u002Fblob\u002Fmaster\u002Fdemo\u002Ftransforms\u002Fdeskew.ipynb) and lots of other tutorials, interacts with napari.  \n[qupath](https:\u002F\u002Fgithub.com\u002Fqupath\u002Fqupath) - Image analysis.  \n\n##### Microscopy Pipelines\nLabsyspharm Stack see below.  \n[BiaPy](https:\u002F\u002Fgithub.com\u002Fdanifranco\u002FBiaPy) - Bioimage analysis pipelines, [paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2024.02.03.576026v2.full).  \n[SCIP](https:\u002F\u002Fscalable-cytometry-image-processing.readthedocs.io\u002Fen\u002Flatest\u002Fusage.html) - Image processing pipeline on top of Dask.  \n[DeepCell Kiosk](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fkiosk-console\u002Ftree\u002Fmaster) - Image analysis platform.  \n[IMCWorkflow](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002FIMCWorkflow\u002F) - Image analysis pipeline using [steinbock](https:\u002F\u002Fgithub.com\u002FBodenmillerGroup\u002Fsteinbock), [Twitter](https:\u002F\u002Ftwitter.com\u002FNilsEling\u002Fstatus\u002F1715020265963258087), [Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41596-023-00881-0), [workflow](https:\u002F\u002Fbodenmillergroup.github.io\u002FIMCDataAnalysis\u002F).  \n\n##### Labsyspharm\n[mcmicro](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fmcmicro) - Multiple-choice microscopy pipeline, [Website](https:\u002F\u002Fmcmicro.org\u002Foverview\u002F), [Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41592-021-01308-y).  \n[MCQuant](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fquantification) - Quantification of cell features.  
\n[cylinter](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fcylinter) - Quality assurance for microscopy images, [Website](https:\u002F\u002Flabsyspharm.github.io\u002Fcylinter\u002F).  \n[ashlar](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fashlar) - Whole-slide microscopy image stitching and registration.  \n[scimap](https:\u002F\u002Fgithub.com\u002Flabsyspharm\u002Fscimap) - Spatial Single-Cell Analysis Toolkit.  \n\n##### Cell Segmentation\n[microscopy-tree](https:\u002F\u002Fbiomag-lab.github.io\u002Fmicroscopy-tree\u002F) - Review of cell segmentation algorithms, [Paper](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS0962892421002518).  \nReview of organoid pipelines - [Paper](https:\u002F\u002Farxiv.org\u002Fftp\u002Farxiv\u002Fpapers\u002F2301\u002F2301.02341.pdf).  \n[BioImage.IO](https:\u002F\u002Fbioimage.io\u002F#\u002F) - BioImage Model Zoo.  \n[MEDIAR](https:\u002F\u002Fgithub.com\u002FLee-Gihun\u002FMEDIAR) - Cell segmentation.  \n[cellpose](https:\u002F\u002Fgithub.com\u002Fmouseland\u002Fcellpose) - Cell segmentation. [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2020.02.02.931238v1), [Dataset](https:\u002F\u002Fwww.cellpose.org\u002Fdataset).  \n[stardist](https:\u002F\u002Fgithub.com\u002Fstardist\u002Fstardist) - Cell segmentation with Star-convex Shapes.  \n[instanseg](https:\u002F\u002Fgithub.com\u002Finstanseg\u002Finstanseg) - Cell segmentation.  \n[UnMicst](https:\u002F\u002Fgithub.com\u002FHMS-IDAC\u002FUnMicst) - Identifying Cells and Segmenting Tissue.  \n[ilastik](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik) - Segment, classify, track and count cells. [ImageJ Plugin](https:\u002F\u002Fgithub.com\u002Filastik\u002Filastik4ij).   \n[nnUnet](https:\u002F\u002Fgithub.com\u002FMIC-DKFZ\u002FnnUNet) - 3D biomedical image segmentation.  
\n[allencell](https:\u002F\u002Fwww.allencell.org\u002Fsegmenter.html) - Tools for 3D segmentation, classical and deep learning methods.  \n[Cell-ACDC](https:\u002F\u002Fgithub.com\u002FSchmollerLab\u002FCell_ACDC) - Python GUI for cell segmentation and tracking.  \n[ZeroCostDL4Mic](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FZeroCostDL4Mic\u002Fwiki) - Deep-Learning in Microscopy.  \n[DL4MicEverywhere](https:\u002F\u002Fgithub.com\u002FHenriquesLab\u002FDL4MicEverywhere) - Bringing the ZeroCostDL4Mic experience using Docker.  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg) - Embedding-based Instance Segmentation.  \n[segment-anything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything) - Segment Anything (SAM) from Facebook.  \n[micro-sam](https:\u002F\u002Fgithub.com\u002Fcomputational-cell-analytics\u002Fmicro-sam) - Segment Anything for Microscopy.  \n[Segment-Everything-Everywhere-All-At-Once](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSegment-Everything-Everywhere-All-At-Once) - Segment Everything Everywhere All at Once from Microsoft.  \n[deepcell-tf](https:\u002F\u002Fgithub.com\u002Fvanvalenlab\u002Fdeepcell-tf\u002Ftree\u002Fmaster) - Cell segmentation, [DeepCell](https:\u002F\u002Fdeepcell.org\u002F).  \n[labkit](https:\u002F\u002Fgithub.com\u002Fjuglab\u002Flabkit-ui) - Fiji plugin for image segmentation.  \n[MedImageInsight](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.06542) - Embedding Model for General Domain Medical Imaging.  \n[CHIEF](https:\u002F\u002Fgithub.com\u002Fhms-dbmi\u002FCHIEF) - Clinical Histopathology Imaging Evaluation Foundation Model.  \n\n##### Cell Segmentation Datasets\n[cellpose](https:\u002F\u002Fwww.cellpose.org\u002Fdataset) - Cell images.  \n[omnipose](http:\u002F\u002Fwww.cellpose.org\u002Fdataset_omnipose) - Cell images.  \n[LIVECell](https:\u002F\u002Fgithub.com\u002Fsartorius-research\u002FLIVECell) - Cell images.  
\n[Sartorius](https:\u002F\u002Fwww.kaggle.com\u002Fcompetitions\u002Fsartorius-cell-instance-segmentation\u002Foverview) - Neurons.  \n[EmbedSeg](https:\u002F\u002Fgithub.com\u002Fjuglab\u002FEmbedSeg\u002Freleases\u002Ftag\u002Fv0.1.0) - 2D + 3D images.  \n[connectomics](https:\u002F\u002Fsites.google.com\u002Fview\u002Fconnectomics\u002F) - Annotation of the EPFL Hippocampus dataset.  \n[ZeroCostDL4Mic](https:\u002F\u002Fwww.ebi.ac.uk\u002Fbiostudies\u002FBioImages\u002Fstudies\u002FS-BIAD895) - Stardist example training and test dataset.  \n\n##### Evaluation\n[seg-eval](https:\u002F\u002Fgithub.com\u002Flstrgar\u002Fseg-eval) - Cell segmentation performance evaluation without Ground Truth labels, [Paper](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2023.02.23.529809v1.full.pdf).  \n\n##### Feature Engineering Images\n[Computer vision challenges in drug discovery - Maciej Hermanowicz](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Y5GJmnIhvFk)  \n[CellProfiler](https:\u002F\u002Fgithub.com\u002FCellProfiler\u002FCellProfiler) - Biological image analysis.   \n[scikit-image](https:\u002F\u002Fgithub.com\u002Fscikit-image\u002Fscikit-image) - Image processing.  \n[scikit-image regionprops](https:\u002F\u002Fscikit-image.org\u002Fdocs\u002Fdev\u002Fapi\u002Fskimage.measure.html#skimage.measure.regionprops) - Regionprops: area, eccentricity, extent.  \n[mahotas](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fmahotas) - Zernike, Haralick, LBP, and TAS features, [example](https:\u002F\u002Fgithub.com\u002Fluispedro\u002Fpython-image-tutorial\u002Fblob\u002Fmaster\u002FSegmenting%20cell%20images%20(fluorescent%20microscopy).ipynb).   \n[pyradiomics](https:\u002F\u002Fgithub.com\u002FAIM-Harvard\u002Fpyradiomics) - Radiomics features from medical imaging.  \n[pyefd](https:\u002F\u002Fgithub.com\u002Fhbldh\u002Fpyefd) - Elliptical feature descriptor, approximating a contour with a Fourier series.  
\n[pyvips](https:\u002F\u002Fgithub.com\u002Flibvips\u002Fpyvips\u002Ftree\u002Fmaster) - Faster image processing operations.  \n\n#### Domain Adaptation \u002F Batch-Effect Correction \n[Tran - A benchmark of batch-effect correction methods for single-cell RNA sequencing data](https:\u002F\u002Fgenomebiology.biomedcentral.com\u002Farticles\u002F10.1186\u002Fs13059-019-1850-9), [Code](https:\u002F\u002Fgithub.com\u002FJinmiaoChenLab\u002FBatch-effect-removal-benchmarking).  \n[R Tutorial on correcting batch effects](https:\u002F\u002Fbroadinstitute.github.io\u002F2019_scWorkshop\u002Fcorrecting-batch-effects.html).  \n[harmonypy](https:\u002F\u002Fgithub.com\u002Fslowkow\u002Fharmonypy) - Fuzzy k-means and locally linear adjustments.  \n[pyliger](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fpyliger) - Batch-effect correction, [R package](https:\u002F\u002Fgithub.com\u002Fwelch-lab\u002Fliger).  \n[nimfa](https:\u002F\u002Fgithub.com\u002Fmims-harvard\u002Fnimfa) - Nonnegative matrix factorization.  \n[scgen](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscgen) - Batch removal. [Doc](https:\u002F\u002Fscgen.readthedocs.io\u002Fen\u002Fstable\u002F).  \n[CORAL](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn) - Correcting for Batch Effects Using Wasserstein Distance, [Code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002F30e54523f08d963ced3fbb37c00e9225579d2e1d\u002Fcorrect_batch_effects_wdn\u002Ftransform.py#L152), [Paper](https:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC7050548\u002F).   \n[adapt](https:\u002F\u002Fgithub.com\u002Fadapt-python\u002Fadapt) - Awesome Domain Adaptation Python Toolbox.  \n[pytorch-adapt](https:\u002F\u002Fgithub.com\u002FKevinMusgrave\u002Fpytorch-adapt) - Various neural network models for domain adaptation.  
\n\n##### Sequencing\n[Single cell tutorial](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial).  \n[PyDESeq2](https:\u002F\u002Fgithub.com\u002Fowkin\u002FPyDESeq2) - Analyzing RNA-seq data.  \n[cellxgene](https:\u002F\u002Fgithub.com\u002Fchanzuckerberg\u002Fcellxgene) - Interactive explorer for single-cell transcriptomics data.  \n[scanpy](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fscanpy) - Analyze single-cell gene expression data, [tutorial](https:\u002F\u002Fgithub.com\u002Ftheislab\u002Fsingle-cell-tutorial).  \n[besca](https:\u002F\u002Fgithub.com\u002Fbedapub\u002Fbesca) - Beyond single-cell analysis.  \n[janggu](https:\u002F\u002Fgithub.com\u002FBIMSBbioinfo\u002Fj","This project is a curated list of Python data science resources, covering everything from core libraries to advanced tutorials. It links to mainstream data processing and analysis tools such as pandas, scikit-learn, and matplotlib, as well as small productivity tools such as tqdm (progress bars) and structlog (logging). It also highlights optimizations of and alternatives to pandas, such as DuckDB for efficient SQL query execution and Polars as a multithreaded pandas replacement. It is a useful reference for data scientists and technology enthusiasts who use Python for data analysis and machine learning projects.",2,"2026-06-11 03:24:49","top_topic"]