[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71715":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":15,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},71715,"FriendsDontLetFriends","cxli233\u002FFriendsDontLetFriends","cxli233","Friends don't let friends make certain types of data visualization - What are they and why are they bad. ","",null,"R",7072,286,130,3,0,8,17,9,75.07,"MIT License",false,"main",true,[26,27],"data-visualization","r","2026-06-12 04:01:01","# Friends Don't Let Friends Make Bad Graphs \n\n\n[![DOI](https:\u002F\u002Fzenodo.org\u002Fbadge\u002FDOI\u002F10.5281\u002Fzenodo.7542491.svg)](https:\u002F\u002Fdoi.org\u002F10.5281\u002Fzenodo.7542491)\n\n\nFriends don't let friends make certain types of data visualization - What are they and why are they bad. \n\n* Author: Chenxin Li, Ph.D., Assistant Professor at Department of Plant Biology, Michigan State University. \n* Contact: lichen27@msu.edu | [@chenxinli2.bsky.social](https:\u002F\u002Fbsky.app\u002Fprofile\u002Fchenxinli2.bsky.social)\n\nThis is an *opinionated* essay about good and bad practices in data visualization. \nExamples and explanations are below. \n\nThe `Scripts\u002F` directory contains `.Rmd` files that generate the graphics shown below. \nIt requires R, RStudio, and the rmarkdown package. 
\n\n* R: [R Download](https:\u002F\u002Fcran.r-project.org\u002Fbin\u002F)\n* RStudio: [RStudio Download](https:\u002F\u002Fwww.rstudio.com\u002Fproducts\u002Frstudio\u002Fdownload\u002F)\n* rmarkdown can be installed using the install packages interface in RStudio\n\n# Table of contents\n\n1. [Friends Don't Let Friends Make Bar Plots For Mean Separation](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#1-friends-dont-let-friends-make-bar-plots-for-means-separation)\n2. [Friends Don't Let Friends Make Violin Plots for Small Sample Sizes](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#2-friends-dont-let-friends-make-violin-plots-for-small-sample-sizes)\n3. [Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#3-friends-dont-let-friends-use-bidirectional-color-scales-for-unidirectional-data)\n4. [Friends Don't Let Friends Make Bar Plot Meadow](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#4-friends-dont-let-friends-make-bar-plot-meadow)\n5. [Friends Don't Let Friends Make Heatmap without Reordering Rows & Columns](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#5-friends-dont-let-friends-make-heatmap-without-considering-reordering-rows--columns)\n6. [Friends Don't Let Friends Make Heatmap without Checking Outliers](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#6-friends-dont-let-friends-make-heatmap-without-checking-outliers)\n7. [Friends Don't Let Friends Forget to Check Data Range at Each Factor Level](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#7-friends-dont-let-friends-forget-to-check-data-range-at-each-factor-level)\n8. 
[Friends Don't Let Friends Make Network Graphs without Trying Different Layouts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#8-friends-dont-let-friends-make-network-graphs-without-trying-different-layouts) \n9. [Friends Don't Let Friends Confuse Position and Length Based Visualizations](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#9-friends-dont-let-friends-confuse-position-based-visualizations-with-length-based-visualizations) \n10. [Friends Don't Let Friends Make Pie Charts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#10-friends-dont-let-friends-make-pie-chart) \n11. [Friends Don't Let Friends Make Concentric Donuts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#11-friends-dont-let-friends-make-concentric-donuts)\n12. [Friends Don't Let Friends Use Red\u002FGreen and Rainbow for Color Scales](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#12-friends-dont-let-friends-use-redgreen-and-rainbow-color-scales)\n13. [Friends Don't Let Friends Forget to Reorder Stacked Bar Plot](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Ftree\u002Fmain#13-friends-dont-let-friends-forget-to-reorder-stacked-bar-plot)\n14. [Friends Don't Let Friends Mix Stacked Bars and Means Separation](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Ftree\u002Fmain#14-friends-dont-let-friends-mix-stacked-bars-and-mean-separation)\n15. [Friends Don't Let Friends Use Histograms for Small Sample Sizes](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Ftree\u002Fmain?tab=readme-ov-file#friends-dont-let-friends-use-histogram-for-small-sample-sizes)\n16. [Friends Don't Let Friends Use Boxplots for Bimodal Data](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends?tab=readme-ov-file#friends-dont-let-friends-use-boxpot-for-bimodal-data)\n\n# 1. Friends Don't Let Friends Make Bar Plots for Means Separation\n\nThis has to be the first one. \nMeans separation plots are among the most common plots in scientific publications. \nWe have two or more groups, each of which contains multiple observations; the groups may have different means, variances, and distributions. \nThe task of the visualization is to show the means and the spread (dispersion) of the data. \n\n![No Bar Plots for Means Separation](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002Fdont_bar_plot.png) \n\nIn this example, two groups have similar means and standard deviations, but quite different distributions. **Are they really \"the same\"?**\nJust don't use bar plots for means separation, or at least check a couple of things before settling on a bar plot. \n\nIt's worth mentioning that I was inspired by many researchers who have tweeted about the limitations of bar graphs. \nHere is a publication: [Weissgerber et al., 2015, PLOS Biology](https:\u002F\u002Fjournals.plos.org\u002Fplosbiology\u002Farticle?id=10.1371\u002Fjournal.pbio.1002128). \n\n# 2. Friends Don't Let Friends Make Violin Plots for Small Sample Sizes \n\nThis is quite common in the literature as well, but unfortunately, violin plots (or any sort of smoothed distribution curve) make no sense for small n. \n\n![Beware of Violin Plots for Small Sample Sizes](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FBeware_of_small_n_box_violin_plot.png) \n\nDistributions and quartiles can vary widely with small n, even when the samples come from the same underlying distribution. \nDistributions and quartiles are only meaningful with large n. \nI once ran an experiment where I sampled the *same* normal distribution several times and computed the quartiles for each sample.\nThe quartiles only stabilize when n gets larger than 50. \n\n# 3. Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data \n\nExcuse my language, but this is truly a data visualization sin, and again quite common. \nI can understand why this error is common: it appears that many of us have not given this issue much thought. \n\n![Are You Using the Right Color Scale for Your Data?](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FColorScales.svg)\n\nColor scales are pretty, but we have to be extra careful.\nWhen color scales (or color gradients) are used to represent numerical data, the darkest and lightest colors should have special meanings.\nYou can decide what those special meanings are (e.g., max, min, mean, zero), but they should represent something meaningful. \nA data visualization sin for heatmaps\u002Fcolor gradients is when the lightest or darkest colors correspond to arbitrary numbers. \n*This is as bad as the longest bar in a bar chart not being the largest value.* Can you imagine that?  \n\n# 4. Friends Don't Let Friends Make Bar Plot Meadow \n\nWe already covered bar charts for means separation, but this is a different issue. \nIt has to do with presenting the results of a multi-factorial experiment. \nBar plot meadows are very common in scientific publications and unfortunately also *ineffective* in communicating the results. \n\n![Horrendous Giant Bar Plot vs. Better Designed Plot](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FAvoidBarPlotMeadow.png)\n\nData from: [Matand et al., 2020, BMC Plant Biology](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1186\u002Fs12870-020-2243-7)\n\nBar plot meadows are common because multi-factorial experiments are common. \nHowever, a bar plot meadow is poorly designed for its purpose. 
\nCommunicating the results of a multi-factorial experiment requires thoughtful design choices about grouping\u002Ffaceting by the factors of interest.\n\nIn this example, I focus on comparing the effect of `Treatment` & `Explant` on `Response` at the level of each `Variety`. \nHowever, if the focus is the effect of `Treatment` & `Variety` on `Response` at the level of each `Explant`, then it would require a different layout. \n\n# 5. Friends Don't Let Friends Make Heatmap without (Considering) Reordering Rows & Columns \n\nHeatmaps are very common in scientific publications, and *very very* common in omics papers. \nHowever, for heatmaps to be effective, we have to consider the ordering of rows & columns. \n\n![A Heatmap before and after reordering rows and columns](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FReorder_rows_and_columns_for_heatmap.png) \n\nIn this example, I have cells as columns and features as rows, and each tile shows a z score. \nIt is impossible to get anything useful out of the heatmap without reordering rows and columns. \nWe can reorder rows and columns using clustering, but that is not the only way. \nOf course, if the rows and columns map to physical entities (e.g., rows and columns of a 96-well plate), then you can't reorder them. \nBut it is a very good idea to at least consider reordering rows and columns. \n\nData from: [Li et al., 2022, bioRxiv](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.07.04.498697v1) \n\n## Bonus: heatmaps can be very pretty\n\n...if you are good at reordering rows\u002Fcolumns and choosing color gradients. \nHere is an example of \"abstract aRt\" generated from simulated data. 
\n\n![aRt with Heatmap](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FAbstract_R_2022_11_24.svg)\n\nR code for this aRt piece can be found [here](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FScripts\u002FAbstract_aRt.R). \n\nFor a tutorial on how to reorder rows and columns of a heatmap, see this [markdown file](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FHeatmap_tutorial.md). \n\n# 6. Friends Don't Let Friends Make Heatmap without Checking Outliers \n\nOutliers in a heatmap can drastically change how we perceive and interpret the visualization. \nThis generalizes to all sorts of visualizations that use colors to represent numeric data.\nLet me show you an example:\n\n![Did you check outliers](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FCheck_outliers_for_heatmap.svg)\n\nIn this example, I have 2 observations. For each observation, I measured 20 features. \nWithout checking for outliers, it may appear that the 2 observations are similar overall, except at 2 features. \nHowever, capping the color scale at around the 95th percentile of the data reveals that the two observations are distinct across all features. \n\n# 7. Friends Don't Let Friends Forget to Check Data Range at Each Factor Level \n\nThis is a common issue that many of us have encountered. \nIn a multifactor experiment, the range of the response variable sometimes changes widely between factor levels. \n\n![Did you check data range at each factor level](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FCheck_range_at_factor_level.svg)\n\nThis hypothetical experiment measured 3 compounds across 2 groups (control vs. treatment). 
\nWithout checking the data range for each compound, you would likely miss that the treatment had a strong effect on compound 1.\nThis is because the concentration of compound 1 has a much narrower range than the other compounds in this experiment. \n\n# 8. Friends Don't Let Friends Make Network Graphs without Trying Different Layouts\n\nNetwork graphs are common in scientific publications. They are super useful for presenting relationship data. \nHowever, the appearance (not the topology) of the network can make a huge difference in whether a network graph is effective. \n\n![Try different network layouts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FTryDifferentLayouts.svg) \n\nLayouts can drastically change the appearance of networks, making them easier or harder to interpret.\nHere are 3 network graphs from the same data. They look very different from each other.\nData from: [Li et al., 2022, bioRxiv](https:\u002F\u002Fwww.biorxiv.org\u002Fcontent\u002F10.1101\u002F2022.07.04.498697v1) \n\nHere are 9 different layouts for the _same_ network. They can look very different. \n\n![Different layouts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002Fnetwork_layouts.gif)\n\nThe R script to make this animation is available [here](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FScripts\u002FAnimated_networks.Rmd).\n\n![Different layouts of the animation](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FDifferent_layouts.png)\n\n# 9. Friends Don't Let Friends Confuse Position-based Visualizations with Length-based Visualizations \n\nThis is always the elephant in the room and the essence of many misleading visualizations. \nIn this example, I measured a response variable across 3 time points. 
\nTwo of the following graphs are fine, but one of them is a data visualization crime. Can you see why? \n\n![Position vs. length based visualizations](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FPosition_and_length_based_visualizations.svg)\n\nIn dot and line plots, values are represented by positions along the x and y axes.\nThe same idea applies to other position-based visualizations, such as box plots. \nIn bar plots, values are represented by the distance from the x axis, and thus by the length of the bar. \n\nThe y axis of the 3rd graph does not start at 0, which makes the bar at time point 2 about 3x longer than the bar at time point 1.\nIn fact, the true ratio of the means is closer to 1.6x. \nI hope you can see how confusing length-based and position-based visualizations can lead to misleading graphs. \n\n## Watch out for bar plots with broken axes \n\nBroken axes may be useful for depicting data across a wide range of numeric values. \n(Alternatively, a log-scaled axis can be used instead.) \nBroken axes are fine for position-based graphics, because the data are represented by positions along the axis. \nHowever, we must be very careful with bar plots that have broken axes. Here is an example. \n\n![Broken axis](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FBroken_axis.svg) \n\nIn this example, the two graphs (left vs. right) show the same data. \nHowever, by changing where the axis is broken, one can make certain bars look longer or shorter. \nIn this example, the length of bar \"d\" can look *really* different.\nThe illusion of bar \"d\" being very short in the right graph boils down to bar plots being length-based graphics, not position-based graphics. \n\nExample R code for broken axes can be found [here](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FScripts\u002FBroken_axis.R). \n\n# 10. Friends Don't Let Friends Make Pie Chart \n\nPie charts are a common type of visualization for fractional data, where the fractions add up to 100%. \nThis is achieved by dividing a circle into sectors that add up to a full circle. \nPie charts have been criticized because humans are much worse at reading angles and areas than at reading lengths. \nHere is a [blog post](https:\u002F\u002Fwww.data-to-viz.com\u002Fcaveat\u002Fpie.html) that explores this. \n\n![Don't make pie charts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002Fdont_pie_chart.svg)\n\nIn this example, we have two groups, each containing 4 sub-categories. \nIn classic pie charts, the angles (and thus arc lengths & sector areas) represent the data. \nThe problem is that it is *very* difficult to compare across groups. \nWe can visually simplify the pie charts into donut charts, where the data are now represented by arc lengths. \nHowever, if we want to use lengths to represent the data, why don't we just unwrap the donut and make stacked bars?\nIn stacked bar graphs, bars are shown side by side and are thus easier to compare across groups. \n\nFun fact: the scripts underlying stacked bars are much simpler than those underlying the pie charts and donut charts.\nIf you want to produce sub-optimal graph types with ggplot, you actually have to work extra hard.\n\n# 11. Friends Don't Let Friends Make Concentric Donuts\n\nIn this example, we have 3 groups, each of which contains two sub-categories (Type I or Type II). \n\n![Don't make concentric donuts](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002Fdont_concentric_donuts.svg)\n\nIn concentric donuts, you might be tempted to say the data are represented by the arc lengths, which is in fact **inaccurate**. \nThe arc lengths on the outer rings are much longer than those on the inner rings. 
\nGroup 2 and Group 3 have exactly the same values, but the arcs of Group 3 are much longer. \nIn fact, the data are represented by the *arc angles*, which we are bad at reading. \n\nSince outer rings are longer, the ordering of the groups (which group goes to which ring) has a big impact on the impression the plot gives.\nThis can lead to the apparent paradox where larger values have shorter arcs. \nThe better (and simpler!) alternative is to just unwrap the donuts and make a good old stacked bar plot. \nBTW, this is also my main issue with [circos plots](http:\u002F\u002Fcircos.ca\u002F) and other circular plot layouts.\n\n# 12. Friends Don't Let Friends Use Red\u002FGreen and Rainbow color scales\n\n![are you making a \"safe\" heatmap?](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FColor_blind_grey_scale_safe_heatmap.svg)\n\nDeuteranomaly is the most common type of red\u002Fgreen colorblindness, occurring in about 1\u002F16 males and 1\u002F256 females. \nAny color scale that uses shades of red and shades of green at the same time would be a problem for a person with red\u002Fgreen colorblindness (third column of the figure). \nIn addition, red\u002Fgreen and rainbow scales do not preserve information well at all when printed in black and white (grey scale, second column of the figure). \nMany scientific software packages still use red\u002Fgreen or rainbow as their default color scales, which drives me crazy. \nMore \"modern\" color scales, such as [viridis](https:\u002F\u002Fcran.r-project.org\u002Fweb\u002Fpackages\u002Fviridis\u002Fvignettes\u002Fintro-to-viridis.html), are both colorblind-friendly and grey scale-safe (third row of the figure). \nAnd they look nice too. \n\n# 13. Friends Don't Let Friends Forget to Reorder Stacked Bar Plot\nStacked bar plots are useful for visualizing proportion data. \nStacked bar plots are commonly used to visualize community structure, population structure, or the results of admixture analysis. 
\nThis kind of visualization boils down to a collection of samples, where each sample contains multiple classes of members. \nHowever, when we have many samples and many classes, stacked bar plots need to be optimized to be effective. \nAnd by \"optimize\" I mean the grouping and ordering of samples. \n\n![Reorder your stacked bars](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FReorder_stacked_bars.png)\n\nHere we have an example dataset with 100 samples and 8 classes of members. \nDue to the number of samples and classes, it is very hard to discern anything from this graph without optimizing the order of the bars. What the heck am I looking at? \nAfter reordering the bars, __wow__, that really made a difference, don't you think? \nFor a tutorial on how to optimize a stacked bar plot, see [this script](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FScripts\u002Fstacked_bars_optimization.Rmd).\n\n# 14. Friends Don't Let Friends Mix Stacked Bars and Mean separation\nSometimes a visualization gets confusing and ineffective when it tries to do too many things at once. \nOne such example is mixing stacked bar plots and means separation plots. \nOne displays proportional data adding up to 100%; the other displays the differences in means and the dispersion around the means. \nThese are very distinct tasks in data visualization. \n\nIn this hypothetical experiment, we had blueberry plants assigned to two groups.\nOne group was the control; the other was treated with a chemical to make fruit development faster. \nEach group had 5 plants.\nThe response to the treatment was divided into 3 categories: \nlight green fruits, light blue fruits, and dark blue fruits. \n100 fruits from each plant were examined, and the number of fruits in each category was counted. \nThe percentage of fruits in each category was calculated and reported. 
\nThe question of the study is: did the chemical treatment work? \n\n![Don't mix stacked bar plots with mean separation plots](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002Fstacked_bar_vs_jitter.png) \n\nThe first stacked bar plot is fine: it is the standard way to visualize proportion data. \nIt is clear that all categories add up to 100%, \nand the chemical treatment strongly shifted the color profile towards the most developed stage (dark blue). \n\nThe middle stacked bar plot is problematic, \nmainly because it is trying to do two distinct data visualization tasks at once. \nWhen error bars and dots are overlaid onto the stacked bars, \nit becomes unclear which error bars and dots are being compared. \nDue to the nature of stacked bars, the error bars and dots of the upper stacks have to be shifted upwards,\nso the y-axis is no longer straightforward to read for the error bars and dots. \n\nFinally, if the main point of the visualization is means separation and dispersion around the means, \nthe third graph is the better choice. \nThere is no ambiguity about which comparisons are being made.\nAs also shown in the first stacked bar plot, \nthe chemical treatment strongly increases the proportion of dark blue fruits, \nat the expense of lighter-colored fruits. \n\n# Friends don't let friends use histogram for small sample sizes \nI've seen histograms proposed as a replacement for bar plots. \nHowever, a serious caveat is that histograms are not robust to the number of bins for small (and even moderate) sample sizes. \nIn a histogram, we first bin the data into a defined number of bins.\nThen we count how many observations fall into each bin and graph the counts. 
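\n\nTo make the bin-and-count procedure concrete, here is a minimal base-R sketch. It is not the script behind the figure below; the seed, the hypothetical `bin_counts()` helper, and the equal-width bins spanning the data range are illustrative assumptions. It draws one sample each of n = 10, 100, and 1000 from the same standard normal distribution, bins every sample into 10, 30, and 50 bins, and counts how many bins end up empty.\n\n```r\n# Minimal sketch in base R (assumption: not the author's script)\nset.seed(42)  # arbitrary seed, for reproducibility only\n\n# Hypothetical helper: split the range of x into n_bins equal-width bins\n# and return the number of observations per bin (empty bins included)\nbin_counts \u003C- function(x, n_bins) {\n  breaks \u003C- seq(min(x), max(x), length.out = n_bins + 1)\n  as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))\n}\n\n# One sample per sample size, all from the same standard normal distribution\nsamples \u003C- lapply(c(n10 = 10, n100 = 100, n1000 = 1000), rnorm)\n\n# Count empty bins for every combination of sample size and bin number\nbin_numbers \u003C- c(10, 30, 50)\nempty_bins \u003C- sapply(samples, function(x) {\n  sapply(bin_numbers, function(k) sum(bin_counts(x, k) == 0))\n})\nrownames(empty_bins) \u003C- paste0('bins_', bin_numbers)\nprint(empty_bins)\n```\n\nWith n = 10 and 50 bins, at least 40 of the 50 bins must be empty (10 observations can fill at most 10 bins), so the bars say more about where the bin edges happen to fall than about the shape of the distribution.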
\n\n![Histogram with different sample sizes and bin numbers](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FHistogram_for_small_n.png)\n\nIn this example, I sampled _the same_ normal distribution 3 times with different sample sizes (n = 10, 100, and 1000).\nEven though they came from _the same_ normal distribution, the histograms look quite different depending on the number of bins. \nTo showcase this, I plotted histograms for 10, 30, and 50 bins. \n\nFirst of all, histograms make no sense for small sample sizes. With small sample sizes (n \u003C 30), the much better practice is to graph all data points. \nSecond, you can see that the shape of the histogram is only robust to changing the number of bins when the sample size is fairly large (like 1000).\nEven with n = 100, the appearance of the histogram can change drastically as the number of bins changes. \n\n# Friends don't let friends use boxpot for bimodal data\nThis figure should speak for itself. Is your boxplot hiding something from you?  \n\n![Is your box plot hiding something from you](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Fblob\u002Fmain\u002FResults\u002FBoxPlots_for_binomial.png)\n\nBefore making a boxplot, check the distribution of your data: box plots summarize the median and quartiles, so they cannot reveal bimodal data (or, by extension, data with multiple modes).\nPlotting all the data points using `geom_quasirandom()` from the [ggbeeswarm package](https:\u002F\u002Fgithub.com\u002Feclarke\u002Fggbeeswarm) is the best practice for small to moderate (fewer than tens of thousands) sample sizes, as distribution-based graphics such as violin plots and histograms are not robust to small sample sizes. See [this section](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends#2-friends-dont-let-friends-make-violin-plots-for-small-sample-sizes) and [this section](https:\u002F\u002Fgithub.com\u002Fcxli233\u002FFriendsDontLetFriends\u002Ftree\u002Fmain?tab=readme-ov-file#friends-dont-let-friends-use-histogram-for-small-sample-sizes) for details. \n\n# Conclusion (?)\n\nThat's it for now. I will update this when I have the time (and inspiration) to produce more examples. \nNot sure what the next one will be, but stay tuned! \n","This project discusses good and bad practices in data visualization, pointing out why certain types of data graphics are not recommended. Through concrete examples and explanations, it helps users understand how to avoid common data visualization mistakes. Its core content is a critical analysis of several common but easily misused chart types (such as bar plots, violin plots, and heatmaps), together with R scripts that generate the example graphics so the problems can be seen directly. The scripts require R, RStudio, and the rmarkdown package to run. The project suits researchers, data analysts, and students who want to improve their data visualization skills, especially those learning how to choose and make graphs correctly.",2,"2026-06-11 03:38:45","high_star"]