class: center, middle, inverse, title-slide

# Communicating Data
### Vincenzo Coia

---
class: middle, center

# Pictures are worth 1000 words

[Challenger space shuttle example](https://speakerdeck.com/jennybc/ggplot2-tutorial?slide=7)

Just plot the data!

---

## Outline for Today

### Part 1

Visualizing different types of data

### Part 2

Principles for making effective plots

### Attribution

Ideas are from ["Fundamentals of Data Visualization"](https://clauswilke.com/dataviz/) by Claus Wilke. You should check it out! Slides are my own.

---
class: inverse, middle, center

# Part 1

# Choosing an effective plot for your data type

---
class: middle, center

# [data-to-viz](https://www.data-to-viz.com/)

---

## Visualizing Amounts

Use bars.

<!-- -->

---

## Visualizing Amounts

Use bars. Sensibly rearrange.

<!-- -->

In this case: movie order __and__ descending order.

---

## Visualizing Amounts

Bars must go to zero because we interpret their _area_.

__Don't do this__:

<!-- -->

---

## Visualizing Amounts

When emphasizing _differences_ instead of _ratios_, zero doesn't matter. Use points instead of bars:

<!-- -->

---

## Visualizing Amounts

x-labels too big? Don't be afraid to swap axes.

<!-- -->

---

## Visualizing Amounts

Sorting, again.

<!-- -->

---

## Visualizing Distributions

Want to compare body mass of three penguin species? Please don't use pinhead plots.

<!-- -->

---

## Visualizing Distributions

Plot all the data instead.

<!-- -->

---

## Visualizing Distributions

Even better, add some jitter and alpha transparency:

<!-- -->

---

## Visualizing Distributions: too much data to show

Too much data to show? Could use boxplots:

<!-- -->

---

## Visualizing Distributions: too much data to show

Could also make a histogram for each one.
<!-- -->

---

## Visualizing Distributions: too much data to show

Whatever you do, don't combine and colour:

<!-- -->

---

## Visualizing Distributions: too much data to show

Better is to use density plots:

<!-- -->

---

## Visualizing Distributions: too much data to show

Even better is to use ridge plots:

<!-- -->

Who cares about the density values, anyway?

---

## Visualizing Distributions: too much data to show

Can pack in many categories (ordered, of course).

<!-- -->

---

## Visualizing Distributions: too much data to show

You might even be able to get away with colouring by continent. (arguable)

<!-- -->

---

## Visualizing Trends

Stacking just doesn't work.

<!-- -->

---

## Visualizing Trends

Putting bars beside each other is better, but still not good.

<!-- -->

---

## Visualizing Trends

Best is to forgo the bars and use lines:

<!-- -->

---

## Visualizing Trends

Bonus: line up the legend.

<!-- -->

---

# Activity: Worsen the plot

<!-- -->

Code and idea by Firas Moosvi

---
class: inverse, middle, center

# Part 2

# Principles of Figure Design

---
class: middle, center

# Data-to-ink ratio

[Less is More](https://speakerdeck.com/cherdarchuk/remove-to-improve-the-data-ink-ratio)

---

## Overlapping Points

How do you know there aren't overlapping points here? You don't.

<!-- -->

---

## Overlapping Points

Add some transparency, and suddenly you can tell.

<!-- -->

---

## Overlapping Points

Or, jitter the points a little bit.

<!-- -->

---

## Overlapping Points

When jittering isn't an option, and alpha transparency isn't enough?

<!-- -->

---

## Overlapping Points

Consider reducing the size of the points:

<!-- -->

---

## Overlapping Points

Or, use hexagonal binning (heatmap):

<!-- -->

---

## Colour

Don't try to choose your own colours.

<!-- -->

---

## Colour

Leave it to an expert: https://colorbrewer2.org/

<!-- -->

---

## Colour

Avoid too many colours.

<!-- -->

---

## Colour

Are you sure you don't just want to highlight a few categories of interest, or even just one?
<!-- -->

---

## Colour Blindness

Previous colour palette as seen with protanopia (reduced sensitivity to red):

<img src="regular.png" width="300" /> <img src="protanope.png" width="300" />

(Converted by [hclwizard](http://hclwizard.org/cvdemulator/))

---

## Colour Blindness

You could try to accommodate colour blindness **and** still use colour...

<!-- -->

Viridis scale

---

## Colour Blindness

Better yet, don't rely on colour at all. Facet by species:

<!-- -->

Notice the axes are comparable.

---

## Choose an Appropriate Scale

Here, the data are pressed against the y-axis. Tons of whitespace.

<!-- -->

---

## Choose an Appropriate Scale

Use a log scale on the x-axis.

<!-- -->

---

## Activity: Improve the Plot

How can we make this plot better?

<!-- -->

---
class: inverse, middle, center

# Activity: Reflection

Reflect in groups:

1. What one thing will you do differently after today's lecture?
2. What's your favourite plot from this lecture?

---

## Final Remarks: Making plots

- All plots today are __reproducible__.
- Built with `ggplot2` in R.
- Take a look at Episode 5 of [STAT 545](https://www.youtube.com/channel/UCrB-uourf2vxGeBnGjQrA0w) for an intro to `ggplot2`.
- Non-reproducible tools (like Excel) are not recommended, but will do if that's all you know.

<img src="ggplot2.png" width="50%" />
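
---

## Final Remarks: A minimal `ggplot2` sketch

As a rough illustration of two fixes from Part 2 (jitter plus alpha transparency for overlapping points, and a log scale for data pressed against an axis), here is a minimal sketch. It is not the code behind these slides. The datasets are assumptions chosen for convenience: `mpg` and `diamonds` are example datasets bundled with `ggplot2`, and the jitter widths, alpha values, and the object names `overlap_fix` and `scale_fix` are arbitrary choices for illustration.

```r
library(ggplot2)

# ASSUMPTION: `mpg` (bundled with ggplot2) stands in for the slide data.
# Mileage is recorded as whole numbers, so many points overlap exactly.
# A little jitter plus transparency makes the overlap visible.
overlap_fix <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3, alpha = 0.3) +
  labs(x = "City mileage (mpg)", y = "Highway mileage (mpg)")

# ASSUMPTION: `diamonds` (bundled with ggplot2) stands in for the slide data.
# Carat is right-skewed, so most points crowd near the y-axis.
# A log scale on the x-axis spreads them out.
scale_fix <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, size = 0.5) +
  scale_x_log10() +
  labs(x = "Carat (log scale)", y = "Price (USD)")

# Draw both plots.
print(overlap_fix)
print(scale_fix)
```

Because the whole figure is defined in a script, rerunning it reproduces the plot exactly, which is the point of the first bullet on the previous slide.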