class: center, middle, inverse, title-slide

# Communicating Data

### Vincenzo Coia

---
class: middle, center

# Pictures are worth 1000 words

[Challenger space shuttle example](https://speakerdeck.com/jennybc/ggplot2-tutorial?slide=7)

Just plot the data!

---

## Outline for Today

### Part 1

Visualizing different types of data

### Part 2

Principles for making effective plots

### Attribution

Ideas are from ["Fundamentals of Data Visualization"](https://clauswilke.com/dataviz/) by Claus Wilke. You should check it out! Slides are my own.

---
class: inverse, middle, center

# Part 1

# Choosing an effective plot for your data type

---
class: middle, center

# [data-to-viz](https://www.data-to-viz.com/)

---

## Visualizing Amounts

Use bars.

![](index_files/figure-html/unnamed-chunk-2-1.png)

---

## Visualizing Amounts

Use bars. Sensibly rearrange.

![](index_files/figure-html/unnamed-chunk-3-1.png)

In this case: movie order __and__ descending order.

---

## Visualizing Amounts

Bars must go to zero because we interpret their _area_. __Don't do this__:

![](index_files/figure-html/unnamed-chunk-4-1.png)

---

## Visualizing Amounts

When emphasizing _differences_ instead of _ratios_, zero doesn't matter. Use points instead of bars:

![](index_files/figure-html/unnamed-chunk-5-1.png)

---

## Visualizing Amounts

x-labels too big? Don't be afraid to swap axes.

![](index_files/figure-html/unnamed-chunk-6-1.png)

---

## Visualizing Amounts

Sorting, again.

![](index_files/figure-html/unnamed-chunk-7-1.png)

---

## Visualizing Distributions

Want to compare body mass of three penguin species? Please don't use pinhead plots.

![](index_files/figure-html/unnamed-chunk-8-1.png)

---

## Visualizing Distributions

Plot all the data instead.

![](index_files/figure-html/unnamed-chunk-9-1.png)

---

## Visualizing Distributions

Even better, add some jitter and alpha transparency:

![](index_files/figure-html/unnamed-chunk-10-1.png)

---

## Visualizing Distributions: too much data to show

Too much data to show? Could use boxplots:

![](index_files/figure-html/unnamed-chunk-11-1.png)

---

## Visualizing Distributions: too much data to show

Could also make a histogram for each one.

![](index_files/figure-html/unnamed-chunk-12-1.png)

---

## Visualizing Distributions: too much data to show

Whatever you do, don't combine and colour:

![](index_files/figure-html/unnamed-chunk-13-1.png)

---

## Visualizing Distributions: too much data to show

Better is to use density plots:

![](index_files/figure-html/unnamed-chunk-14-1.png)

---

## Visualizing Distributions: too much data to show

Even better is to use ridge plots:

![](index_files/figure-html/unnamed-chunk-15-1.png)

Who cares about the density values, anyway?

---

## Visualizing Distributions: too much data to show

Can pack in many categories (ordered, of course).

![](index_files/figure-html/unnamed-chunk-16-1.png)

---

## Visualizing Distributions: too much data to show

You might even be able to get away with colouring by continent. (arguable)

![](index_files/figure-html/unnamed-chunk-17-1.png)

---

## Visualizing Trends

Stacking just doesn't work.

![](index_files/figure-html/unnamed-chunk-18-1.png)

---

## Visualizing Trends

Putting bars beside each other is better, but still not good.

![](index_files/figure-html/unnamed-chunk-19-1.png)

---

## Visualizing Trends

Best is to forgo the bars and use lines:

![](index_files/figure-html/unnamed-chunk-20-1.png)

---

## Visualizing Trends

Bonus: line up the legend.

![](index_files/figure-html/unnamed-chunk-21-1.png)

---

# Activity: Worsen the plot

![](index_files/figure-html/unnamed-chunk-22-1.png)

Code and idea by Firas Moosvi

---
class: inverse, middle, center

# Part 2

# Principles of Figure Design

---
class: middle, center

# Data-to-ink ratio

[Less is More](https://speakerdeck.com/cherdarchuk/remove-to-improve-the-data-ink-ratio)

---

## Overlapping Points

How do you know there aren't overlapping points here? You don't.

![](index_files/figure-html/unnamed-chunk-23-1.png)

---

## Overlapping Points

Add some transparency, and suddenly you can tell.

![](index_files/figure-html/unnamed-chunk-24-1.png)

---

## Overlapping Points

Or, jitter the points a little bit.

![](index_files/figure-html/unnamed-chunk-25-1.png)

---

## Overlapping Points

What if jittering isn't an option and alpha transparency isn't enough?

![](index_files/figure-html/unnamed-chunk-26-1.png)

---

## Overlapping Points

Consider reducing the size of the points:

![](index_files/figure-html/unnamed-chunk-27-1.png)

---

## Overlapping Points

Or, use hexagonal binning (heatmap):

![](index_files/figure-html/unnamed-chunk-28-1.png)

---

## Colour

Don't try to choose your own colours.

![](index_files/figure-html/unnamed-chunk-29-1.png)

---

## Colour

Leave it to an expert: https://colorbrewer2.org/

![](index_files/figure-html/unnamed-chunk-30-1.png)

---

## Colour

Avoid too many colours.

![](index_files/figure-html/unnamed-chunk-31-1.png)

---

## Colour

Are you sure you don't just want to highlight a few categories of interest, or even just one?

![](index_files/figure-html/unnamed-chunk-32-1.png)

---

## Colour Blindness

Previous colour palette as seen with protanopia (reduced sensitivity to red):

<img src="regular.png" width="300" />
<img src="protanope.png" width="300" />

<!-- ![](original.png){ width=50% } -->
<!-- ![](protanope.png){ width=50% } -->

(Converted by [hclwizard](http://hclwizard.org/cvdemulator/))

---

## Colour Blindness

You could try to accommodate colour blindness **and** still use colour...

![](index_files/figure-html/unnamed-chunk-33-1.png)

Viridis scale

---

## Colour Blindness

Better yet, don't rely on colour at all. Facet by species:

![](index_files/figure-html/unnamed-chunk-34-1.png)

Notice the axes are comparable.

---

## Choose an Appropriate Scale

Here, the data are pressed against the y-axis. Tons of whitespace.

![](index_files/figure-html/unnamed-chunk-35-1.png)

---

## Choose an Appropriate Scale

Use a log scale on the x-axis.

![](index_files/figure-html/unnamed-chunk-36-1.png)

---

## Activity: Improve the Plot

How can we make this plot better?

![](index_files/figure-html/unnamed-chunk-37-1.png)

---
class: inverse, middle, center

# Activity: Reflection

Reflect in groups:

1. What one thing will you do differently after today's lecture?
2. What's your favourite plot from this lecture?

---

## Final Remarks: Making plots

- All plots today are __reproducible__.
- Built with `ggplot2` in R.
- Take a look at Episode 5 of [STAT 545](https://www.youtube.com/channel/UCrB-uourf2vxGeBnGjQrA0w) for an intro to `ggplot2`.
- Non-reproducible tools (like Excel) are not recommended, but will do if that's all you know.

<img src="ggplot2.png" width="50%" />
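
---

## Final Remarks: A minimal sketch

To make the "plot all the data" advice concrete, here is a rough sketch of a jittered, semi-transparent point plot in `ggplot2`. It is not the code behind the figures in these slides. The `palmerpenguins` package as the data source, the jitter width, and the alpha value are all assumptions chosen for illustration.

```r
library(ggplot2)

# Assumption: the palmerpenguins package is installed and stands in for
# the penguin data shown earlier in the slides.
penguins <- palmerpenguins::penguins

p <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  # Jitter horizontally only, so the body mass values stay exact;
  # alpha makes overlapping points visible as darker regions.
  geom_jitter(width = 0.2, height = 0, alpha = 0.4, na.rm = TRUE) +
  labs(x = "Species", y = "Body mass (g)")

p
```

Setting `height = 0` keeps the jitter from distorting the measured values; only the categorical axis is perturbed.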