Bio

My favourite part of statistics for data science is conveying a prediction’s uncertainty. All too often, predictions are conveyed deterministically, as though an omniscient expert were broadcasting an inevitability: national GDP will increase by 5% next year; a river will peak 0.5 meters below the town’s levee in two days. In reality, the outcome is far less certain, and that uncertainty can make the difference in big decisions such as whether to evacuate a town. My goal, therefore, is to make it easier for all data scientists to establish and communicate uncertainty.

Conveying uncertainty requires the use of probability distributions. This means more than making an elusive Normal assumption, or fitting a test statistic to a t-distribution – it means building realistic distributions as an output, perhaps even using machine learning, and converting that output into something that non-experts can understand. It also means interpreting a deterministic prediction probabilistically, for example as the mean or median of a distribution. To help with this, I’m creating packages for the R Project for Statistical Computing that make it easier to work with distributions. You can find links to these packages on this website.

My main focus these days is developing data science initiatives at UBC. I focussed on developing and delivering the Master of Data Science (MDS) program for its first four years, and am now developing a new minor program in data science with my colleagues. I also promote the development of sane analyses by continuing to adapt STAT 545. I like to make my work public, and you can find much of it on this website.

To give everything context, I like to remain aware of the demands placed on data science by staying in touch with organizations and their data science problems. I usually do this through the MDS capstone course, but I also work with various organizations on the side.

Career

Career Philosophy

What is an ideal career to me?

Teaching Philosophy

What does being an effective teacher mean to me?

CV

The things I’ve done.

Teaching

Courses I teach or have taught at UBC.

STAT 545

Data wrangling, exploration, and analysis with R

Regression II

DSCI 531

Data Visualization I

DSCI 551

Descriptive Statistics and Probability for Data Science

DSCI 511

Programming for Data Science

DSCI 591

MDS capstone project

R Packages

R Packages that I am involved with.

distplyr

Draw powerful insights using distributions.

rqdist

Build predictive distributions using linear quantile regression.

Recent Tutorials

Bits and pieces of resources I’ve made for teaching.

⚠️ Needs cleaning ⚠️

The squared error has friends, too!

I was invited to the SFU/UBC Joint Seminar in Spring 2019, where I gave this talk.

Factor Analysis

Here's a little tutorial on how to use the factanal function for Factor Analysis. Let's make a data frame using actual …

Contour Plots

This tutorial introduces contour plots, and how to plot them in ggplot2. What is a contour plot? Suppose you have a map of a …

Mixture distributions

This tutorial introduces the concept of a mixture distribution. We’ll look at a basic example first, using intuition, and then describe …

Collaboration with Version Control: Git and GitHub

Does this happen to you? What about bouncing files back and forth amongst collaborators through email? Or even keeping track of your …