The squared error has friends, too!

I was invited to the SFU/UBC Joint Seminar in Spring 2019 where I gave this talk.

Who am I?

Quiz

True or False:

  • The least squares estimator is derived by maximizing the likelihood.
  • If errors are not Gaussian, we can’t use least squares to estimate our regression coefficients.

Answers: Both FALSE!

Question:

When we fit a regression model (linear, kNN, random forest, etc.), what is the interpretation of the resulting model function / regression curve?

Answer: The mean of Y given X.

Concept #1

We minimize the sum of squared errors (SSE) in regression because it’s a proper scoring rule for the mean.
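A minimal numerical sketch of this claim, using a small made-up sample (the values in `y` and the helper name `sse` are illustrative assumptions, not from the talk): among all constant predictions, the one with the smallest sum of squared errors sits at the sample mean.

```python
# Hypothetical data, invented purely for illustration.
y = [2.0, 3.5, 1.0, 7.0, 4.5]

def sse(c, values):
    """Sum of squared errors when every observation is predicted by the constant c."""
    return sum((v - c) ** 2 for v in values)

ybar = sum(y) / len(y)

# Evaluate the SSE on a fine grid of candidate constants spanning the data range.
steps = 6000
lo, hi = min(y), max(y)
grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
best = min(grid, key=lambda c: sse(c, y))

print(f"sample mean:        {ybar:.3f}")
print(f"grid SSE minimizer: {best:.3f}")
```

The two printed numbers should agree up to the grid resolution, which is the sense in which the squared error "targets" the mean.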


On the board, let’s:

  1. Write ybar as a minimization problem.
  2. Extend to regression.
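The board work is not reproduced in the post, so what follows is a sketch of one standard way to write the two steps. The symbols $c$, $m$, and $\mathcal{M}$ (a class of candidate regression functions, for example lines) are notation assumed here.

```latex
% Step 1: the sample mean as a minimizer.
\bar{y} \;=\; \arg\min_{c \in \mathbb{R}} \; \sum_{i=1}^{n} (y_i - c)^2
% Setting the derivative to zero gives -2 \sum_{i=1}^{n} (y_i - c) = 0,
% that is, c = \frac{1}{n} \sum_{i=1}^{n} y_i.

% Step 2: let the constant depend on the predictors.
\hat{m} \;=\; \arg\min_{m \in \mathcal{M}} \; \sum_{i=1}^{n} \bigl(y_i - m(x_i)\bigr)^2
% Population analogue:
% E[Y \mid X = x] = \arg\min_{c} \, E\bigl[(Y - c)^2 \mid X = x\bigr].
```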

Concept #2

When doing regression, we ought to consider quantiles, too!


Consider Y = monthly expenditure (in $). Interpretation of quantities:

  • median: There’s a 50-50 chance that you’ll have to pay more than this.
  • low-quantile: You’ll “at least” have to pay this much.
  • high-quantile: You’ll “at most” have to pay this much.
  • mean: Multiply by m to estimate total $ after m months.
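To make the four interpretations concrete, here is a small sketch on invented expenditure figures. The data, the 0.1 and 0.9 levels chosen for "low" and "high", and the helper `sample_quantile` are all assumptions for illustration.

```python
import math

# Hypothetical monthly expenditures in dollars, invented for illustration.
spend = [1800, 2100, 1950, 2400, 2250, 1700, 3100, 2000, 2150, 1900, 2600, 2050]

def sample_quantile(values, tau):
    """Smallest observation with at least a fraction tau of the data at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

mean_spend = sum(spend) / len(spend)
median = sample_quantile(spend, 0.5)
low = sample_quantile(spend, 0.1)    # "at least" this much, most months
high = sample_quantile(spend, 0.9)   # "at most" this much, most months

m = 6  # number of future months to budget for
print(f"median: {median}, low (0.1): {low}, high (0.9): {high}")
print(f"estimated total over {m} months: {m * mean_spend:.2f}")
```

Only the mean scales to a total this way; multiplying a quantile by `m` does not give the corresponding quantile of the `m`-month total.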

Concept #3

Each quantile has its own proper scoring rule that we can use instead of the squared error.


On the board:

  1. Write the median as an optimization problem.
  2. Extend to a generic quantile.
  3. Extend to regression.
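Again the board work is not part of the post. A sketch of the usual formulation follows, with $\tau \in (0, 1)$ the quantile level, $\rho_\tau$ the check function, and $\mathcal{Q}$ a class of candidate functions; these symbols are assumed here rather than taken from the talk.

```latex
% Step 1: the sample median minimizes the sum of absolute errors.
\operatorname{median}(y_1, \ldots, y_n) \;\in\; \arg\min_{c \in \mathbb{R}} \; \sum_{i=1}^{n} |y_i - c|

% Step 2: a generic tau-quantile replaces |u| with an asymmetric loss.
\hat{q}_\tau \;\in\; \arg\min_{c \in \mathbb{R}} \; \sum_{i=1}^{n} \rho_\tau(y_i - c),
\qquad
\rho_\tau(u) \;=\; u \,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
% With \tau = 1/2, \rho_{1/2}(u) = |u| / 2, which recovers the median.

% Step 3: let the constant depend on the predictors.
\hat{q}_\tau(\cdot) \;\in\; \arg\min_{q \in \mathcal{Q}} \; \sum_{i=1}^{n} \rho_\tau\bigl(y_i - q(x_i)\bigr)
```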

The “check function”:
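The slide's figure is not available here. As a stand-in, this sketch implements the check function in its standard form, rho_tau(u) = u * (tau - 1{u < 0}), and confirms on made-up data that minimizing it picks out a sample quantile. The data and the helper names are assumptions.

```python
def check_loss(u, tau):
    """Check function: weight tau on positive errors, (1 - tau) on negative ones."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(c, values, tau):
    """Total check loss when every observation is predicted by the constant c."""
    return sum(check_loss(v - c, tau) for v in values)

def best_constant(values, tau):
    # The total check loss is piecewise linear in c with kinks at the data,
    # so a minimizer can always be found among the observed values.
    return min(sorted(values), key=lambda c: total_loss(c, values, tau))

# Hypothetical sample of 11 values, invented for illustration.
y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.5, 6.0, 5.0, 3.5, 8.0, 7.0]

for tau in (0.25, 0.5, 0.75):
    print(f"tau = {tau}: minimizer of total check loss = {best_constant(y, tau)}")
```

For `tau = 0.5` the loss is half the absolute error, so the minimizer is the sample median; moving `tau` up or down tilts the penalty and shifts the minimizer to a higher or lower order statistic.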

Concept #4

Make a distributional assumption to reduce estimation uncertainty.

Univariate Estimation

If you have a univariate sample \(Y_1, \ldots, Y_n\):

| Distributional Assumption? | Estimation Method |
|---|---|
| No | “sample versions”: ybar, s^2, quantile(), … |
| Yes | MLE |
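A rough simulation sketch of the trade-off in the univariate case: estimating the 0.95-quantile of Gaussian data either with the sample quantile or by plugging Gaussian MLEs into the Normal quantile formula. The sample size, seed, quantile level, and function names are assumptions chosen for illustration, and the comparison only favours the MLE because the Gaussian assumption is true by construction here.

```python
import math
import random
from statistics import NormalDist, mean, variance

def sample_quantile(values, tau):
    """'Sample version': an order statistic, no distributional assumption."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

def gaussian_mle_quantile(values, tau):
    """Plug the Gaussian MLEs of mu and sigma into the Normal quantile formula."""
    mu = mean(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))  # MLE divides by n
    return mu + sigma * NormalDist().inv_cdf(tau)

rng = random.Random(2019)       # arbitrary seed, for reproducibility
tau, n, reps = 0.95, 50, 500    # illustrative settings

sample_estimates, mle_estimates = [], []
for _ in range(reps):
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample_estimates.append(sample_quantile(y, tau))
    mle_estimates.append(gaussian_mle_quantile(y, tau))

var_sample = variance(sample_estimates)
var_mle = variance(mle_estimates)
print(f"variance of sample-quantile estimates: {var_sample:.4f}")
print(f"variance of Gaussian-MLE estimates:    {var_mle:.4f}")
```

If the data were not Gaussian, the MLE-based estimate could be biased, which is the price paid for the reduced uncertainty.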

Regression setting

If you have a univariate sample \(Y_1, \ldots, Y_n\) AND predictors:

| Distributional Assumption? | Estimation Method |
|---|---|
| No | Optimize scoring rule for desired quantity. |
| Yes | MLE |
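A sketch of the "No" row for a straight-line model: the same fitting routine is reused with different scoring rules to target the mean, the median, or the 0.9-quantile. The simulated data, the coarse grid search (used here instead of a proper optimizer to keep the example dependency-free), and all function names are assumptions for illustration.

```python
import random

def squared_error(u):
    """Scoring rule targeting the conditional mean."""
    return u * u

def make_check_loss(tau):
    """Scoring rule targeting the conditional tau-quantile."""
    def loss(u):
        return u * (tau - (1.0 if u < 0 else 0.0))
    return loss

def fit_line(x, y, loss, intercepts, slopes):
    """Grid search for (a, b) minimizing sum(loss(y_i - a - b * x_i))."""
    best = None
    for a in intercepts:
        for b in slopes:
            total = sum(loss(yi - a - b * xi) for xi, yi in zip(x, y))
            if best is None or total < best[0]:
                best = (total, a, b)
    return best[1], best[2]

# Simulated data: y = 1 + 2x + Gaussian noise (made up for this sketch).
rng = random.Random(1)
n = 200
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

intercepts = [i * 0.1 for i in range(-10, 31)]  # -1.0 to 3.0
slopes = [i * 0.1 for i in range(0, 41)]        #  0.0 to 4.0

mean_fit = fit_line(x, y, squared_error, intercepts, slopes)
median_fit = fit_line(x, y, make_check_loss(0.5), intercepts, slopes)
upper_fit = fit_line(x, y, make_check_loss(0.9), intercepts, slopes)

for name, (a, b) in [("mean", mean_fit), ("median", median_fit), ("0.9-quantile", upper_fit)]:
    print(f"{name:>12}: intercept = {a:.1f}, slope = {b:.1f}")
```

Only the loss changes between the three fits; no distribution for the errors is specified anywhere.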

Return of the Quiz

Can we see why these are false?

  • The least squares estimator is derived by maximizing the likelihood.
  • If errors are not Gaussian, we can’t use least squares to estimate our regression coefficients.

Time left?

Give two ways to estimate the conditional variance.

Hint: Think about the definition of variance.

Talk to your neighbour for 1 minute
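The post does not record the answer discussed in the room. One possible reading of the hint, written with $\mu(x) = E[Y \mid X = x]$ (notation assumed here), uses the two equivalent forms of the variance:

```latex
\operatorname{Var}(Y \mid X = x)
  \;=\; E\bigl[(Y - \mu(x))^2 \mid X = x\bigr]
  \;=\; E\bigl[Y^2 \mid X = x\bigr] - \mu(x)^2
% First form:  fit a mean regression for Y, then fit a second mean regression
%              of the squared residuals (y_i - \hat{\mu}(x_i))^2 on x.
% Second form: fit mean regressions for Y^2 and for Y separately, then take
%              \widehat{E}[Y^2 \mid x] - \hat{\mu}(x)^2.
```

Both routes reuse the squared error as the scoring rule, since each step estimates a conditional mean.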

Vincenzo Coia
he/him/his 🌈 👨

I’m a data scientist at the University of British Columbia, Vancouver.
