# The squared error has friends, too!

I was invited to the SFU/UBC Joint Seminar in Spring 2019, where I gave this talk.

# Quiz

## True or False:

• The least squares estimator is derived by maximizing the likelihood.
• If errors are not Gaussian, we can’t use least squares to estimate our regression coefficients.

## Question:

When we fit a regression model (linear, kNN, random forest, etc.), what is the interpretation of the resulting model function / regression curve?

Answer: The mean of Y given X.

# Concept #1

We minimize the sum of squared errors (SSE) in regression because it’s a proper scoring rule for the mean.

On the board, let’s:

1. Write ybar as a minimization problem.
2. Extend to regression.
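The two board steps can also be checked numerically. The sketch below is an illustration only: the data are made up, and the function names are not from the talk. It first confirms that the constant minimizing the SSE is the sample mean, then applies the same criterion to a straight line using the usual closed-form least squares solution.

```python
from statistics import mean

def sse_constant(y, c):
    """Sum of squared errors when every observation is predicted by c."""
    return sum((yi - c) ** 2 for yi in y)

def least_squares_line(x, y):
    """Intercept and slope minimizing sum((y - a - b*x)^2)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up data, for illustration only.
y = [2.0, 3.5, 1.0, 4.0, 7.5]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Step 1: search a fine grid of constants; the best one matches ybar.
grid = [i / 100 for i in range(0, 1001)]
best_c = min(grid, key=lambda c: sse_constant(y, c))
print(best_c, mean(y))

# Step 2: same criterion, but the prediction now depends on x.
intercept, slope = least_squares_line(x, y)
print(intercept, slope)
```

Nothing in either step refers to a Gaussian distribution: the squared error is simply the score that targets the mean.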

# Concept #2

When doing regression, we ought to consider quantiles, too!

Consider Y = monthly expenditure (in $). Interpretation of quantities:

• median: There’s a 50-50 chance that you’ll have to pay more than this.
• low-quantile: You’ll “at least” have to pay this much.
• high-quantile: You’ll “at most” have to pay this much.
• mean: Multiply by m to estimate total $ after m months.

# Concept #3

Each quantile has its own proper scoring rule that we can use instead of the squared error.

On the board:

1. Write the median as an optimization problem.
2. Extend to a generic quantile.
3. Extend to regression.

The “check function”:
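The slide's formula did not survive in this copy. One standard way to write the check function for a quantile level $$\tau \in (0, 1)$$ is $$\rho_\tau(s) = s\,(\tau - \mathbf{1}\{s < 0\})$$, where $$s$$ is a residual; the symbols $$\rho_\tau$$, $$\tau$$, and $$s$$ are notation assumed here, not taken from the slides. The sketch below uses made-up expenditure values and hypothetical function names to show that minimizing the total check loss over a constant recovers a sample quantile.

```python
def check_loss(residual, tau):
    """Check ("pinball") function: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_check_loss(y, q, tau):
    """Total check loss when every observation is predicted by the constant q."""
    return sum(check_loss(yi - q, tau) for yi in y)

def quantile_by_optimization(y, tau):
    """Return an observed value that minimizes the total check loss.

    A minimizer always exists among the observed values, so scanning
    them is enough for a small sample.
    """
    return min(sorted(y), key=lambda q: total_check_loss(y, q, tau))

# Made-up monthly expenditures (in $), for illustration only.
y = [120.0, 80.0, 95.0, 300.0, 110.0, 150.0, 105.0, 90.0, 130.0]

median_hat = quantile_by_optimization(y, 0.5)
high_hat = quantile_by_optimization(y, 0.9)
print(median_hat, high_hat)
```

With `tau = 0.5` the check loss is half the absolute error, which is why the median plays the role that the mean plays for the squared error.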

# Concept #4

Make a distributional assumption to reduce estimation uncertainty.

## Univariate Estimation

If you have a univariate sample $$Y_1, \ldots, Y_n$$:

| Distributional Assumption? | Estimation Method |
|---|---|
| No | “sample versions”: ybar, s^2, quantile(), … |
| Yes | MLE |
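A small simulation can illustrate the reduction in estimation uncertainty. This is a sketch under stated assumptions, not a result from the talk: the data are simulated from a standard Gaussian (so the distributional assumption is correct by construction), the target is the 0.95-quantile, and the sample size, replication count, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean, pstdev, quantiles, variance

random.seed(2019)

TAU = 0.95
N_OBS = 50      # sample size per simulated data set
N_REPS = 1000   # number of simulated data sets

z_tau = NormalDist().inv_cdf(TAU)

sample_estimates = []
mle_estimates = []
for _ in range(N_REPS):
    y = [random.gauss(0.0, 1.0) for _ in range(N_OBS)]

    # No distributional assumption: the "sample version" of the quantile.
    # quantiles(y, n=20) returns 19 cut points; the last is the 0.95-quantile.
    sample_estimates.append(quantiles(y, n=20)[-1])

    # Gaussian assumption: plug the MLEs of mu and sigma into the
    # Gaussian quantile formula mu + sigma * z_tau.
    mu_hat = mean(y)
    sigma_hat = pstdev(y)  # divides by n, which is the Gaussian MLE
    mle_estimates.append(mu_hat + sigma_hat * z_tau)

# Spread of each estimator across the simulated data sets.
var_sample = variance(sample_estimates)
var_mle = variance(mle_estimates)
print(var_sample, var_mle)
```

The comparison only favours the MLE when the assumed family is reasonable; if the data were far from Gaussian, the plug-in estimate could be biased no matter how small its variance.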

## Regression setting

If you have a univariate sample $$Y_1, \ldots, Y_n$$ AND predictors:

| Distributional Assumption? | Estimation Method |
|---|---|
| No | Optimize scoring rule for desired quantity. |
| Yes | MLE |
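For the "No" row, here is a minimal sketch of optimizing a scoring rule in a regression setting: a straight-line quantile regression fitted by minimizing the total check loss. Everything here is an assumption for illustration: the data are simulated, the function names are hypothetical, and the brute-force search stands in for the linear programming solvers used in practice. It relies on the fact that, for a line with an intercept, some minimizer passes through two data points.

```python
import random
from itertools import combinations

def check_loss(residual, tau):
    """Check function: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_loss(x, y, intercept, slope, tau):
    """Total check loss of the line intercept + slope * x."""
    return sum(check_loss(yi - intercept - slope * xi, tau)
               for xi, yi in zip(x, y))

def fit_quantile_line(x, y, tau):
    """Brute-force quantile regression with one predictor.

    Tries every line through two data points and keeps the one
    with the smallest total check loss.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid candidate
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = total_loss(x, y, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Simulated data, for illustration only: true line 2 + 3x, Gaussian noise.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(60)]
y = [2.0 + 3.0 * xi + random.gauss(0, 1) for xi in x]

med_intercept, med_slope = fit_quantile_line(x, y, 0.5)
hi_intercept, hi_slope = fit_quantile_line(x, y, 0.9)

# Share of observations falling below the fitted 0.9-quantile line.
frac_below = sum(yi < hi_intercept + hi_slope * xi
                 for xi, yi in zip(x, y)) / len(x)
print(med_slope, hi_slope, frac_below)
```

Swapping `check_loss` for the squared error in the same search would target the conditional mean instead, which is the point of the table: the score is chosen to match the quantity you want.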

# Return of the Quiz

Can we see why these are false?

• The least squares estimator is derived by maximizing the likelihood.
• If errors are not Gaussian, we can’t use least squares to estimate our regression coefficients.

# Time left?

Give two ways to estimate the conditional variance.

Hint: Think about the definition of variance.

Talk to your neighbour for 1 minute.

# Resources

This talk was inspired by the activity generated by my blog post “The missing question in supervised learning”.

##### Vincenzo Coia
###### he/him/his 🌈 👨

I’m a data scientist at the University of British Columbia, Vancouver.