<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Essay | Vincenzo Coia | Statistician, combining research and practice for probabilistic and risk modelling in the earth sciences.</title>
    <link>https://vincenzocoia.com/category/essay/</link>
      <atom:link href="https://vincenzocoia.com/category/essay/index.xml" rel="self" type="application/rss+xml" />
    <description>Essay</description>
    <generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 18 Feb 2018 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://vincenzocoia.com/media/icon_hu6486377679820991123.png</url>
      <title>Essay</title>
      <link>https://vincenzocoia.com/category/essay/</link>
    </image>
    
    <item>
      <title>The missing question in supervised learning</title>
      <link>https://vincenzocoia.com/post/missing_question/</link>
      <pubDate>Sun, 18 Feb 2018 00:00:00 +0000</pubDate>
      <guid>https://vincenzocoia.com/post/missing_question/</guid>
      <description>&lt;p&gt;You all know the drill &amp;ndash; you&amp;rsquo;re asked to make predictions of a continuous variable, so you turn to your favourite supervised learning method to do the trick. But have you ever suspected that you could be after the wrong type of output before you even begin?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Decision_tree_learning&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Regression trees&lt;/a&gt;, &lt;a href=&#34;https://en.wikipedia.org/wiki/Local_regression&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;loess&lt;/a&gt;, &lt;a href=&#34;https://en.wikipedia.org/wiki/Linear_regression&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;linear regression&lt;/a&gt;&amp;hellip; you name it, they&amp;rsquo;re all in pursuit of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Expected_value&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mean&lt;/a&gt; (well, almost all). But the true outcome is random. It has a distribution. Are you sure you want the mean of that distribution?&lt;/p&gt;
&lt;p&gt;You might say &amp;ldquo;Yes! It ensures my prediction is as close as possible to the outcome!&amp;rdquo; If this is indeed what you want, the mean still might not be your best choice &amp;ndash; it only ensures the &lt;a href=&#34;https://en.wikipedia.org/wiki/Mean_squared_error&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mean squared error&lt;/a&gt; is minimized.&lt;/p&gt;
&lt;p&gt;There is a suite of other options that might be more appropriate than the mean. The good thing is, your favourite supervised learning method probably has a natural extension for estimating these alternatives. Let&amp;rsquo;s investigate the quantities you might care about.&lt;/p&gt;
&lt;h3 id=&#34;the-median&#34;&gt;The Median&lt;/h3&gt;
&lt;p&gt;No, the median &lt;em&gt;isn&amp;rsquo;t&lt;/em&gt; just an inferior version of the mean, to be used under the unfortunate presence of outliers.&lt;/p&gt;
&lt;p&gt;If I randomly pick a data scientist, what do you think their salary would be? This distribution has a right-skew, so chances are, your data scientist earns less than the mean. Predict the &lt;a href=&#34;https://en.wikipedia.org/wiki/Median&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;median&lt;/a&gt;, and you&amp;rsquo;ll have a 50% chance that your data scientist &lt;em&gt;does&lt;/em&gt; earn at least what you predict.&lt;/p&gt;
&lt;p&gt;In short, use the median when you want an even chance that the outcome exceeds your prediction.&lt;/p&gt;
&lt;p&gt;Minimize the &lt;a href=&#34;https://en.wikipedia.org/wiki/Mean_absolute_error&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mean absolute error&lt;/a&gt; to get this prediction.&lt;/p&gt;
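&lt;p&gt;As a minimal sketch of that claim, the Python below searches a made-up salary sample for the single prediction with the smallest mean absolute error. The salary figures and the helper name &lt;code&gt;mean_absolute_error&lt;/code&gt; are assumptions for illustration, not part of any library.&lt;/p&gt;

```python
import statistics


def mean_absolute_error(prediction, outcomes):
    # Average absolute distance between one constant prediction and the outcomes.
    return sum(abs(y - prediction) for y in outcomes) / len(outcomes)


# Hypothetical right-skewed salaries (in thousands); made-up numbers.
salaries = [60, 65, 70, 75, 80, 90, 250]

# Try every observed value as the prediction and keep the one with the smallest error.
best = min(salaries, key=lambda c: mean_absolute_error(c, salaries))

print("MAE-minimizing prediction:", best)
print("sample median:", statistics.median(salaries))
print("sample mean:", round(statistics.mean(salaries), 2))
```

&lt;p&gt;On this sample the search lands on the median (75), well below the mean, which the single large salary pulls upward.&lt;/p&gt;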
&lt;h3 id=&#34;higher-or-lower-quantiles&#34;&gt;Higher (or lower) Quantiles&lt;/h3&gt;
&lt;p&gt;Want to make it to an interview on time? You add some &amp;ldquo;buffer time&amp;rdquo; to the expected travel time, right? What you&amp;rsquo;re after is a high &lt;a href=&#34;https://en.wikipedia.org/wiki/Quantile&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;quantile&lt;/a&gt; of travel time &amp;ndash; something like the 0.99-quantile, so that there is only a small chance you&amp;rsquo;ll be late (1% in this case).&lt;/p&gt;
&lt;p&gt;Use a high (or low) quantile if you want a conservative (or liberal) prediction &amp;ndash; or both, if you want a prediction interval.&lt;/p&gt;
&lt;p&gt;Minimize the mean &lt;a href=&#34;https://en.wikipedia.org/wiki/Quantile_regression#Quantiles&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;rho function&lt;/a&gt; (also known as the check or pinball loss) at your chosen quantile level to get this prediction.&lt;/p&gt;
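&lt;p&gt;To make the rho function (often called the check or pinball loss) concrete, here is a small sketch in Python. The travel times, the quantile level of 0.85, and the function names are all assumptions chosen for illustration; a real model would minimize this loss over its parameters rather than over a list of candidates.&lt;/p&gt;

```python
def pinball_loss(prediction, outcome, tau):
    # Rho function at level tau: under-predictions cost tau per unit,
    # over-predictions cost (1 - tau) per unit.
    residual = outcome - prediction
    if residual >= 0:
        return tau * residual
    return (tau - 1) * residual


def mean_pinball_loss(prediction, outcomes, tau):
    # Average rho-function loss of one constant prediction.
    return sum(pinball_loss(prediction, y, tau) for y in outcomes) / len(outcomes)


# Hypothetical travel times in minutes; made-up numbers.
travel_minutes = [22, 24, 25, 26, 27, 28, 29, 30, 33, 45]
tau = 0.85

# Try every observed time as the prediction and keep the one with the smallest loss.
best = min(travel_minutes, key=lambda c: mean_pinball_loss(c, travel_minutes, tau))

print("prediction at level", tau, "is", best, "minutes")
```

&lt;p&gt;With a level of 0.85, the search picks 33 minutes for this sample: more buffer than a typical trip needs, but short of the worst case. At a level of 0.5 the loss is half the absolute error, which ties this back to the median.&lt;/p&gt;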
&lt;h3 id=&#34;the-mean&#34;&gt;The Mean&lt;/h3&gt;
&lt;p&gt;The mean is useful when we care about &lt;em&gt;totals&lt;/em&gt;. Want to know how much gas a vehicle uses?  You&amp;rsquo;re after the mean, because the total quantity drawn out over time is what matters.&lt;/p&gt;
&lt;p&gt;Minimize the mean squared error to get this prediction.&lt;/p&gt;
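&lt;p&gt;A rough illustration of both points, using made-up fuel figures: the sample mean gives a lower mean squared error than the median does, and multiplying it by the number of trips recovers the total. The data and the helper name &lt;code&gt;mean_squared_error&lt;/code&gt; are assumptions for this sketch.&lt;/p&gt;

```python
import statistics


def mean_squared_error(prediction, outcomes):
    # Average squared distance between one constant prediction and the outcomes.
    return sum((y - prediction) ** 2 for y in outcomes) / len(outcomes)


# Hypothetical litres of gas used on five trips; made-up numbers.
litres_per_trip = [3.0, 4.0, 5.0, 6.0, 12.0]

mean_use = statistics.mean(litres_per_trip)
median_use = statistics.median(litres_per_trip)

print("MSE when predicting the mean:", mean_squared_error(mean_use, litres_per_trip))
print("MSE when predicting the median:", mean_squared_error(median_use, litres_per_trip))

# The mean scales up to the total; the median generally does not.
print("trips times mean:", mean_use * len(litres_per_trip))
print("actual total:", sum(litres_per_trip))
```

&lt;p&gt;Scaling the median by the number of trips would understate the total here, which is why the mean is the natural target when totals matter.&lt;/p&gt;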
&lt;h3 id=&#34;other-options&#34;&gt;Other Options&lt;/h3&gt;
&lt;p&gt;Do you really need to distill your prediction down to a single number? Consider looking at the entire distribution of the outcome as your prediction (typically conditional on predictors) &amp;ndash; after all, this conveys the entire uncertainty about the outcome. This is known as &lt;a href=&#34;https://en.wikipedia.org/wiki/Probabilistic_forecasting&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;probabilistic forecasting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are other measures, too, such as &lt;a href=&#34;https://en.wikipedia.org/wiki/Expected_shortfall&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;expected shortfall&lt;/a&gt;, which is useful for risk analysis, or even &lt;a href=&#34;http://www.statcan.gc.ca/pub/12-001-x/2016001/article/14545/03-eng.htm&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;expectiles&lt;/a&gt;. Maybe you care about variance or skewness for some reason. Whatever you want to get at, just make sure you ask yourself what you actually care about. You have an entire distribution to distill!&lt;/p&gt;
&lt;p&gt;(&lt;a href=&#34;https://www.pexels.com/photo/abstract-blackboard-bulb-chalk-355948/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Photo from Pexels&lt;/a&gt;)&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
