Hunt the paper tiger from boosting to XGBoost, intuitively, mathematically, implementably


Headnotes

Many equations and formulas look intimidating. However, once you hunt them down, they turn out not to be! They are just paper tigers!

Now let's hunt the paper tiger.

We will use this simple data set [1], \{x, y\}, in all our tutorials. If we use the 4th column as the label, the 3rd column will be the feature, and vice versa.

(1)   \begin{equation*} \begin{tabular}{llll} Sleep quality & Time in bed & \textbf{Wake up} & \textbf{Heart rate} \\ 100\% & 8:32 & \blacksmiley{} & 59 \\ 3\% & 0:16 & \frownie{} & 72 \\ 98\% & 8:30 & \frownie{} & 57 \\ 97\% & 9:06 & \blacksmiley{} & NaN \\ 72\% & 6:44 & \blacksmiley{} & 68 \\ 83\% & 7:12 & \blacksmiley{} & 60 \\ 78\% & 7:18 & \blacksmiley{} & 57 \\ 69\% & 7:27 & \blacksmiley{} & 56 \\ 74\% & 7:35 & \frownie{} & 64 \\ 81\% & 9:19 & \blacksmiley{} & 62 \\ \end{tabular} \end{equation*}
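To make the later steps implementable, here is a minimal sketch of this table as a pandas DataFrame. The column names, the 0/1 encoding of the wake-up mood, and the decimal-hour conversion of the times are my own choices:

    import numpy as np
    import pandas as pd

    # Hypothetical column names; times converted to decimal hours; smiley = 1, frowny = 0.
    data = pd.DataFrame({
        "sleep_quality": [1.00, 0.03, 0.98, 0.97, 0.72, 0.83, 0.78, 0.69, 0.74, 0.81],
        "time_in_bed":   [8.53, 0.27, 8.50, 9.10, 6.73, 7.20, 7.30, 7.45, 7.58, 9.32],
        "wake_up":       [1,    0,    0,    1,    1,    1,    1,    1,    0,    1],
        "heart_rate":    [59,   72,   57, np.nan, 68,   60,   57,   56,   64,   62],
    })

    # Regression: heart rate (4th column) as label, wake-up mood (3rd column) as feature.
    X_reg, y_reg = data[["wake_up"]], data["heart_rate"]
    # Classification: wake-up mood as label, heart rate as feature (vice versa).
    X_clf, y_clf = data[["heart_rate"]], data["wake_up"]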

Boosting

Gradient boosting for regression

The loss function of gradient boosting is defined as

(2)   \begin{equation*} \mathcal{L}^{(t)}  =\underbrace{ \sum_{i=1}^n l(\overbrace{y_i}^{\text{label}},  \underbrace{ \hat{y_i}^{(t-1)} + f_t(x_i) }_{(z_0+\Delta{z}) \text{ where } z_0 = \hat{y_i}^{(t-1)} \text{ and } \Delta{z} = f_t(x_i) } ) }_{\mathcal{L}^{(t)}(z_0+\Delta{z})} + \cancelto{\text{ignore now}}{\Omega(f_t)}. \end{equation*}
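To make Equation 2 concrete, here is a minimal sketch of gradient boosting for regression with squared loss, where each new tree f_t is fitted to the negative gradients (the residuals) of the current prediction \hat{y}^{(t-1)}. The regularization term \Omega(f_t) is ignored, as in the equation; the tree depth, number of rounds, and learning rate are arbitrary choices:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_regression(X, y, n_rounds=10, learning_rate=0.1):
        """One possible sketch of gradient boosting with squared loss (Omega(f_t) ignored)."""
        y = np.asarray(y, dtype=float)
        y_hat = np.full(len(y), y.mean())          # initial prediction F_0
        trees = []
        for t in range(n_rounds):
            residuals = y - y_hat                  # negative gradient of the squared loss
            tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
            trees.append(tree)
            y_hat = y_hat + learning_rate * tree.predict(X)   # y_hat^(t) = y_hat^(t-1) + f_t(x)
        return y_hat, trees

With the toy data above, you would first drop or impute the NaN heart rate (e.g., data.dropna()) before fitting.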

Gradient boosting for classification

Caveat: To avoid heavy notation, we omit the summation symbol \sum in what follows.

For a binary classification problem, we can define the odds as

(3)   \begin{equation*} \text{odds} = \frac{\text{win}}{\text{lose}} \end{equation*}

and probability as

(4)   \begin{equation*} p = \frac{\text{win}}{\text{win}+\text{lose}}. \end{equation*}

You might wonder why we define these. In the following derivation, you will see that this definition makes the classification result consistent with the regression case.

With some simple algebra,

(5)   \begin{equation*} p = \frac{\text{odds}}{1+\text{odds}} =\frac{ e^{\log{(\text{odds})}} }{ 1 + e^{\log{(\text{odds})}} }. \end{equation*}
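A quick numerical check of Equations 3, 4, and 5 (a sketch; the win and lose counts are made up):

    import numpy as np

    win, lose = 7, 3                       # made-up counts
    odds = win / lose                      # Equation 3
    p = win / (win + lose)                 # Equation 4

    log_odds = np.log(odds)
    p_from_log_odds = np.exp(log_odds) / (1 + np.exp(log_odds))   # Equation 5

    assert np.isclose(p, odds / (1 + odds))
    assert np.isclose(p, p_from_log_odds)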

We can define our loss function as cross entropy, such that

(6)   \begin{equation*} \begin{aligned} \mathcal{L}(y, F_{m-1} + \gamma)  = & -(y \log(p) + (1-y) \log(1-p)) \\ = & -( y(\log(p) - \log(1-p)) + \log(1-p)) \\ = & -y\log{\frac{p}{1-p}} + \log(1+\text{odds}) \\ = & -[ y \log(\text{odds}) - \log(1+e^{\log(\text{odds})}) ] , \end{aligned} \end{equation*}

in which we used 1-p = 1/(1+\text{odds}) in the third step, and

(7)   \begin{equation*} \log(\text{odds}) = F_m = {F_{m-1} + \gamma}. \end{equation*}
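As a sanity check, the cross entropy written in terms of p (the first line of Equation 6) and the rewritten form in terms of \log(\text{odds}) (the last line) agree numerically. The label and log-odds value below are arbitrary:

    import numpy as np

    y = 1.0                                 # arbitrary binary label
    log_odds = 0.4                          # F_m = F_{m-1} + gamma, arbitrary value
    p = np.exp(log_odds) / (1 + np.exp(log_odds))

    loss_in_p        = -(y * np.log(p) + (1 - y) * np.log(1 - p))       # first line of Equation 6
    loss_in_log_odds = -(y * log_odds - np.log(1 + np.exp(log_odds)))   # last line of Equation 6

    assert np.isclose(loss_in_p, loss_in_log_odds)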

We want to find the \gamma that minimizes the loss; in symbols,

(8)   \begin{equation*} \underset{\gamma}{\text{arg\,min\;}} \mathcal{L}. \end{equation*}

We could work on Equation 6 directly, with gradient descent or a closed-form solution, i.e., by setting

(9)   \begin{equation*} \dv{\mathcal{L}}{\gamma}=0 \end{equation*}

However, this will be quite complex.

We can instead use a Taylor series to approximate the loss function (you should convince yourself that this makes things simpler), such that

(10)   \begin{equation*} \mathcal{L}(y, F_{m-1}+\gamma) \approx \mathcal{L}(y, F_{m-1}) + \dv{\mathcal{L}}{F} \gamma + \frac{1}{2}\dv[2]{\mathcal{L}}{F} \gamma^2 \end{equation*}

Caveat: Two kinds of derivatives of \mathcal{L} appear here: one is w.r.t. F and one is w.r.t. \gamma.
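To see how good the second-order approximation in Equation 10 is for a small \gamma, we can compare it with the exact loss, using the first and second derivatives that will be derived below (Equations 13 and 14). The label, F_{m-1}, and \gamma values here are arbitrary:

    import numpy as np

    def loss(y, log_odds):
        # cross entropy written in terms of log(odds), last line of Equation 6
        return -(y * log_odds - np.log(1 + np.exp(log_odds)))

    y, F_prev, gamma = 1.0, 0.4, 0.05       # arbitrary values

    p = np.exp(F_prev) / (1 + np.exp(F_prev))
    g = -(y - p)                             # dL/dF at F_{m-1}, see Equation 13
    h = p * (1 - p)                          # d2L/dF2 at F_{m-1}, see Equation 14

    exact  = loss(y, F_prev + gamma)
    taylor = loss(y, F_prev) + g * gamma + 0.5 * h * gamma ** 2   # Equation 10

    print(exact, taylor)                     # the two numbers should be very close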

Substituting the approximation of Equation 10 into Equation 9, we set

(11)   \begin{equation*} \dv{\mathcal{L}}{\gamma} = \dv{}{\gamma} {\Bigg[ \mathcal{L}(y, F_{m-1}) + \dv{\mathcal{L}}{F} \gamma + \frac{1}{2}\dv[2]{\mathcal{L}}{F} \gamma^2 \Bigg]} = 0, \end{equation*}

from which \gamma can be solved as

(12)   \begin{equation*} \gamma = -\frac{ \dv{\mathcal{L}}{F} }{ \dv[2]{\mathcal{L}}{F} }. \end{equation*}

With Equation 6, the first-order derivative with respect to F = \log(\text{odds}) can be calculated as

(13)   \begin{equation*} \dv{\mathcal{L}}{\log(\text{odds})} = - \Big(y - \frac{e^{\log(\text{odds})}}{1 + e^{\log(\text{odds})}}\Big) = -(y-p) \end{equation*}
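A quick finite-difference check of Equation 13 (a sketch with arbitrary values):

    import numpy as np

    def loss(y, log_odds):
        return -(y * log_odds - np.log(1 + np.exp(log_odds)))

    y, log_odds, eps = 1.0, 0.4, 1e-6
    p = np.exp(log_odds) / (1 + np.exp(log_odds))

    numerical  = (loss(y, log_odds + eps) - loss(y, log_odds - eps)) / (2 * eps)
    analytical = -(y - p)                    # Equation 13

    assert np.isclose(numerical, analytical)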


The second derivative of \mathcal{L} with respect to \log(\text{odds}) is

(14)   \begin{equation*} \begin{aligned} \dv[2]{\mathcal{L}}{\log(\text{odds})} = & \dv{}{\log(\text{odds})} \frac{e^{\log(\text{odds})}}{1 + e^{\log(\text{odds})}} \\ = & \frac{e^{\log(\text{odds})}(1 + e^{\log(\text{odds})}) - e^{2\log(\text{odds})}}{ (1 + e^{\log(\text{odds})})^2 } \\ = & \frac{1}{1 + e^{\log(\text{odds})}} \cdot \frac{e^{\log(\text{odds})}}{ 1 + e^{\log(\text{odds})} } \\ = & p(1 - p). \end{aligned} \end{equation*}
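Putting Equations 12, 13, and 14 together (and restoring the summation we omitted, per the earlier caveat), the optimal step for a group of samples, e.g., the samples falling into one leaf, becomes \gamma = \sum(y - p) / \sum p(1-p). A minimal sketch, with made-up labels and previous predictions:

    import numpy as np

    y      = np.array([1.0, 0.0, 1.0, 1.0])      # made-up labels in one leaf
    F_prev = np.array([0.2, 0.2, -0.1, 0.5])     # made-up previous log-odds predictions
    p = np.exp(F_prev) / (1 + np.exp(F_prev))

    g = -(y - p)                                  # first derivatives, Equation 13
    h = p * (1 - p)                               # second derivatives, Equation 14

    gamma = -g.sum() / h.sum()                    # Equation 12, with the omitted sums restored
    print(gamma)                                  # same as (y - p).sum() / (p * (1 - p)).sum()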

XGBoost

Reference

  1. Dana D. Sleep Data: Personal Sleep Data from Sleep Cycle iOS App. Kaggle. https://www.kaggle.com/danagerous/sleep-data#

Footnotes

There are many excellent tutorials out there. Some are very intuitive, which is helpful, but they don't get the math details straight. Some focus on demystifying the math. Some focus on code. I believe the best tutorials give you the conceptual ideas and are implementable, without being blind to the math details. Drop a comment if I failed; it would be much appreciated.


If you want to cite this article, please cite this article as:

Lachlan Chen, "Hunt the paper tiger from boosting to XGBoost, intuitively, mathematically, implementably," in EarnFromScratch, September 8, 2020, https://www.earnfs.com/en/html/2180.htm.

or

@misc{lachlanchen2020tutorial,
  title={Hunt the paper tiger from boosting to XGBoost, intuitively, mathematically, implementably},
  author={Chen, Lachlan},
  year={2020},
  month={September},
  howpublished={\url{https://www.earnfs.com/en/html/2180.htm}}
}



