Roberto Santana and Unai Garciarena
Department of Computer Science and Artificial Intelligence
University of the Basque Country
I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
Data | Model 1 | Model 2 | Model 3 |
---|---|---|---|
\(y \, x_1 \, x_2\) | \(p_{M1}(y|x_1,x_2)\) | \(p_{M2}(y|x_1,x_2)\) | \(p_{M3}(y|x_1,x_2)\) |
000 | 0.5 | 1.0 | 1.0 |
101 | 0.5 | 1.0 | 0.9 |
101 | 0.5 | 1.0 | 0.9 |
110 | 0.2 | 1.0 | 0.85 |
011 | 0.2 | 1.0 | 0.8 |
000 | 0.5 | 1.0 | 1.0 |
011 | 0.2 | 1.0 | 0.8 |
110 | 0.2 | 1.0 | 0.85 |
\(\prod_{i=1}^{8} p(y^i|x_1^i,x_2^i)\) | \(0.2^4 \cdot 0.5^4\) | \(1.0\) | \(0.85^2 \cdot 0.8^2 \cdot 0.9^2\) |
\(\log\left(\prod_{i=1}^{8} p(y^i|x_1^i,x_2^i)\right)\) | \(\log(0.2^4 \cdot 0.5^4)\) | \(\log(1.0)\) | \(\log(0.85^2 \cdot 0.8^2 \cdot 0.9^2)\) |
\(\sum_{i=1}^{8} \log p(y^i|x_1^i,x_2^i)\) | \(4\log(0.2)+4\log(0.5)\) | \(0\) | \(2\log(0.85)+2\log(0.8)+2\log(0.9)\) |
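The log-likelihoods in the table can be checked with a short script. The per-example probabilities below are copied from the table; the model names are just labels for the three hypothetical models.

```python
import math

# Per-example predicted probabilities p_M(y | x1, x2) for the 8 observations,
# taken from the table above (three hypothetical models).
probs = {
    "Model 1": [0.5, 0.5, 0.5, 0.2, 0.2, 0.5, 0.2, 0.2],
    "Model 2": [1.0] * 8,
    "Model 3": [1.0, 0.9, 0.9, 0.85, 0.8, 1.0, 0.8, 0.85],
}

for name, p in probs.items():
    # Sum of log-probabilities = log of the product (the conditional log-likelihood).
    log_lik = sum(math.log(q) for q in p)
    print(f"{name}: log-likelihood = {log_lik:.4f}")
```

Model 2, which assigns probability 1 to every observed label, attains the maximum possible log-likelihood of 0; Model 1 is the worst of the three.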
For the sample-mean estimator \(\hat{\theta}_m = \frac{1}{m}\sum_{i=1}^{m} x^i\) of the parameter \(\theta\) of a Bernoulli random variable, the bias is zero, where the expectation is over the data (samples of the random variable):
\[ \begin{align} bias(\hat{\theta}_m) &= \mathbb{E}(\hat{\theta}_m) - \theta \\ &= \mathbb{E} \left[ \frac{1}{m} \sum_{i=1}^{m} x^i \right] - \theta \\ &= \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}[x^i] - \theta \\ &= \frac{1}{m} \sum_{i=1}^{m} \sum_{x^i=0}^{1} \left( x^i \theta^{x^i} (1-\theta)^{(1-x^i)} \right) - \theta \\ &= \frac{1}{m} \sum_{i=1}^{m} \theta - \theta \\ &= 0 \end{align} \]
The variance of the estimator provides information about how much we expect it to vary as a function of the data sample.
\[ \begin{align} Var(\hat{\theta}_m) &= Var \left( \frac{1}{m} \sum_{i=1}^{m} x^i \right) \\ &= \frac{1}{m^2} \sum_{i=1}^{m} Var(x^i) \\ &= \frac{1}{m^2} \sum_{i=1}^{m} \theta (1-\theta) \\ &= \frac{1}{m^2} m \, \theta (1-\theta) \\ &= \frac{1}{m} \theta (1-\theta) \end{align} \]
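The closed form \(\theta(1-\theta)/m\) can likewise be checked by simulation: the empirical variance of many independent estimates \(\hat{\theta}_m\) should match it. The constants below are illustrative.

```python
import random

random.seed(1)
theta, m, n_trials = 0.3, 20, 100_000  # illustrative values

# Draw n_trials independent estimates theta_hat_m ...
estimates = [
    sum(random.random() < theta for _ in range(m)) / m
    for _ in range(n_trials)
]

# ... and compare their empirical variance to theta * (1 - theta) / m.
mean = sum(estimates) / n_trials
emp_var = sum((e - mean) ** 2 for e in estimates) / n_trials
print(f"empirical Var = {emp_var:.5f},  theta(1-theta)/m = {theta * (1 - theta) / m:.5f}")
```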
Maximizing the conditional log-likelihood \[ \sum_{i=1}^{m} \log P(y^i|{\bf{x}}^i,\theta) = -m \log(\sigma) - \frac{m}{2} \log(2\pi) - \sum_{i=1}^{m} \frac{||\hat{y}^i-y^i||^2}{2\sigma^2} \]
is equivalent to minimizing the mean squared error: \[ MSE_{train} = \frac{1}{m} \sum_{i=1}^{m} ||\hat{y}^i-y^i||^2 \]
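Since the log-likelihood equals a constant minus \(\frac{m}{2\sigma^2} MSE_{train}\), both criteria rank parameter values identically. A small grid search on a toy one-parameter model \(\hat{y} = w x\) illustrates this; the data and names below are made up for the example.

```python
import math

# Toy data for a 1-parameter linear model y_hat = w * x (illustrative values).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 4.2, 5.8]
sigma = 1.0
m = len(xs)

def mse(w):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / m

def log_lik(w):
    """Gaussian conditional log-likelihood with fixed noise std sigma."""
    sq = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
    return -m * math.log(sigma) - (m / 2) * math.log(2 * math.pi) - sq / (2 * sigma ** 2)

# Over the same candidate grid, the MSE minimizer and the likelihood
# maximizer are the same parameter value.
candidates = [i / 100 for i in range(100, 300)]
w_mse = min(candidates, key=mse)
w_mle = max(candidates, key=log_lik)
print(w_mse, w_mle)
```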
where \(\alpha \in [0,\infty) \) is a hyperparameter that weights the contribution of the penalty term \(\Omega(\theta)\) relative to the standard objective \( J(\theta;{\bf{X}},y) \).
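The effect of \(\alpha\) can be sketched on a scalar model with an L2 penalty \(\Omega(\theta) = \theta^2\) (one common choice of regularizer): larger \(\alpha\) pulls the optimal parameter toward zero. All names and data below are illustrative.

```python
# Regularized objective J~(theta) = J(theta; X, y) + alpha * Omega(theta),
# sketched for a scalar model y_hat = theta * x with an L2 penalty.

def j_data(theta, xs, ys):
    """Unregularized loss: MSE of the linear model y_hat = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def j_regularized(theta, xs, ys, alpha):
    omega = theta ** 2  # L2 regularizer for a scalar parameter
    return j_data(theta, xs, ys) + alpha * omega

xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]      # toy data
grid = [i / 100 for i in range(0, 400)]         # candidate theta values
for alpha in (0.0, 0.1, 1.0):
    best = min(grid, key=lambda t: j_regularized(t, xs, ys, alpha))
    print(f"alpha={alpha}: argmin theta = {best:.2f}")
```

With \(\alpha = 0\) the regularizer has no effect; as \(\alpha\) grows, the penalty dominates and the minimizer shrinks toward 0.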