12 MLE
Recall that in this part of the text, we tackle the problem of constructing models (or updating beliefs) from data. Because data collection is also a source of uncertainty, we investigate how to appropariately propagate the uncertainty to the inferences (aka, answers to our questions) that we pulled from the models. Communicating the uncertainty associated with our answers will be key.
The goal of estimation is to find a probability model that describes the collected data so that, ultimately, we can answer questions about a larger population.
We noted that there are several ways to use the data to estimate a probability model.
Roadmap:
✓ Eyeball
✓ Kernel density estimation (nonparametric)
✓ Method of moments (parametric)
Today: Maximum likelihood (parametric)
Next: Bayesian posterior (parametric)
12.1 Key question
\[ \text{Suppose } X \sim F_X \text{ where } F_X \text{ is unknown.}\\ \text{How might one estimate } F_X \text{ from } \underbrace{\boldsymbol{X}_N}_{\text{data}} ?\]
12.1.1 Example
<- c(
data -1.26, 4.05, 0.37, -1.43, 3.33, -0.14, 4.81, 0.44, 1.62,
4.75, 3.13, 4.56, 0.20, 5.00, 0.47, -0.48, 4.14, 3.62,
0.70, 3.92, 0.84, 0.38, -0.87, 1.05, -3.69, 0.24, 1.47,
-5.37, 3.24, -6.76, -3.20, 3.84, 2.57, 0.05, -0.35, -2.77,
5.84, 2.98, 1.06, -0.20, 6.82, 4.52, -1.31, 0.74, -1.86,
0.17, 0.25, 1.19, -0.11, -0.22
)
12.1.1.1 ECDF
The ECDF is an expression for the CDF. It isn’t smooth.
<- ecdf(data)
fhat plot(fhat, main = parse(text="ECDF~hat(F)[X]"))
12.1.1.2 The histogram
The histogram is a type of estimate of the pdf. Again, it isn’t smooth.
<- hist(data, main = "", xlab = "X") h1
12.1.1.3 Kernel density estimate of CDF
The kernel density estimate can provide an expression of both the CDF and PDF. It is smooth. It is also nonparametric.
<- function(t, data, smooth){
ecdfstar outer(t, data, function(a,b){ pnorm(a, b, smooth)}) %>% rowMeans
}plot(ecdf(data), main = "")
curve(ecdfstar(x, data, smooth = .7), add = TRUE, lwd = 3, col = "blue")
12.1.1.4 Kernel density estimate of PDF
<- function(t, data, smooth){
epdfstar outer(t, data, function(a,b){ dnorm(a, b, smooth)}) %>% rowMeans
}hist(data, freq = FALSE, main = "", ylim=c(0,.2), breaks=15)
curve(epdfstar(x, data, smooth = .7), add = TRUE, lwd = 3, col = "blue")
# Perhaps easier code
<- density(data)
d1 hist(data, freq = FALSE, breaks=15)
lines(d1, lwd = 3, col = "blue")
12.1.1.5 This chapter
In this chapter, you will learn about maximum likelihood estimation (MLE) which is another way to estimate a distribution (both CDF and PDF) from data.
MLE is a very commonly used method.
12.2 The big idea behind MLE
ANSWER: Maximum likelihood
12.3 Maximum likelihood details
If the distribution of \(X\) is known, then \[ f_X(x |\ \theta) \] is the density function or the relative likelihood function for a single outcome.
When the observations of the sample are independent, the density function for a sample, \(\boldsymbol{X}_N\) is \[f_{\boldsymbol{X}_N}(\boldsymbol{x}|\ \theta) = \prod_{i=1}^N f_X(x_i|\ \theta)\]
In the usual setting, \(\theta\) is known. Now suppose that \(\boldsymbol{x}\) is known.
\[L(\theta|\ \boldsymbol{x}) = f_{\boldsymbol{X}_N}(\boldsymbol{x}|\ \theta)\]
This is the likelihood function. The maximum likelihood solution is the parameter, \(\theta\), which maximizes the function. One can use visualization tools, mathematics (calculus or linear algebra), or computational optimization algorithms to find a solution.
12.4 Example 2: Salary data
12.5 Example 3: Adult female height
12.5.1 Example 4: BMI
::getHdata(nhgh)
Hmisc<- nhgh$bmi bmi
For the sake of discussion, suppose that scale is known to be 1.7. (We’ll get to the multivariable setting later.)
<- function(shape, scale = 1.7, bmi){
l outer(shape, bmi, function(a,b)
dgamma(
x = b
shape = a
, scale = 1.7
, log = TRUE
,
)%>% rowSums
) }
curve(l(x, 1.7, bmi), 0, 45, ylab = "", main = "log Likelihood", xlab = "shape")
Note that the likelihood function is a function of the unknown parameter.
Strategy: Estimate the unknown parameter that maximizes the log likelihood.
Why does maximizing the likelihood make sense?
curve(l(x, 1.7, bmi), 0, 45, ylab = "", main = "log Likelihood", xlab = "shape")
<- seq(15,22,by = 0.01)
xxx # Poor mans maximization
<- l(xxx, 1.7, bmi) %>%
mle %>%
which.max `[`(xxx, .)
abline(v = mle, col = "blue", lwd = 3)
Will use the optim
or optimize
or mle
function for maximization.
Now consider MLE with the normal distribution.
require(stats4)
<- function(mean, sd){
nLL <- dnorm(
fs x = bmi
mean = mean
, sd = sd
, log = TRUE
,
) -sum(fs)
}<- mle(
fit
nLLstart = list(mean = 1, sd = 1)
, method = "L-BFGS-B"
, lower = c(0, 0.01)
,
)par(mfrow = c(1,2)); plot(profile(fit), absVal = FALSE)
par(mfrow = c(1,2))
plot(ecdf(bmi), main = "")
curve(
pnorm(x, coef(fit)[1], coef(fit)[2])
add = TRUE
, col = "blue"
, lwd = 3
,
)hist(bmi, freq = FALSE, main = "")
curve(
dnorm(x, coef(fit)[1], coef(fit)[2])
add = TRUE
, col = "blue"
, lwd = 3
, )
Can use lm
function to fit normal distribution, which is often easier.
lm(bmi ~ 1) %>% summary
Call:
lm(formula = bmi ~ 1)
Residuals:
Body Mass Index [kg/m^2]
Min 1Q Median 3Q Max
-15.142 -4.892 -1.032 3.558 56.548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.32174 0.08431 335.9 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.95 on 6794 degrees of freedom
12.6 Hands-on
Estimate the distribution of arm length using the method of maximum likelihood. Generate plots of the estimated CDF and PDF.
::getHdata(nhgh) Hmisc