- As described in Chapter 5, various methods can be used to estimate the test error rate
- These notes cover the validation set approach, LOOCV, k-fold CV, and the bootstrap
The Validation Set Approach
Divide the Subset
```r
library(ISLR)
```
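The split itself is cut off above; a minimal sketch following the usual convention of a half/half split of the 392 Auto observations (the seed value is an assumption):

```r
# Randomly select half of the 392 Auto observations as the training set
set.seed(1)   # assumed seed, for reproducibility
train = sample(392, 196)
```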
Fit the Model
```r
lm.fit=lm(mpg~horsepower,data=Auto,subset=train)
```
Get MSE
```r
attach(Auto)
```
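attach(Auto) by itself does not produce the MSE; a self-contained sketch of the whole computation (the seed and split are assumptions carried over from the earlier step):

```r
library(ISLR)   # provides the Auto data set
set.seed(1)     # assumed seed
train = sample(392, 196)
lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)
# Validation MSE: mean squared error on the held-out observations
mse = mean((Auto$mpg - predict(lm.fit, Auto))[-train]^2)
mse
```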
Fit the Polynomial Model
```r
lm.fit2 = lm(mpg~poly(horsepower,2),data = Auto,subset=train)
```
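To judge whether the polynomial terms help, the validation MSEs of the fits can be compared; a sketch for degrees 1 to 3, under the same assumed split:

```r
library(ISLR)
set.seed(1)   # assumed seed
train = sample(392, 196)
# Validation MSE for polynomial degrees 1 to 3
errs = sapply(1:3, function(d) {
  fit = lm(mpg ~ poly(horsepower, d), data = Auto, subset = train)
  mean((Auto$mpg - predict(fit, Auto))[-train]^2)
})
errs   # the quadratic fit should beat the linear one
```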
Alter the Training Set
```r
set.seed(2)
```
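Re-running with a different seed shows how much the validation estimate depends on the particular split; a sketch:

```r
library(ISLR)
set.seed(2)
train = sample(392, 196)   # a different random split
lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)
mse2 = mean((Auto$mpg - predict(lm.fit, Auto))[-train]^2)
mse2   # generally differs from the first split's MSE
```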
LOOCV
Perform Regression
- Perform linear regression using glm

```r
glm.fit = glm(mpg~horsepower,data=Auto)
```
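glm() is used here rather than lm() because the cv.glm() routine from the boot library works on glm objects; with the default gaussian family the two give identical fits:

```r
library(ISLR)
glm.fit = glm(mpg ~ horsepower, data = Auto)  # gaussian family by default
lm.fit  = lm(mpg ~ horsepower, data = Auto)
all.equal(coef(glm.fit), coef(lm.fit))        # TRUE: same coefficients
```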
Get LOOCV Statistic
```r
library(boot)
```
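The LOOCV statistic presumably comes from cv.glm(); a sketch of the standard call (when K is not given, it defaults to n, which is exactly LOOCV):

```r
library(ISLR)
library(boot)
glm.fit = glm(mpg ~ horsepower, data = Auto)
cv.err = cv.glm(Auto, glm.fit)  # K defaults to n, i.e. LOOCV
cv.err$delta                    # two estimates, both about 24.23
```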
- The test error estimate obtained above is about 24.23
- The first value returned is the standard CV estimate, while the second is a bias-corrected version
```r
cv.error = rep(0,5)
```
- The result is:

```
24.23151 19.24821 19.33498 19.42443 19.03321
```
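These five numbers are the LOOCV errors for polynomial degrees 1 to 5; the loop body is cut off above, but it likely has this shape:

```r
library(ISLR)
library(boot)
cv.error = rep(0, 5)
for (i in 1:5) {
  glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error[i] = cv.glm(Auto, glm.fit)$delta[1]   # standard LOOCV estimate
}
cv.error
```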
k-Fold CV
```r
set.seed(17)
```
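The k-fold computation itself is cut off; a sketch with K = 10 folds and polynomial degrees 1 to 10 (the degree range and variable name are assumptions):

```r
library(ISLR)
library(boot)
set.seed(17)   # k-fold CV is random, so a seed makes it reproducible
cv.error.10 = rep(0, 10)
for (i in 1:10) {
  glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error.10[i] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
}
cv.error.10
```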
The Bootstrap
Procedure
- Write a function that computes the estimator
- Use boot() from the boot library to perform the bootstrap
Write the function:
```r
alpha.fn = function(data,index){
```
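The function body is truncated above; for the Portfolio data (columns X and Y), the estimator of the allocation alpha that minimizes Var(alpha*X + (1-alpha)*Y) is:

```r
# Estimate of the variance-minimizing allocation alpha,
# computed on the rows selected by `index`
alpha.fn = function(data, index) {
  X = data$X[index]
  Y = data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}
```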
Use sample() to Resample
```r
set.seed(1)
```
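A single bootstrap replicate resamples the 100 Portfolio rows with replacement and re-computes the estimate; a self-contained sketch (alpha.fn as defined earlier):

```r
library(ISLR)   # provides the Portfolio data (100 observations)
alpha.fn = function(data, index) {
  X = data$X[index]
  Y = data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}
set.seed(1)
# One bootstrap replicate: sample row indices with replacement
alpha.star = alpha.fn(Portfolio, sample(100, 100, replace = TRUE))
alpha.star
```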
Use boot() to Simplify the Process
```r
boot(Portfolio,alpha.fn,R=1000)
```
Estimate the Accuracy of a Linear Regression
Write Function
```r
boot.fn = function(data,index)
```
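The truncated function presumably returns the fitted coefficients for the rows in index; a sketch:

```r
# Intercept and slope of mpg ~ horsepower, fit on the selected rows
boot.fn = function(data, index)
  coef(lm(mpg ~ horsepower, data = data, subset = index))
```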
All Observations:
```r
boot.fn(Auto,1:392)
```
Random Sampling:
```r
boot.fn(Auto,sample(392,392,replace=T))
```
- Bootstrap:

```r
boot(Auto,boot.fn,1000)
```
Compare the Result
- The usual formula for $SE(\hat{\beta}_0)$ relies on model assumptions (for example, that the linear model is correct), whereas the bootstrap does not
- The bootstrap therefore gives a more trustworthy estimate of the standard errors here
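The comparison above can be sketched by placing the bootstrap output next to summary()'s formula-based standard errors:

```r
library(ISLR)
library(boot)
boot.fn = function(data, index)
  coef(lm(mpg ~ horsepower, data = data, subset = index))
set.seed(1)   # assumed seed
boot.out = boot(Auto, boot.fn, R = 1000)
boot.out                                         # bootstrap SEs
summary(lm(mpg ~ horsepower, data = Auto))$coef  # formula-based SEs
```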