[ ISLR ] Chapter 05 - Use R to Resample

  • As described in Chapter 5, various methods can be used to estimate the test error rate
  • These notes cover the validation set approach, LOOCV, k-fold CV, and the bootstrap

The Validation Set Approach

Divide the Subset

library(ISLR)
set.seed(1)
train=sample(392,196)

Fit the Model

lm.fit=lm(mpg~horsepower,data=Auto,subset=train)

Get MSE

attach(Auto)
mean((mpg-predict(lm.fit,Auto))[-train]^2) # predict on all of Auto, then keep only the held-out validation observations via [-train]
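The line above computes the validation-set estimate of the test error: the model is fit on the training indices only, and the squared prediction errors are averaged over the held-out observations. In ISLR's notation:

```latex
% Validation-set MSE: \hat{f} is fit on the training half,
% errors are averaged over the held-out (validation) half
\mathrm{MSE}_{\text{val}} \;=\; \frac{1}{|\text{val}|} \sum_{i \in \text{val}} \bigl( y_i - \hat{f}(x_i) \bigr)^2
```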

Fit the Polynomial Model

lm.fit2 = lm(mpg~poly(horsepower,2),data = Auto,subset=train)
mean((mpg-predict(lm.fit2,Auto))[-train]^2)

lm.fit3 = lm(mpg~poly(horsepower,3),data = Auto,subset=train)
mean((mpg-predict(lm.fit3,Auto))[-train]^2)

Alter the Training Set

set.seed(2)
train = sample(392,196)

lm.fit=lm(mpg~horsepower,data=Auto,subset=train)
mean((mpg-predict(lm.fit,Auto))[-train]^2)
lm.fit2 = lm(mpg~poly(horsepower,2),data = Auto,subset=train)
mean((mpg-predict(lm.fit2,Auto))[-train]^2)
lm.fit3 = lm(mpg~poly(horsepower,3),data = Auto,subset=train)
mean((mpg-predict(lm.fit3,Auto))[-train]^2)

LOOCV

Perform Regression

  • Perform linear regression with glm() (rather than lm()) so the fit can be passed to cv.glm() later
glm.fit = glm(mpg~horsepower,data=Auto)

Get LOOCV Statistic

library(boot)
cv.err = cv.glm(Auto,glm.fit)
cv.err$delta
  • The test error estimated above is about 24.23
  • delta holds two values: the first is the standard CV estimate, the second is a bias-corrected version
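Both delta values are built on the basic LOOCV average over the n single-observation test errors. For a least-squares fit like this one, ISLR (Eqs. 5.1 and 5.2) also gives a shortcut that needs only one fit on the full data, where $h_i$ is the leverage of observation $i$:

```latex
% LOOCV estimate: average of the n leave-one-out test errors
\mathrm{CV}_{(n)} \;=\; \frac{1}{n} \sum_{i=1}^{n} \mathrm{MSE}_i
% Least-squares shortcut: a single fit suffices (h_i = leverage of obs. i)
\mathrm{CV}_{(n)} \;=\; \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2
```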
cv.error = rep(0,5)
for (i in 1:5){
  glm.fit = glm(mpg~poly(horsepower,i),data=Auto)
  cv.error[i] = cv.glm(Auto,glm.fit)$delta[1]
}
cv.error
  • The result is: 24.23151 19.24821 19.33498 19.42443 19.03321

k-Fold CV

set.seed(17)
cv.error.10 = rep(0,10)
for (i in 1:10){
  glm.fit = glm(mpg~poly(horsepower,i),data=Auto)
  cv.error.10[i] = cv.glm(Auto,glm.fit,K=10)$delta[1]
}
cv.error.10
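With K = 10, cv.glm() splits the data into ten folds, holds out each fold in turn, and averages the resulting test errors (ISLR Eq. 5.3):

```latex
% k-fold CV: average the MSE over the k held-out folds
\mathrm{CV}_{(k)} \;=\; \frac{1}{k} \sum_{i=1}^{k} \mathrm{MSE}_i
```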

The Bootstrap

Procedure

  1. Write a function that computes the statistic of interest

  2. Call boot() from the boot library to run the bootstrap

Write the function:

alpha.fn = function(data,index){
  X = data$X[index]
  Y = data$Y[index]
  return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
}

alpha.fn(Portfolio,1:100) # estimate alpha from all 100 observations
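alpha.fn() implements the minimum-variance allocation between two assets X and Y from ISLR Section 5.2: the fraction $\alpha$ invested in X that minimizes $\mathrm{Var}(\alpha X + (1-\alpha)Y)$ is (ISLR Eq. 5.6)

```latex
% Minimum-variance two-asset allocation (variances and covariance
% are estimated from the sampled observations in alpha.fn)
\alpha \;=\; \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\,\sigma_{XY}}
```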

Use Sample() to Resample

set.seed(1)
alpha.fn(Portfolio,sample(100,100,replace=T)) # draw 100 indices from 1:100 with replacement, i.e. one bootstrap sample

Use boot() to Simplify the Process

boot(Portfolio,alpha.fn,R=1000)
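The standard error that boot() reports is the sample standard deviation of the R = 1000 bootstrap replicates of $\hat{\alpha}$ (ISLR Eq. 5.8):

```latex
% Bootstrap SE: spread of the B replicate estimates around their mean
\mathrm{SE}_B(\hat{\alpha}) \;=\;
\sqrt{ \frac{1}{B-1} \sum_{r=1}^{B}
  \Bigl( \hat{\alpha}^{*r} - \frac{1}{B} \sum_{r'=1}^{B} \hat{\alpha}^{*r'} \Bigr)^2 }
```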

Estimate the Accuracy of a Linear Regression

Write Function

boot.fn = function(data,index){
  return(coef(lm(mpg~horsepower,data=data,subset=index)))
}

All Observations:

boot.fn(Auto,1:392)

Random Sampling:

boot.fn(Auto,sample(392,392,replace=T))
boot.fn(Auto,sample(392,392,replace=T))
Bootstrap:

boot(Auto,boot.fn,1000)

Compare the Result

  • The usual formulas for $SE(\hat{\beta}_0)$ and $SE(\hat{\beta}_1)$ rely on model assumptions (a correct linear fit and a constant error variance $\sigma^2$), while the bootstrap does not.
  • The bootstrap standard errors are therefore likely the more trustworthy estimates here.
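For reference, these are the standard-formula variances being compared against (ISLR Eq. 3.8); they assume the linear model is correct and that the errors share a common variance $\sigma^2$:

```latex
% Standard (formula-based) squared standard errors for simple linear regression
\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]
```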