Wednesday, June 18, 2014

Logistic Regression with Penalty Factor in R

Penalized estimation minimizes the sum of the unpenalized cost function and a penalty term. This deliberately adds bias, favoring a simpler model in order to reduce variance when there is not enough training data to fit the model reliably.

- The ridge penalty shrinks the coefficients of correlated predictors towards each other (L2)
- The lasso tends to pick one of them and discard the others (L1)
- The elastic-net penalty mixes these two.
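
To make the objective above concrete, here is a small R sketch of the elastic-net penalized logistic regression objective. It is not glmnet's actual implementation (which uses coordinate descent and its own scaling conventions); beta0, beta, lambda, and alpha are illustrative names.

objective = function(beta0, beta, x, y, lambda, alpha) {
  ## binomial negative log-likelihood, averaged over observations
  eta = beta0 + x %*% beta
  nll = mean(log(1 + exp(eta)) - y * eta)
  ## elastic-net penalty: alpha = 1 gives the lasso, alpha = 0 gives ridge
  pen = lambda * ((1 - alpha) * sum(beta^2) / 2 + alpha * sum(abs(beta)))
  nll + pen
}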

If predictors are correlated in groups, an α = 0.5 tends to select the groups in or out together. One role of α is numerical stability; for example, the elastic net with α = 1 − ε for some small ε > 0 performs much like the lasso, but removes any degeneracies and wild behavior caused by extreme correlations.
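
For example (a minimal sketch assuming the same x and y loaded in the code below), the mixing parameter is passed to glmnet as alpha; alpha = 1 (the default) gives the lasso and alpha = 0 gives ridge.

## Elastic-net logistic regression mixing the ridge and lasso penalties equally
fit.en = glmnet(x, y, family = "binomial", alpha = 0.5)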

## http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#log
install.packages("glmnet", repos = "http://cran.us.r-project.org")
require(glmnet)

load("BinomialExample.RData")
fit = glmnet(x, y, family = "binomial")
summary(fit)
plot(fit, xvar = "dev", label = TRUE)
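
Each labelled curve in the plot is one coefficient's path as the penalty relaxes; per the glmnet documentation, xvar can also be "norm" or "lambda", e.g.:

plot(fit, xvar = "lambda", label = TRUE)  ## same paths against log(lambda)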

## Class predictions for the first 5 observations at two values of lambda
predict(fit, newx = x[1:5, ], type = "class", s = c(0.05, 0.01))

## Cross-validate lambda using 10-fold CV, scored by misclassification error
cvfit = cv.glmnet(x, y, family = "binomial", type.measure = "class")
plot(cvfit)
cvfit$lambda.min  ## lambda with the smallest CV error
cvfit$lambda.1se  ## largest lambda within one standard error of the minimum
coef(cvfit, s = "lambda.min")

predict(cvfit, newx = x[1:10, ], s = "lambda.min", type = "class")
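
As a quick sanity check (a sketch reusing the same x and y from above), the in-sample misclassification rate at lambda.min follows directly from the class predictions; since it is computed on the training data, it is optimistic.

pred = predict(cvfit, newx = x, s = "lambda.min", type = "class")
mean(pred != y)  ## fraction of training observations misclassified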
