Linear discriminant analysis (LDA) attempts to express one dependent variable (the class label) as a linear combination of other features or measurements.
LDA explicitly attempts to model the difference between the classes of data.
LDA for two classes
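As a rough illustration of what "linear combination" means in the two-class case, the rule below is the standard textbook form of the LDA decision boundary. The symbols are notation introduced here, not taken from the post: \mu_1 and \mu_2 are the class means, \Sigma is the common covariance matrix, and \pi_1 and \pi_2 are the prior class probabilities.

```latex
% Linear discriminant score for class k (shared covariance \Sigma):
\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k

% Two classes: assign x to class 2 when \delta_2(x) > \delta_1(x), i.e. when
x^{T}\Sigma^{-1}(\mu_2 - \mu_1)
  > \tfrac{1}{2}\,(\mu_2 + \mu_1)^{T}\Sigma^{-1}(\mu_2 - \mu_1)
    - \log\frac{\pi_2}{\pi_1}
```

The left-hand side is a linear function of x, so the boundary between the two classes is a hyperplane. In practice the means, covariance matrix, and priors are replaced by their sample estimates.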
PROC DISCRIM DATA=iris;
CLASS Species;   /* grouping (class) variable */
VAR X1-XK;       /* X1-XK stands for the K predictor variables */
RUN;
PROC DISCRIM DATA=Train TESTDATA=Test TESTOUT=Pred;
CLASS Species;
VAR X1-XK;
RUN;
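The Pred data set written by TESTOUT= can be cross-tabulated against the true class to see how the test observations were classified. The sketch below is one possible way to do this end to end. It assumes the SASHELP.IRIS sample data set stands in for the post's iris data, and the 70/30 split, the seed, and the data set name Split are illustrative choices rather than anything specified in the post.

```sas
/* Assumption: SASHELP.IRIS with a 70/30 random split; the seed is arbitrary. */
proc surveyselect data=sashelp.iris out=Split method=srs samprate=0.7
                  seed=20240 outall noprint;
run;

data Train Test;
    set Split;
    if Selected = 1 then output Train;   /* sampled rows go to the training set */
    else output Test;                    /* remaining rows go to the test set   */
run;

/* Fit on Train, classify Test, and write the predictions to Pred. */
proc discrim data=Train testdata=Test testout=Pred;
    class Species;
    var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* _INTO_ holds the class each test observation was assigned to. */
proc freq data=Pred;
    tables Species * _INTO_ / norow nocol nopercent;
run;
```

Off-diagonal cells of the resulting table are the misclassified test observations.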
Difference between LDA and logistic regression
- The multivariate normality assumption holds for the response variables. This means that each of the dependent variables is normally distributed within groups, that any linear combination of the dependent variables is normally distributed, and that all subsets of the variables are multivariate normal.
- Each group must have a sufficiently large number of cases.
- Different classification methods may be used depending on whether the variance-covariance matrices are equal (or very similar) across groups.
- LDA operates by maximizing the full log-likelihood, based on the joint density of x and G, under an assumption of normality and homogeneity (a common covariance matrix across classes).
- Logistic regression makes no assumption about Pr(X), and estimates the parameters of Pr(G|x) by maximizing the conditional likelihood.
- Intuitively, it would seem that if the distribution of x is indeed multivariate normal, then we will be able to estimate our coefficients more efficiently by making use of that information by using LDA.
- On the other hand, logistic regression would presumably be more robust if LDA’s distributional assumptions are violated.
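The comparison above may be easier to follow when the two models are written side by side. The symbols here (\pi_k, \mu_k, \Sigma, \alpha, \beta, and the reference class K) are notation introduced for illustration and do not appear in the post.

```latex
% LDA: the log-odds of class k against a reference class K are linear in x
\log\frac{\Pr(G=k \mid x)}{\Pr(G=K \mid x)}
  = \log\frac{\pi_k}{\pi_K}
    - \tfrac{1}{2}\,(\mu_k + \mu_K)^{T}\Sigma^{-1}(\mu_k - \mu_K)
    + x^{T}\Sigma^{-1}(\mu_k - \mu_K)
  = \alpha_{k0} + \alpha_k^{T}x

% Logistic regression: the same linear form is assumed directly
\log\frac{\Pr(G=k \mid x)}{\Pr(G=K \mid x)} = \beta_{k0} + \beta_k^{T}x
```

Both models therefore have the same linear form for the log-odds. They differ in how the coefficients are estimated: LDA derives them from the fitted Gaussian class densities, while logistic regression fits them directly through the conditional likelihood.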
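Quadratic discriminant analysis (QDA) assumes that the measurements from each class are normally distributed. Unlike LDA, it does not assume that the covariance matrices of the classes are identical. When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test.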
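In PROC DISCRIM, the choice between a linear and a quadratic rule is controlled by the POOL= option, which ties back to the point about equal variance-covariance matrices. The sketch below assumes SASHELP.IRIS stands in for the post's iris data, and the 0.05 significance level is an illustrative choice.

```sas
/* Assumption: SASHELP.IRIS stands in for the post's iris data set.        */
/* POOL=YES  uses the pooled covariance matrix (linear rule, LDA).         */
/* POOL=NO   uses within-class covariance matrices (quadratic rule, QDA).  */
/* POOL=TEST tests homogeneity of the within-class covariance matrices at  */
/*           the SLPOOL= level and picks the rule based on the result.     */
proc discrim data=sashelp.iris method=normal pool=test slpool=0.05;
    class Species;
    var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

With POOL=TEST, a significant test result leads to the quadratic rule, and otherwise the pooled (linear) rule is used.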