Note: We have to estimate k multivariate normal distribution covariance structures. That can potentially be a lot of parameters to estimate and lead to instability in the estimates. There are some assumptions that make the model simpler.
How can we evaluate clustering solutions -- Hetero/Homogeneity, Calinski-Harabasz, Hartigan, silhouette width, Gap statistic (principles)?[Bearbeiten | Quelltext bearbeiten]
How does multivariate linear regression work, what is the objective function, solution and appropriate inference (estimation of covariance of errors). Basic model selection?[Bearbeiten | Quelltext bearbeiten]
What are problems with non-robustness? How is it connected to the (empirical) influence function, maxbias curve, breakdown point and efficiency?[Bearbeiten | Quelltext bearbeiten]
What are M-estimators? What are the M-estimating equations? Why is it a weighted least squares estimator?[Bearbeiten | Quelltext bearbeiten]
What are S-estimators? What is the MM-estimator and the properties inherited from S- and M-estimators?[Bearbeiten | Quelltext bearbeiten]
What is the Minimum Covariance Determinant estimator? How does in work (in concept)? What properties does it have (related to tuning parameter h)?[Bearbeiten | Quelltext bearbeiten]
What are problems in classic regression diagnostics (hat matrix)? What are robust regression diagnostics?[Bearbeiten | Quelltext bearbeiten]
How does robust multivariate regression work? (estimate covariance matrix with M-estimator of scale)[Bearbeiten | Quelltext bearbeiten]
Principal Component Analysis - how to select the vectors for the transformation, Lagrange problem definition.[Bearbeiten | Quelltext bearbeiten]
What are some rules for the number of principal components to select? (for hypothesis tests: only concept, not formulas)[Bearbeiten | Quelltext bearbeiten]
What is singular value decomposition, how is it defined, and how is it related to PCA? What are the scores in terms of SVD? When would we prefer SVD to spectral decomposition of the covariance (correlation) matrix?[Bearbeiten | Quelltext bearbeiten]
Let be a mean-centered matrix (columns have mean 0).
Then there exists an orthogonal matrix and an orthogonal matrix such that
where is an "diagonal" matrix i.e. the only non-zero values are . The "diagonal" elements of are called singular values of .
We can show that
,
which means the columns of are the eigenvectors of with eigenvalues . Furthermore, it holds that the covariance matrix , because is mean-centered. We know that in PCA,
Therefore, and . Hence, for the scores we obtain .
SVD is preferable when , which means we have more features than observations. In that case, the covariance matrix would be non-singular and the spectral decomposition theorem would not be applicable.
Note: I got this question in my exam. Only the definition was enough with a natural language explanation.
What are Biplots? What is the rank-2 approximation? Define the G/H matrix. What are the properties of the biplot? (inner row product of G and H approximates elements of the X-matrix, etc.)[Bearbeiten | Quelltext bearbeiten]
Which diagnostics do we have for PCA (formal definition of orthogonal, score distance)?[Bearbeiten | Quelltext bearbeiten]
What is the factor analysis model (formal definition, assumptions)? What is the difference to PCA?[Bearbeiten | Quelltext bearbeiten]
How are factor scores estimated (Bartlett and Regression method). Name the models, formulas and solutions for the factor scores estimates.[Bearbeiten | Quelltext bearbeiten]
What is the problem setting in multiple correlation analysis? What is the objective function to minimize?[Bearbeiten | Quelltext bearbeiten]
What is the linear prediction function in multiple correlation analysis? Describe the structure of the proof.[Bearbeiten | Quelltext bearbeiten]
What is the problem setting in canonical correlation, what is the maximization problem?[Bearbeiten | Quelltext bearbeiten]
How do we get the linear combinations for canonical correlation? Why is a matrix product and Eigenvector/Eigenvalue problem involved?[Bearbeiten | Quelltext bearbeiten]
What are some hypothesis tests in canonical correlation analysis? (WS20: focus on permutation test)[Bearbeiten | Quelltext bearbeiten]
What is the goal of discriminant analysis? What is the expected cost of misclassification, what is involved? How can the ECM be minimized, and how do we arrive at those rules?[Bearbeiten | Quelltext bearbeiten]
Two-group case: What is linear discriminant analysis, what are the assumptions? How to arrive at the rule for classification? Why is it called linear discriminant analysis, and how do we estimate the involved components?[Bearbeiten | Quelltext bearbeiten]
Two-group case: Explain the downprojection in LDA (graphic would probably help), if the priors and costs are equal.[Bearbeiten | Quelltext bearbeiten]
Two-group case: What is the Fischer criterion to maximize? What is the solution for the projection vector? What is the relation to LDA?[Bearbeiten | Quelltext bearbeiten]
Extend the ECM to the multi-group case. What is the resulting decision rule if costs are equal? What are the discriminant functions?[Bearbeiten | Quelltext bearbeiten]