TU Wien:Multivariate Statistik VO (Filzmoser)/Multivariate Statistics Possible Exam Questions

Why do we need multivariate statistics?

What is the Spectral Decomposition Theorem?

What is the density of the multivariate normal distribution?

Name distances for clustering, methods, and their respective objective functions/criteria.

Explain model-based clustering and difficulties that could occur.

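Note: We have to estimate the covariance matrices of $k$ multivariate normal distributions, and each unrestricted $(p\times p)$ covariance matrix has $p(p+1)/2$ free parameters. That can be a lot of parameters to estimate and can make the estimates unstable. Restricting the covariance structure (for example equal, diagonal, or spherical covariance matrices) makes the model simpler.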
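To get a feeling for how quickly the number of parameters grows, here is a minimal sketch that counts the free parameters of a Gaussian mixture with $k$ components in $p$ dimensions. The function name `gmm_parameter_count` is hypothetical, and the labels `"full"`, `"tied"`, `"diag"` and `"spherical"` are an assumption borrowed from common software conventions, not from the lecture.

```python
def gmm_parameter_count(k, p, covariance="full"):
    """Count free parameters of a k-component Gaussian mixture in p dimensions.

    Hypothetical helper for illustration only.
    """
    mixing = k - 1   # mixing proportions sum to 1
    means = k * p    # one mean vector per component
    if covariance == "full":         # one unrestricted matrix per component
        cov = k * p * (p + 1) // 2
    elif covariance == "tied":       # one unrestricted matrix shared by all
        cov = p * (p + 1) // 2
    elif covariance == "diag":       # one diagonal matrix per component
        cov = k * p
    elif covariance == "spherical":  # one variance per component
        cov = k
    else:
        raise ValueError(f"unknown covariance structure: {covariance}")
    return mixing + means + cov


# Example with k = 3 clusters and p = 10 variables.
for structure in ("full", "tied", "diag", "spherical"):
    print(structure, gmm_parameter_count(3, 10, structure))
```

With $k=3$ and $p=10$, the unrestricted model needs 197 parameters, while the spherical model needs only 35.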

Explain fuzzy clustering, what is the objective function?

How can we evaluate clustering solutions -- Hetero/Homogeneity, Calinski-Harabasz, Hartigan, silhouette width, Gap statistic (principles)?

What is the least squares estimator?

How does multivariate linear regression work? What are the objective function, the solution, and the appropriate inference (estimation of the covariance of the errors)? What about basic model selection?

What are problems with non-robustness? How is it connected to the (empirical) influence function, maxbias curve, breakdown point and efficiency?

What are M-estimators? What are the M-estimating equations? Why is it a weighted least squares estimator?

What are S-estimators? What is the MM-estimator and the properties inherited from S- and M-estimators?

Define affine equivariance.

What is the Minimum Covariance Determinant estimator? How does it work (in concept)? What properties does it have (related to tuning parameter h)?

What are problems in classic regression diagnostics (hat matrix)? What are robust regression diagnostics?

How does robust multivariate regression work? (estimate covariance matrix with M-estimator of scale)

Principal Component Analysis - how to select the vectors for the transformation, Lagrange problem definition.

What is the expectation of the Principal Components?

Why is PCA sensitive to scale? What happens if we center-scale the data?

What are some rules for the number of principal components to select? (for hypothesis tests: only concept, not formulas)

What is singular value decomposition, how is it defined, and how is it related to PCA? What are the scores in terms of SVD? When would we prefer SVD to spectral decomposition of the covariance (correlation) matrix?

Let $X$ be a mean-centered matrix (columns have mean 0).

Then there exists an orthogonal $(n\times n)$ matrix $U$ and an orthogonal $(p\times p)$ matrix $V$ such that

$X=UDV^{\top }$

where $D$ is an $(n\times p)$ "diagonal" matrix i.e. the only non-zero values are $d_{ii},i=1,\dots ,\min(n,p)$ . The "diagonal" elements of $D$ are called singular values of $X$ .

We can show that

$X^{\top }X=VD^{\top }U^{\top }UDV^{\top }=VD^{\top }\mathbf {I} DV^{\top }=VD^{\top }DV^{\top }$ ,

where $D^{\top }D$ is the $(p\times p)$ diagonal matrix of squared singular values, written $D^{2}$ for short. This means the columns of $V$ are the eigenvectors of $X^{\top }X$ with eigenvalues given by the diagonal of $D^{2}$ . Furthermore, the sample covariance matrix is $S={\frac {1}{n-1}}X^{\top }X$ , because $X$ is mean-centered. We know that in PCA,

$S={\hat {\Gamma }}{\hat {A}}{\hat {\Gamma }}^{\top }={\frac {1}{n-1}}X^{\top }X$

Therefore, ${\hat {\Gamma }}=V$ (up to the sign of each column) and $(n-1){\hat {A}}=D^{2}$ . Hence, for the scores we obtain $Z=(X-\mathbf {0} ){\hat {\Gamma }}=X{\hat {\Gamma }}=XV=UDV^{\top }V=UD$ .

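SVD is preferable when $n<p$ , i.e. when we have more features than observations. In that case the $(p\times p)$ covariance matrix is singular (its rank is at most $n-1$ ), and forming and decomposing it is expensive and numerically less stable than computing the SVD of $X$ directly. The spectral decomposition of $S$ still exists, but at most $n-1$ of its eigenvalues are non-zero.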
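The relation between SVD and PCA derived above can be checked numerically. The following is a minimal NumPy sketch on simulated data (the data, the seed and all variable names are assumptions for illustration). It uses the thin SVD, so `U` is $(n\times p)$ rather than $(n\times n)$, and compares scores in absolute value because eigenvectors are only determined up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4

# Simulated correlated data, then mean-centered column-wise.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# Route 1: thin SVD of the centered data matrix, X = U D V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
scores_svd = U * d  # same as U @ np.diag(d)

# Route 2: spectral decomposition of the sample covariance matrix.
S = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores_eig = X @ eigvecs

# Squared singular values divided by (n - 1) are the eigenvalues of S.
eigvals_match = np.allclose(d**2 / (n - 1), eigvals)

# Scores agree up to the sign of each column.
scores_match = np.allclose(np.abs(scores_svd), np.abs(scores_eig))

print(eigvals_match, scores_match)
```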

How can we define the PCA problem in terms of reconstruction error (Frobenius norm)?

Note: I got this question in my exam. Only the definition was enough with a natural language explanation.

What are Biplots? What is the rank-2 approximation? Define the G/H matrix. What are the properties of the biplot? (inner row product of G and H approximates elements of the X-matrix, etc.)

Which diagnostics do we have for PCA (formal definition of orthogonal, score distance)?

What is the factor analysis model (formal definition, assumptions)? What is the difference to PCA?

Explain the decomposition of the correlation matrix in factor analysis.

What are the uniquenesses and communalities, how are they defined, what is their meaning and how are they related?

What is the maximum number of factors we can include in the factor model, and why?

How can we estimate the communalities and loadings (PFA)?

How can we interpret factors? Give an overview of factor rotation criteria.

How are factor scores estimated (Bartlett and regression method)? Name the models, formulas and solutions for the factor score estimates.

What is the problem setting in multiple correlation analysis? What is the objective function to minimize?

What is the linear prediction function in multiple correlation analysis? Describe the structure of the proof.

Name a hypothesis test for the multiple correlation coefficient.

What is the problem setting in canonical correlation, what is the maximization problem?

How do we get the linear combinations for canonical correlation? Why is a matrix product and Eigenvector/Eigenvalue problem involved?

What happens if there is the same variable in X and Y in canonical correlation?

The first canonical correlation coefficient is 1.
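This can be illustrated numerically. The sketch below assumes simulated data with one variable placed in both blocks (all names and the seed are illustrative assumptions) and computes the squared canonical correlations as the eigenvalues of $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$ .

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# One variable appears in both blocks; the others are independent noise.
shared = rng.normal(size=n)
X = np.column_stack([shared, rng.normal(size=n)])
Y = np.column_stack([shared, rng.normal(size=n), rng.normal(size=n)])

# Center both blocks and form the sample (cross-)covariance matrices.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# Eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are the squared canonical correlations.
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
rho_sq = np.sort(np.linalg.eigvals(M).real)[::-1]

first_canonical_correlation = np.sqrt(rho_sq[0])
print(first_canonical_correlation)
```

The linear combinations that pick out the shared variable in each block are perfectly correlated, so the largest eigenvalue equals 1 up to rounding error.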

What are some hypothesis tests in canonical correlation analysis? (WS20: focus on permutation test)

What is the goal of discriminant analysis? What is the expected cost of misclassification, what is involved? How can the ECM be minimized, and how do we arrive at those rules?

Two-group case: What is linear discriminant analysis, what are the assumptions? How to arrive at the rule for classification? Why is it called linear discriminant analysis, and how do we estimate the involved components?

Two-group case: Explain the downprojection in LDA (graphic would probably help), if the priors and costs are equal.

Two-group case: Why is QDA called "quadratic"?

Two-group case: What is the Fisher criterion to maximize? What is the solution for the projection vector? What is the relation to LDA?

Extend the ECM to the multi-group case. What is the resulting decision rule if costs are equal? What are the discriminant functions?