TU Wien:Machine Learning VU (Musliu)/Exam 2022-01-28
Single Choice
Classification is a machine learning task where the target attribute is nominal
Decision trees can handle only binary classification problems
The error of a 1-NN classifier on the training set is 0
A softmax function in MLPs transforms the activation to a range of -1…1
Macro-averaging for classifier evaluation first calculates accuracy/precision/recall/… per class, before averaging across classes
The paired t-test is used when testing for statistical significance of results obtained with holdout validation
The paired t-test is used when testing for statistical significance of results obtained with cross validation
In a dataset the entropy is lowest when all classes have the same amount of samples
In a dataset the entropy is highest when all classes have the same amount of samples
In AdaBoost, the weights are randomly initialised
Support Vector Machines always find a better decision boundary (hyperplane) than Perceptrons
Support Vector Machines with a linear kernel are particularly suitable for classification of very high dimensional, sparse data
Support Vector Machines can by default only solve binary classification problems
If Naive Bayes is applied on a data set that also contains numeric attributes, then a probability density function must always be used
Model-based features used for metalearning are extracted directly from the data set
Majority voting is not used when k-nn is applied for linear regression
For the Monte Carlo method in reinforcement learning value estimates and policies are changed only on the completion of an episode
Gradient descent is always more efficient than Normal Equation (analytical approach) for linear regression
Information gain is an unsupervised feature selection method
Feature selection is primarily useful to improve the effectiveness of machine learning
Ordinal data does not allow distances to be computed between data points
The first model in gradient boosting is a zero rule model
PCA is a supervised feature selection method
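As a quick numeric check of the two entropy statements above, the sketch below computes the Shannon entropy of a class distribution; the class counts are illustrative, not from the exam:

```python
import numpy as np

def entropy(class_counts):
    # Shannon entropy (in bits) of a class distribution
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # by convention, 0 * log(0) contributes 0
    return float(-(p * np.log2(p)).sum())

print(entropy([500, 500]))  # uniform two-class split -> 1.0 bit (the maximum)
print(entropy([900, 100]))  # skewed split -> lower entropy
print(entropy([1000, 0]))   # pure node -> 0.0
```

Entropy is maximal when all classes are equally frequent and zero when only one class is present, which resolves the two statements in opposite ways.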
Open Choice
Can kernel methods also be applied to the perceptron classifier (also discuss why or why not!)
What is polynomial regression? Which are advantages/disadvantages of polynomial regression compared to linear regression?
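A minimal sketch of the idea behind this question: polynomial regression is still linear regression in the coefficients, fitted against polynomial basis terms. The data below is made up so that the fit recovers a known quadratic:

```python
import numpy as np

# Hypothetical 1-D data following a known quadratic, y = x^2 - 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x**2 - 2*x + 1

# Ordinary least squares against the basis [x^2, x, 1]
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 6))  # recovers approximately [1, -2, 1]
```

The trade-off the question asks about: higher-degree polynomials can fit non-linear trends but risk overfitting and extrapolate poorly compared to plain linear regression.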
When are two nodes in a Bayesian Network d-separated?
Which features are used in metalearning? What are landmarking features?
What are the differences between micro- and macro-averaged performance measures?
Given are 1000 observations, from which you want to train a decision tree. As pre-pruning, the following parameters are set:
- The minimum number of observations required to split a node is set to 200
- The minimum leaf size (number of observations) is set to 300
What would then be the maximum depth the decision tree can take (not counting the root node)? Explain your answer!
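One way to check the reasoning for this question (a small sketch under the stated pre-pruning rules; the strategy of peeling off minimum-size leaves is chosen to make the tree as deep as possible):

```python
def max_depth(n, min_split=200, min_leaf=300):
    # A node can only be split if it holds at least `min_split` samples
    # AND both children can satisfy the minimum leaf size.
    if n < min_split or n < 2 * min_leaf:
        return 0
    # Deepest tree: split off one minimum-size leaf, recurse on the remainder.
    return 1 + max_depth(n - min_leaf, min_split, min_leaf)

print(max_depth(1000))  # 1000 -> (300, 700) -> (300, 400); 400 cannot split -> depth 2
```

With 1000 observations the root splits into 300/700, the 700-node into 300/400, and the 400-node cannot split (it would need at least 600 for two leaves of 300), giving a maximum depth of 2.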
Describe a local search algorithm for Bayesian Network creation.
Describe the goal and setting of classification. To which other machine learning tasks does it relate, and from which does it differ?
What methods are there for combating overfitting in Neural Networks?
Describe 3 methods to compute the error in regression. (approximate question)
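Three commonly taught regression error measures are MAE, MSE, and RMSE; a small sketch with made-up values (the answer expected in the exam may differ):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    mae = np.abs(err).mean()   # Mean Absolute Error
    mse = (err ** 2).mean()    # Mean Squared Error
    rmse = np.sqrt(mse)        # Root Mean Squared Error
    return mae, mse, rmse

mae, mse, rmse = regression_errors([3, -0.5, 2], [2.5, 0.0, 2])
print(mae, mse, rmse)
```

MSE/RMSE penalise large errors more strongly than MAE; RMSE is in the same unit as the target.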
Something about normalisation with z-score and min-max: what are they, when are they useful, and on which type of features are they applied?
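The two scaling methods the question refers to, as a short sketch with illustrative feature values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # illustrative numeric feature

# Min-max scaling: maps values linearly into [0, 1]; sensitive to outliers.
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardisation: zero mean, unit variance; suited to roughly
# Gaussian-distributed features and methods that assume centred inputs.
zscore = (x - x.mean()) / x.std()

print(minmax)  # first value 0.0, last value 1.0
print(zscore)  # mean 0, standard deviation 1
```

Both apply to numeric features; distance-based methods (k-NN, SVM) and gradient-trained models benefit most from such scaling.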
Something about explaining epsilon-greedy action selection for the k-armed Bandit Problem.
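A hedged sketch of epsilon-greedy action selection with sample-average value estimates, as usually presented for the k-armed bandit; the arm means, step count, and seed below are illustrative:

```python
import random

def epsilon_greedy_bandit(arm_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed Gaussian bandit.
    `arm_means` are the hidden true mean rewards of the arms."""
    rng = random.Random(seed)
    k = len(arm_means)
    q = [0.0] * k  # action-value estimates
    n = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: best estimate
        r = rng.gauss(arm_means[a], 1.0)           # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental sample average
    return q, n

q, n = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(n)  # the best arm (index 2) should be pulled most often
```

With epsilon > 0 every arm keeps being explored, so the estimates converge to the true means while most pulls go to the best arm; this is the exploration/exploitation trade-off the question targets.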