TU Wien:Machine Learning VU (Musliu)/Exam 2022-01-28
Single Choice
Classification is a machine learning task where the target attribute is nominal
Decision trees can handle only binary classification problems
The error of a 1-NN classifier on the training set is 0
A softmax function in MLPs transforms the activation to a range of -1…1
Macro-averaging for classifier evaluation first calculates accuracy/precision/recall/… per class, before averaging across classes
The paired t-test is used when testing for statistical significance of results obtained with holdout validation
The paired t-test is used when testing for statistical significance of results obtained with cross validation
In a dataset the entropy is lowest when all classes have the same amount of samples
In a dataset the entropy is highest when all classes have the same amount of samples
In AdaBoost, the weights are randomly initialised
Support Vector Machines always find a better decision boundary (hyperplane) than Perceptrons
Support Vector Machines with a linear kernel are particularly suitable for classification of very high dimensional, sparse data
Support Vector Machines can by default only solve binary classification problems
If Naive Bayes is applied on a data set that also contains numeric attributes, then a probability density function must always be used
Model-based features used for metalearning are extracted directly from the data set
Majority voting is not used when k-nn is applied for linear regression
For the Monte Carlo method in reinforcement learning value estimates and policies are changed only on the completion of an episode
Gradient descent is always more efficient than Normal Equation (analytical approach) for linear regression
Information gain is an unsupervised feature selection method
Feature selection is primarily useful to improve the effectiveness of machine learning
Ordinal data does not allow distances to be computed between data points
The first model in gradient boosting is a zero rule model
PCA is a supervised feature selection method
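As a quick numeric check of the two entropy statements above, the sketch below computes the Shannon entropy of a class distribution; the class counts are illustrative, not from the exam:

```python
import numpy as np

def entropy(class_counts):
    # Shannon entropy (in bits) of a class distribution
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # by convention, 0 * log(0) contributes 0
    return float(-(p * np.log2(p)).sum())

print(entropy([500, 500]))  # uniform two-class split -> 1.0 bit (the maximum)
print(entropy([900, 100]))  # skewed split -> lower entropy
print(entropy([1000, 0]))   # pure node -> 0.0
```

Entropy is maximal when all classes are equally frequent and zero when only one class is present, which resolves the two statements in opposite ways.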
Open Choice
Can kernel methods also be applied to the perceptron classifier (also discuss why or why not!)
What is polynomial regression? Which are advantages/disadvantages of polynomial regression compared to linear regression?
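A minimal sketch of the idea behind this question: polynomial regression is still linear regression in the coefficients, fitted against polynomial basis terms. The data below is made up so that the fit recovers a known quadratic:

```python
import numpy as np

# Hypothetical 1-D data following a known quadratic, y = x^2 - 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x**2 - 2*x + 1

# Ordinary least squares against the basis [x^2, x, 1]
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 6))  # recovers approximately [1, -2, 1]
```

The trade-off the question asks about: higher-degree polynomials can fit non-linear trends but risk overfitting and extrapolate poorly compared to plain linear regression.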
When are two nodes in a Bayesian Network d-separated?
Which features are used in metalearning? What are landmarking features?
What are the differences between micro- and macro-averaged performance measures?
Given are 1000 observations, from which you want to train a decision tree. As pre-pruning, the following parameters are set:
- The minimum number of observations required to split a node is set to 200
- The minimum leaf size (number of observations) is set to 300
What would then be the maximum depth the decision tree can take (not counting the root node)? Explain your answer!
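One way to check the reasoning for this question (a small sketch under the stated pre-pruning rules; the strategy of peeling off minimum-size leaves is chosen to make the tree as deep as possible):

```python
def max_depth(n, min_split=200, min_leaf=300):
    # A node can only be split if it holds at least `min_split` samples
    # AND both children can satisfy the minimum leaf size.
    if n < min_split or n < 2 * min_leaf:
        return 0
    # Deepest tree: split off one minimum-size leaf, recurse on the remainder.
    return 1 + max_depth(n - min_leaf, min_split, min_leaf)

print(max_depth(1000))  # 1000 -> (300, 700) -> (300, 400); 400 cannot split -> depth 2
```

With 1000 observations the root splits into 300/700, the 700-node into 300/400, and the 400-node cannot split (it would need at least 600 for two leaves of 300), giving a maximum depth of 2.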
Describe a local search algorithm for Bayesian Network creation.
Describe the goal and setting of classification. To which other machine learning tasks does it relate, and from which does it differ?
What methods are there for combating overfitting in Neural Networks?
Describe 3 methods to compute the error in regression. (approximate question)
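Three commonly taught regression error measures are MAE, MSE, and RMSE; a small sketch with made-up values (the answer expected in the exam may differ):

```python
import numpy as np

def regression_errors(y_true, y_pred):
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    mae = np.abs(err).mean()   # Mean Absolute Error
    mse = (err ** 2).mean()    # Mean Squared Error
    rmse = np.sqrt(mse)        # Root Mean Squared Error
    return mae, mse, rmse

mae, mse, rmse = regression_errors([3, -0.5, 2], [2.5, 0.0, 2])
print(mae, mse, rmse)
```

MSE/RMSE penalise large errors more strongly than MAE; RMSE is in the same unit as the target.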
Something about normalisation with z-score and min-max: what are they, when are they useful, and on which type of features are they applied?
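The two scaling methods the question refers to, as a short sketch with illustrative feature values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # illustrative numeric feature

# Min-max scaling: maps values linearly into [0, 1]; sensitive to outliers.
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardisation: zero mean, unit variance; suited to roughly
# Gaussian-distributed features and methods that assume centred inputs.
zscore = (x - x.mean()) / x.std()

print(minmax)  # first value 0.0, last value 1.0
print(zscore)  # mean 0, standard deviation 1
```

Both apply to numeric features; distance-based methods (k-NN, SVM) and gradient-trained models benefit most from such scaling.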
Something about explaining epsilon-greedy action selection for the k-armed Bandit Problem.
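A hedged sketch of epsilon-greedy action selection with sample-average value estimates, as usually presented for the k-armed bandit; the arm means, step count, and seed below are illustrative:

```python
import random

def epsilon_greedy_bandit(arm_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed Gaussian bandit.
    `arm_means` are the hidden true mean rewards of the arms."""
    rng = random.Random(seed)
    k = len(arm_means)
    q = [0.0] * k  # action-value estimates
    n = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: best estimate
        r = rng.gauss(arm_means[a], 1.0)           # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental sample average
    return q, n

q, n = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(n)  # the best arm (index 2) should be pulled most often
```

With epsilon > 0 every arm keeps being explored, so the estimates converge to the true means while most pulls go to the best arm; this is the exploration/exploitation trade-off the question targets.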