TU Wien:Machine Learning VU (Musliu)/Exam 2021-06-24
Multiple Choice (Answer: True/False)
- A k-d tree can be used as a search space optimisation for k-NN
- Random Forests is a boosting ensemble technique - F
- Back propagation is a method for training Multi-Layer Perceptrons - T
- Ordinal data does not allow distances to be computed between data points
- In AdaBoost, the weights are uniformly initialised
- Suppose we have a neural network with ReLU activation functions. Say we replace the ReLU activations by linear activations. Would this new neural network be able to approximate an XOR function? (Note: with the ReLU activations, the network was able to approximate the XOR function)
- The entropy of a data set is based solely on the relative frequencies of the data distribution, not on the absolute number of data points present (see the sketch after this list)
- k-nearest neighbors is based on a supervised learning paradigm
- Support Vector Machines with a linear kernel are particularly suitable for classification of very high dimensional, sparse data - T
- Support Vector Machines can by default only solve binary classification problems
- Naive Bayes usually gives good results for regression data sets
- Learning the structure of Bayesian networks is usually simpler than learning the probabilities
- Learning the structure of Bayesian networks is usually more complicated than learning the probabilities
- The mean absolute error (a performance metric used for regression) is less sensitive to outliers than MSE
- Chain Rule simplifies calculation of probabilities in Bayesian Networks
- "Number of attributes of data set" is not a model based features that is used for metalearning - T
- Kernel projections can only be used in conjunction with support vector machines
- Suppose a convolutional neural network is trained on the ImageNet dataset. This trained model is then given a completely white image as an input. The output probabilities for this input would be equal for all classes.
- When learning an SVM with gradient descent, it is guaranteed to find the globally optimal hyperplane. - F
- Usually, state-of-the-art AutoML systems use grid search to find the best hyperparameters - T
- Linear regression converges when performed on linearly separable data - F
- Linear regression converges when performed on data that is not linearly separable - T
- Laplace Corrector must be used when using Naive Bayes - F
- Gradient boosting minimizes residual of previous classifiers - T
- Decision trees using error rate vs. entropy as the splitting criterion lead to different results - T
- Depth of decision tree can be larger than the number of training samples used to create a tree - F (depth of tree not larger than number of samples)
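Regarding the entropy statement above, a minimal sketch (toy labels chosen purely for illustration) showing that the entropy of a data set depends only on the class proportions, not on the absolute number of data points:

<syntaxhighlight lang="python">
# Sketch: entropy computed from relative frequencies is unchanged when the
# data set is duplicated, because the proportions stay the same.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

small = ["yes"] * 3 + ["no"]      # proportions 0.75 / 0.25
large = small * 100               # same proportions, 100 times as many points
print(entropy(small), entropy(large))   # both ~0.811
</syntaxhighlight>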
Free Text
- Consider the following 2D data set. Which classifier(s) will achieve zero training error on this data set? (see the quick check after this list)
- o +
- + o
- Perceptron
- SVM with a linear kernel
- Decision tree (T)
- 1-NN classifier
- Describe at least three methods that are used for hyperparameter optimization
- How can we automatically select the most promising machine learning algorithm for a particular data set?
- Describe Rice's framework from the AutoML lecture
- Why can a general Bayesian network give better results than Naive Bayes?
- What is overfitting, and when & why is it a problem? Explain measures against overfitting on an algorithm discussed in the lecture
- What is the difference between micro and macro averaged performance measures?
- What are the important issues to consider when applying Rice's framework for automated selection of machine learning algorithms?
- How can we avoid overfitting for polynomial regression?
- Which types of features are used in metalearning? What are landmarking features?
- something like: explain 3 regression performance evaluation methods
- something like: how is 1R related to decision trees?
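For the 2D data set question above: the pattern looks like XOR, which is not linearly separable. A quick check with scikit-learn, assuming the grid corresponds to the four XOR points (coordinates and label encoding are illustrative assumptions):

<syntaxhighlight lang="python">
# Sketch: fit each candidate classifier on a 4-point XOR-like data set and
# report its training accuracy; only models that can represent XOR reach 1.0.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])   # the o + / + o grid as coordinates
y = np.array([0, 1, 1, 0])                       # o -> 0, + -> 1 (assumed encoding)

for clf in (Perceptron(), SVC(kernel="linear"),
            DecisionTreeClassifier(), KNeighborsClassifier(n_neighbors=1)):
    print(type(clf).__name__, clf.fit(X, y).score(X, y))
# The decision tree and the 1-NN classifier reach a training accuracy of 1.0;
# the two linear models cannot, since the classes are not linearly separable.
</syntaxhighlight>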
NOT EXACT, BUT SOMETHING LIKE THIS:
- In which order should the steps be performed when training a neural network with gradient descent? (5 options listed, to be placed in the correct order; see the sketch after this list):
- Initialize weights and bias
- Pass the input through the NN to get the output
- Compute the error (compare the expected output with the actual output)
- Adjust weights
- Reiterate until the best weights are in place
- Compare ridge and lasso regression
- Goal and setting of classification. To which tasks in machine learning does it relate, and from which does it differ?
- When are two nodes in a Bayesian network considered to be d-separated?
- Can a kernel be used in a perceptron?
- How can we automatically pick the best algorithm for a specific dataset?
- How can you learn the structure of Bayesian Networks?
- Explain how we can deal with missing values or the zero-frequency problem in Naive Bayes (see the sketch after this list).
- Ignore the missing values / apply Laplace correction
- What is Deep Learning? How does it differ from "traditional" Machine Learning approaches? Name two application scenarios where Deep Learning has shown great advances over previous methods.
- Describe the goal and setting of classification. How does that relate and differ from other techniques in machine learning?
- compare it to unsupervised, name regression as a technique, etc.
- What is the randomness in random forests? Describe where in the algorithm randomness plays a role
- Describe 2 AutoML systems
- Describe in detail the algorithm for constructing a random forest
- Given are 1000 observations, from which you want to train a decision tree. For pre-pruning, the following parameters are set:
- The minimum number of observations required to split a node is set to 200
- The minimum leaf size (number of obs.) to 300
Then, what would be the maximum depth the decision tree can take (not counting the root node)? Explain your answer! (a possible worked answer is sketched below)
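For the last question, a possible line of reasoning (assuming binary splits and that both pre-pruning conditions must hold): the root holds 1000 observations and may be split, since 1000 ≥ 200; its children must each contain at least 300 observations. A node can only be split further if both of its children reach the 300-observation minimum, i.e. if it contains at least 600 observations. At depth 1 this is still possible (e.g. a 700/300 split of the root, then 700 → 400/300), but a node at depth 2 can hold at most 1000 − 300 − 300 = 400 < 600 observations, so no further split is allowed and the maximum depth is 2 (not counting the root).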
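For the question above about the order of the gradient-descent training steps, a minimal sketch: a single sigmoid neuron learning the AND function (the data set, learning rate and number of epochs are illustrative assumptions, not from the lecture):

<syntaxhighlight lang="python">
# Sketch of the training loop: initialise, forward pass, compute error,
# adjust weights, repeat.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)           # AND function as a toy target

rng = np.random.default_rng(0)
w = rng.normal(size=2)                             # 1. initialise weights ...
b = 0.0                                            #    ... and bias

for epoch in range(5000):                          # 5. reiterate until the weights fit
    out = 1 / (1 + np.exp(-(X @ w + b)))           # 2. pass the input through the network
    err = out - y                                  # 3. compare output with expected output
    w -= 0.5 * X.T @ err / len(y)                  # 4. adjust weights and bias along
    b -= 0.5 * err.mean()                          #    the negative gradient

print(np.round(out))                               # should be close to [0, 0, 0, 1]
</syntaxhighlight>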
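For the zero-frequency question above, a small sketch of the Laplace (add-one) correction for a Naive Bayes conditional probability estimate (function name and numbers are illustrative, not from the lecture):

<syntaxhighlight lang="python">
# Sketch: add-alpha smoothing so that an attribute value never seen together
# with a class still gets a small non-zero conditional probability.
def laplace_estimate(count_value_and_class, count_class, n_values, alpha=1.0):
    """Estimate P(attribute = value | class) with Laplace correction."""
    return (count_value_and_class + alpha) / (count_class + alpha * n_values)

# Value never observed with this class (0 of 8 examples, attribute has 3 values):
print(laplace_estimate(0, 8, 3))   # 1/11 ~ 0.09 instead of 0
</syntaxhighlight>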
See image for more:
https://vowi.fsinf.at/wiki/Datei:TU_Wien-Machine_Learning_VU_(Mayer,_Musliu)_-_Exam_24062021.png