TU Wien:Computer Vision VU (Sablatnig)/Exam 2021-03-12

Image Classification and Machine Learning

True/False Questions

On average, images of man-made environments have more vertical and horizontal edges than images of natural environments.
The bias and variance of a classifier are independent of each other.
One way of adding spatial information to the bag of words model is spatial pyramid matching.
Regularization in machine learning means to reduce the number of training samples.
Training of neural networks is only possible with linear activation functions.
The main advantage of the bag of words model is that it does not need training images.

(A picture from the SIFT-Algorithm from the slides is given)
The following picture illustrates the process of computing SIFT keypoint descriptors.

What is shown on the left? What do the green grid, black arrows and blue circle mean?
The descriptor is illustrated on the right. What do the green grid and black arrows mean? What is the dimensionality of this descriptor?

What is Structure from Motion?
Given a fundamental matrix relating the images 1 and 2 and a point x in image 1, what can be said about the corresponding point in image 2?
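
As a hedged illustration for the fundamental matrix question: the epipolar constraint x'^T F x = 0 means the corresponding point in image 2 must lie on the epipolar line l' = F x, but its position along that line stays unknown. The sketch below assumes NumPy; the matrix `F` is a made-up example (pure horizontal translation between rectified cameras) and is not taken from the exam.

```python
import numpy as np

def epipolar_line(F, x):
    """Return the epipolar line l' = F x in image 2 for the pixel x = (u, v) in image 1.

    The line is returned as (a, b, c) with a*u' + b*v' + c = 0 and (a, b) scaled
    to unit length, so |a*u' + b*v' + c| is a point-to-line distance in pixels.
    """
    x_h = np.array([x[0], x[1], 1.0])      # homogeneous coordinates
    line = F @ x_h
    return line / np.linalg.norm(line[:2])

# Hypothetical F = [t]_x for a translation t = (1, 0, 0) between rectified cameras.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

line = epipolar_line(F, (120.0, 80.0))

# Any candidate match x' = (u', v', 1) must satisfy x'^T l' = 0.
candidate = np.array([300.0, 80.0, 1.0])
residual = candidate @ line
```

In this rectified setup the epipolar line is the image row v' = 80, so the search for the match reduces to a one-dimensional search along that row.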

Describe the principle of lenses (sketch) and the "thin lens" law!
Continuing from the question above, explain depth of field.
What are the internal camera parameters (sketch) and what influence do they have?
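
For the lens questions above, a minimal numeric sketch may help. It assumes the usual thin lens law 1/f = 1/d_obj + 1/d_img and a simple geometric model of the blur circle on the sensor; the function names and the example numbers (a 50 mm lens with a 25 mm aperture) are illustrative assumptions, not part of the exam.

```python
def image_distance(f, d_obj):
    """Thin lens law: 1/f = 1/d_obj + 1/d_img, solved for the image distance d_img."""
    if d_obj <= f:
        raise ValueError("object must be farther away than the focal length")
    return 1.0 / (1.0 / f - 1.0 / d_obj)

def blur_circle(f, aperture, d_focus, d_obj):
    """Diameter of the blur circle on the sensor for a point at distance d_obj
    when the lens is focused at d_focus (all lengths in the same unit)."""
    s = image_distance(f, d_focus)   # sensor position (sharp plane for d_focus)
    v = image_distance(f, d_obj)     # where the point at d_obj would be sharp
    # Similar triangles: the light cone of width `aperture` converges at v.
    return aperture * abs(v - s) / v

# Assumed example values in millimetres.
sharp = blur_circle(50.0, 25.0, d_focus=2000.0, d_obj=2000.0)
wide_open = blur_circle(50.0, 25.0, d_focus=2000.0, d_obj=1000.0)
stopped_down = blur_circle(50.0, 6.25, d_focus=2000.0, d_obj=1000.0)
```

The depth of field is the range of object distances whose blur circle stays below an acceptable diameter; in this model, closing the aperture shrinks the blur circle proportionally and therefore widens that range.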

What is described by the plenoptic function?
Why does a sheet of paper in a scene not reflect an image of the scene? What simple construction can be used to let an image of a scene appear on a sheet of paper?

How are Gaussian Pyramids generated and what is their frequency composition on the different pyramid levels?
Following from the question above, what is the frequency composition of a Laplacian pyramid?
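
The pyramid construction asked about above can be sketched as follows. This is a simplified version assuming NumPy, a 5-tap binomial kernel as the Gaussian approximation, and nearest-neighbour upsampling followed by the same blur; all function names are illustrative choices.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # binomial approximation of a Gaussian

def blur(img):
    """Separable low-pass filter with reflect padding; output has the input's shape."""
    padded = np.pad(img, 2, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode="valid"), 0, rows)

def upsample(img, shape):
    """Double the resolution (nearest neighbour), crop to `shape`, then smooth."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(up)

def gaussian_pyramid(img, levels):
    """Each level: low-pass filter the previous level, then keep every second pixel."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(blur(pyr[-1])[::2, ::2])
    return pyr

def laplacian_pyramid(gauss):
    """Each level: Gaussian level minus the upsampled next-coarser Gaussian level."""
    lap = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # coarsest level keeps the low-pass residual
    return lap

def reconstruct(lap):
    """Invert the Laplacian pyramid by adding the bands back from coarse to fine."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = band + upsample(img, band.shape)
    return img

image = np.random.default_rng(0).random((64, 64))
gauss = gaussian_pyramid(image, levels=3)
lap = laplacian_pyramid(gauss)
restored = reconstruct(lap)
```

Each Gaussian level contains only the low frequencies that survive the repeated smoothing, so the bandwidth shrinks from level to level. Each Laplacian level holds the band of frequencies removed between two neighbouring Gaussian levels (band-pass), and the coarsest level stores the remaining low-pass image, which is why the original can be rebuilt from the bands.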

Describe how RANSAC can be used to determine the correct matches and the transformation between two views for image stitching.
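
A hedged sketch of the RANSAC idea for this question, assuming NumPy and a homography as the transformation model. The loop repeatedly fits a homography to 4 random matches, counts the matches that agree with it (inliers), keeps the largest inlier set, and refits on that set. The point data, the matrix `H_true`, and the parameter defaults are synthetic assumptions for illustration; coordinate normalization, which a practical DLT implementation would normally include, is omitted for brevity.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: homography H with dst ~ H @ src (needs >= 4 matches)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)          # null vector of the constraint matrix
    if abs(H[2, 2]) < 1e-12:          # degenerate sample, cannot normalize
        return None
    return H / H[2, 2]

def project(H, pts):
    """Apply H to an (N, 2) array of points and dehomogenize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

def ransac_homography(src, dst, n_iters=500, thresh=3.0, seed=0):
    """Return (H, inlier_mask) using the largest consensus set found."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)   # minimal sample
        H = fit_homography(src[idx], dst[idx])
        if H is None:
            continue
        with np.errstate(all="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh                                # reprojection test
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("no consensus set found")
    return fit_homography(src[best], dst[best]), best        # refit on all inliers

# Synthetic matches: 30 follow H_true exactly, the last 10 are wrong matches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 400, size=(40, 2))
H_true = np.array([[1.0, 0.02, 30.0],
                   [-0.01, 1.0, 5.0],
                   [1e-5, 0.0, 1.0]])
dst = project(H_true, src)
dst[30:] = rng.uniform(0, 400, size=(10, 2))

H_est, inliers = ransac_homography(src, dst)
```

The inlier mask answers the "correct matches" part of the question, and the refitted homography is the transformation used to warp one view onto the other.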

What are Scene Emergent Features?
What information is contained in the magnitudes of a Fourier spectrum and how can this information be used to differentiate between man-made and natural scenes?

How is the "cornerness" of an image pixel computed by the Harris corner detection algorithm?
How are the final corner points detected from the cornerness?
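
The two Harris questions above can be illustrated with a small sketch. It assumes NumPy, central-difference gradients, a uniform (box) window instead of the Gaussian weighting that is commonly used, and the response R = det(M) - k * trace(M)^2; the value k = 0.05, the window radius, and the relative threshold are assumed example settings.

```python
import numpy as np

def window_reduce(a, r, op, fill):
    """Combine each (2r+1)x(2r+1) neighbourhood of `a` with `op` (np.add or np.maximum)."""
    h, w = a.shape
    p = np.pad(a, r, mode="constant", constant_values=fill)
    out = np.full(a.shape, fill, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out = op(out, p[dy:dy + h, dx:dx + w])
    return out

def harris_response(img, k=0.05, r=2):
    """Cornerness R = det(M) - k * trace(M)^2 from the windowed structure tensor M."""
    iy, ix = np.gradient(img.astype(float))        # derivatives along rows, columns
    sxx = window_reduce(ix * ix, r, np.add, 0.0)   # entries of M summed over the window
    syy = window_reduce(iy * iy, r, np.add, 0.0)
    sxy = window_reduce(ix * iy, r, np.add, 0.0)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

def detect_corners(R, rel_thresh=0.1, r=2):
    """Threshold the cornerness and keep only local maxima (non-maximum suppression)."""
    local_max = window_reduce(R, r, np.maximum, -np.inf)
    mask = (R == local_max) & (R > rel_thresh * R.max())
    return np.argwhere(mask)                       # (row, col) coordinates

# Synthetic test image: a bright square on a dark background.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0

R = harris_response(img)
corners = detect_corners(R)
```

In this model R is large and positive where the gradient varies in two directions (corners), negative along straight edges, and close to zero in flat regions, so thresholding followed by non-maximum suppression leaves detections only near the four corners of the square.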

What is the difference between the "classical" machine learning pipeline for image recognition and the new deep learning methods?
Explain the concept and usage of auto-encoders.