TU Wien:Business Intelligence VU (Rauber)/2025WS-Exam-Questions 08.01.2025

T/F Questions:

Representation Bias is closely linked to the encoding of data, especially the value ranges selected for encoding an attribute.
Historical Bias manifests via the data generation process even given perfect sampling and feature selection.
The CRISP-DM process starts with the Data Understanding phase and ends with the Evaluation phase.
Data exploration and preprocessing are also needed in data mining projects where the data comes from an integrated data warehouse.
Data augmentation is a pre-processing step aiming at improving the quality and quantity of data.
Support Vector Machines with soft margin usually show better effectiveness on test data than Support Vector Machines without soft margin.
The optimal hyper-parameters of a model for deployment are determined on the validation set.
Recall measures the fraction of all positive instances correctly identified by a classifier.
In the multidimensional model, data is divided into facts and records.
Data lakes are superior to data warehouses.
A Galaxy Schema contains multiple fact tables.

Open Questions:

Describe at least 3 main components of the data warehouse reference architecture and their relationships.
List the six CRISP-DM phases. Explain what happens in phase 1 and in phase 6.
What are the micro and macro approaches to computing recall and precision? Define both, explain how they work, what each works best for, and when to choose one over the other.
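For reference when preparing the last question, here is a minimal Python sketch of the two averaging schemes, assuming single-label multi-class classification where each class is scored one-vs-rest. Macro averaging computes precision and recall per class and then takes the unweighted mean over classes; micro averaging first pools TP, FP and FN over all classes and computes each metric once. The function names (`per_class_counts`, `macro_micro`) are hypothetical and only for illustration; scikit-learn exposes the same choice through the `average="macro"` / `average="micro"` parameter of `precision_score` and `recall_score`.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return (TP, FP, FN) for each class label, treating it one-vs-rest."""
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_micro(y_true, y_pred):
    """Compute macro- and micro-averaged precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, labels)

    # Macro: one score per class, then the unweighted mean (every class counts equally).
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, _ in counts.values()) / len(labels)
    macro_r = sum(safe_div(tp, tp + fn) for tp, _, fn in counts.values()) / len(labels)

    # Micro: pool the counts over all classes first (every instance counts equally).
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro_p = safe_div(total_tp, total_tp + total_fp)
    micro_r = safe_div(total_tp, total_tp + total_fn)

    return {
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "micro_precision": micro_p,
        "micro_recall": micro_r,
    }


# Imbalanced toy data: 8 instances of class "a", 2 of class "b";
# one "b" instance is misclassified as "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]
print(macro_micro(y_true, y_pred))
```

On this toy data the micro recall is 0.9, while the macro recall drops to 0.75 because the small class "b" only reaches a recall of 0.5 and is weighted as heavily as the large class "a". In general, macro averaging is the choice when performance on rare classes matters as much as on frequent ones; micro averaging reflects overall per-instance performance and is dominated by the large classes (for single-label problems, micro precision and micro recall both equal accuracy).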
