TU Wien:Business Intelligence VU (Rauber)/2024WS-Exam-Questions

T/F Questions:

The upcoming EU Regulation on AI obliges high-risk applications to log, among other things, the identification of any person involved in the verification of results.
Representation Bias is closely linked to encoding of data, especially the value ranges selected for encoding an attribute.
Historical Bias manifests via the data generation process even given a perfect sampling and feature selection.
The CRISP-DM process starts with the Data Understanding phase and ends with the Evaluation phase.
Data exploration and preprocessing are also needed in data mining projects where the data comes from an integrated data warehouse.
Data augmentation is a pre-processing step aiming at improving the quality and quantity of data.
Minor shifts in single data points can lead to completely different splits at specific levels of decision trees.
Binary data is a special form of nominal data.
Ordinal data does not allow distances to be computed between data instances.
Support Vector Machines with soft margin will usually show better effectiveness on the test data than Support Vector Machines without soft margin.
The optimal hyper-parameters of a model for deployment are determined on the validation set.
Recall measures the fraction of all positive instances correctly identified by a classifier.
Micro-averaged Precision / Recall for classifier evaluation first calculates Precision / Recall per class and then averages across classes.
Equal Opportunity describes the property of a classifier to have the same true positive rate across different groups, i.e. for instances having and instances not having a protected attribute.
If two classifiers show a statistically significant difference in performance, one can conclude that one of these classifiers performs better.
Drilling down means going from a coarser (higher) level of aggregation to a finer (more detailed) view.
The multidimensional model is on the conceptual level and not on the physical level.
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.
In the multidimensional model, data is divided into facts and cubes.
OLAP focuses on many short and small transactions.
A data mart is an equivalent alternative to a data warehouse.
The cube operator is used to obtain a slice from a data cube.
It is not possible to use a dimension multiple times within a data warehouse schema.
MDX queries explore specific axes (columns, rows, pages, ...)?
Data lakes are superior to data warehouses.
A Galaxy Schema contains multiple fact tables.
In the data warehouse reference architecture, data is copied directly from the source?
Fact tables usually have many rows.
With MOLAP and ROLAP, data is stored in exactly the same way.
Snowflake Schemas result in (small) savings in storage space.
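
Note on the statements about Recall and micro-averaged Precision / Recall: the following is a minimal Python sketch of how per-class, macro-averaged and micro-averaged values are commonly computed. It is not taken from the exam or the lecture material; the function names and the per-class counts in `EXAMPLE_COUNTS` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical per-class counts: class -> (true positives, false positives, false negatives)
Counts = Dict[str, Tuple[int, int, int]]

EXAMPLE_COUNTS: Counts = {
    "frequent_class": (90, 10, 10),
    "rare_class": (1, 9, 4),
}


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of all actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def macro_average(counts: Counts) -> Tuple[float, float]:
    """Compute precision/recall per class first, then average across classes."""
    precisions = [precision(tp, fp) for tp, fp, _ in counts.values()]
    recalls = [recall(tp, fn) for tp, _, fn in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


def micro_average(counts: Counts) -> Tuple[float, float]:
    """Pool the counts over all classes first, then compute precision/recall once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return precision(tp, fp), recall(tp, fn)


if __name__ == "__main__":
    print("macro (P, R):", macro_average(EXAMPLE_COUNTS))
    print("micro (P, R):", micro_average(EXAMPLE_COUNTS))
```

With unbalanced classes as in the assumed counts, the two averaging schemes give noticeably different values, because pooling the counts lets the frequent class dominate.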

Open Questions:

1. Describe the concept of risk categories used in the EU AI Act:

a) Which risk categories does it define?
b) What are the corresponding consequences for applications falling into the respective categories?
c) Provide two examples of AI applications for each of these categories.

2. Lazy Learning:

a) Explain the concept of Lazy Learning.
b) Describe the characteristics of a setting in which a lazy learner would be the selected choice.
c) Provide a justification for your decision.
d) Give an example of such a setting.

3. Describe the functionality of the following operations as well as their relationships (similarities, differences, etc.):

GROUPING SETS, ROLLUP and CUBE
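
Note: one way to see how the three operations relate is to treat ROLLUP and CUBE as shorthands that expand into an explicit list of grouping sets. The Python sketch below only enumerates those grouping sets and does not execute any SQL; the helper names `rollup` and `cube` and the column names are hypothetical and used for illustration only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

GroupingSet = Tuple[str, ...]


def rollup(columns: Sequence[str]) -> List[GroupingSet]:
    """Grouping sets produced by ROLLUP: every prefix of the column list,
    from the full list down to the empty set (the grand total)."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube(columns: Sequence[str]) -> List[GroupingSet]:
    """Grouping sets produced by CUBE: every subset of the column list,
    including the empty set (the grand total)."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


if __name__ == "__main__":
    cols = ["year", "region"]  # hypothetical dimension columns
    print("ROLLUP ->", rollup(cols))
    print("CUBE   ->", cube(cols))
```

For n columns, ROLLUP yields n + 1 grouping sets and CUBE yields 2^n, and every ROLLUP grouping set also appears in the CUBE result. Both can therefore be written out as an explicit GROUPING SETS list.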

4. Describe at least 3 main components of the data warehouse reference architecture and their relationships.
