TU Wien:Business Intelligence VU (Rauber)/2024WS-Exam-Questions
T/F Questions:
- The upcoming EU Regulation on AI obliges high-risk applications to log, amongst others, the identification of any person involved in the verification of results.
- Representation Bias is closely linked to encoding of data, especially the value ranges selected for encoding an attribute.
- Historical Bias manifests via the data generation process even given a perfect sampling and feature selection.
- The CRISP-DM process starts with the Data Understanding phase and ends with the Evaluation phase.
- Data exploration and preprocessing are needed also in data mining projects where the data comes from an integrated data warehouse.
- Data augmentation is a pre-processing step aiming at improving the quality and quantity of data.
- Minor shifts in single data points can lead to completely different splits at specific levels of decision trees.
- Binary data is a special form of nominal data.
- Ordinal data does not allow distances to be computed between data instances.
- Support Vector Machines with soft margin usually will show a better effectiveness on the test data than Support Vector Machines without soft margin.
- The optimal hyper-parameters of a model for deployment are determined on the validation set.
- Recall measures the fraction of all positive instances correctly identified by a classifier.
- Micro averaged Precision / Recall for classifier evaluation first calculates Precision / Recall per class and then averages across classes.
- Equal Opportunity describes the characteristics of classifiers to have the same true positive rate across different groups, having and not having a protected attribute in both instances.
- If two classifiers show a statistically significant difference in performance, one can conclude that one of these classifiers performs better.
- Drilling down means going from a coarser (higher) level of aggregation to a finer (more detailed) view.
- The multidimensional model is on the conceptual level and not on the physical level.
- A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.
- In the multidimensional model, data is divided into facts and cubes.
- OLAP focuses on many short and small transactions.
- A data mart is an equivalent alternative to a data warehouse.
- The cube operator is used to obtain a slice from a data cube.
- It is not possible to use a dimension multiple times within a data warehouse schema.
- MDX queries explore specific axes (columns, rows, pages, ...)?
- Data lakes are superior to data warehouses.
- A Galaxy Schema contains multiple fact tables.
- In the data warehouse reference architecture, data is copied from source directly?
- Fact tables usually have many rows.
- Data is stored in exactly the same way in MOLAP and ROLAP.
- Snowflake Schemas result in (small) savings in storage space.
Open Questions:
1. Describe the concept of risk categories used in the EU AI Act:
- a) Which risk categories does it define?
- b) What are the corresponding consequences for applications falling into the respective categories?
- c) Provide two examples of AI applications for each of these categories.
2. Lazy Learning:
- a) Explain the concept of Lazy Learning.
- b) Describe the characteristics of a setting in which a lazy learner would be the selected choice.
- c) Provide a justification for your decision.
- d) Give an example of such a setting.
3. Describe the functionality of the following operations as well as their relationships (similarities, differences, etc.):
- GROUPING SETS, ROLLUP and CUBE
4. Describe at least 3 main components of the data warehouse reference architecture and their relationships.