TU Wien:Business Intelligence VU (Tjoa)/Exam 2021-01-25
The exam consisted of two parts and was conducted in TUWEL.
To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were admitted to the meeting one at a time. Between the two parts there was a short break (bathroom visits were allowed).
The exact grading scheme and scaling were not known at the time of the exam. Results were expected at the end of January or the beginning of February.
Part 1: SC (True-False)
20 true/false questions drawn from a question pool covering all slides. Working time: 20 minutes. Scoring: correct answer +2 points, wrong answer -1 point, no answer 0 points.
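The scoring rule can be written out as a small calculation. This is only a sketch: the function names are hypothetical, and it assumes the raw points are simply summed, with no floor at zero and no later scaling (the actual grading scheme was not known). It also shows why answering was worthwhile under this rule: a question you get right with probability p is worth 2p - (1 - p) = 3p - 1 points on average, which is positive as soon as p > 1/3.

```python
# Sketch of the Part 1 scoring rule: +2 per correct answer, -1 per wrong answer,
# 0 per unanswered question, 20 questions in total.
# Assumption (hypothetical): raw points are summed with no floor at zero and no scaling.

NUM_QUESTIONS = 20
POINTS_CORRECT = 2
POINTS_WRONG = -1


def part1_score(correct: int, wrong: int) -> int:
    """Raw Part 1 score; unanswered questions contribute 0 points."""
    if correct < 0 or wrong < 0 or correct + wrong > NUM_QUESTIONS:
        raise ValueError("invalid answer counts")
    return correct * POINTS_CORRECT + wrong * POINTS_WRONG


def expected_answer_value(p_correct: float) -> float:
    """Expected points for answering a question you get right with probability p_correct."""
    return p_correct * POINTS_CORRECT + (1 - p_correct) * POINTS_WRONG


if __name__ == "__main__":
    print(part1_score(correct=20, wrong=0))  # maximum: 40
    print(part1_score(correct=14, wrong=6))  # 28 - 6 = 22
    print(expected_answer_value(0.5))        # a coin-flip answer is still worth 0.5 on average
```

Leaving a question blank is therefore only better than answering when you think you are more likely wrong than right by a ratio of more than 2:1.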
- K-means is extremely robust against outliers in the data
- Carefully pruned decision trees usually show higher precision on the training data than un-pruned decision trees
- Lazy Learning is not recommended when there is high drift in data space, leading to changing decision boundaries
- The kNN classifier using Euclidean distance is computationally more expensive at the model building stage than a Decision Tree using simple error counts as splitting criterion.
- Ordinal data allows distances to be computed between data points
- Random sampling of time series data for classifier training may lead to an overestimation of model performance
- 1-to-N coding (one-hot encoding) reduces the dimensionality of the feature space
- CRISP-DM: Business Success Criteria are ideally specified as subjective measures and Data Mining Success Criteria should be specified as objective measures
- Zero-mean, unit-variance normalization is highly sensitive to outliers in the data
- Lazy learning is more time-efficient at the classification stage
- According to the Data Warehousing Institute (TDWI) working definition, Business Intelligence encompasses analytic tools
- In Hadoop, applications are typically written in high-level code such as Java
- In Hadoop, processing is coordinated through MapReduce.
- In the context of the DWH reference architecture, the Metadata Component stores operational metadata, extraction and transformation metadata, and end-user metadata.
- The Staging Area in the DWH reference architecture is a database that stores a single data extract of a source database.
- [DWH] In a typical Lambda architecture, queries can be answered by merging results from batch and real-time views.
- Data silos hold individual data sets for single applications or organizational units.
- Big advantages of a snowflake schema include that the schema becomes more intuitive and browsing through the content is easy
- In DWH, the concept "warehouse" supports bi-directional data flows between related data sources.
- In the context of DWH analytics, predictive analytics focuses on investigating past effects to capture relevant information.
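Two of the statements above, on 1-to-N coding and on zero-mean, unit-variance normalization, can be checked with a few lines of code. The sketch below is illustrative and not taken from the course material; the helper names are made up and it uses the population standard deviation. It shows that one-hot encoding replaces one categorical column with one binary column per distinct value (so the dimensionality grows), and that a single outlier shifts the mean and inflates the standard deviation that z-score normalization relies on.

```python
import statistics


def z_score(values):
    """Zero-mean, unit-variance normalization (population standard deviation)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(column):
    """1-to-N coding: a column with N distinct values becomes N binary columns."""
    categories = sorted(set(column))
    rows = [[1 if value == c else 0 for c in categories] for value in column]
    return categories, rows


if __name__ == "__main__":
    clean = [10, 11, 12, 13, 14]
    with_outlier = clean + [100]

    # Without the outlier the five values spread over roughly -1.41 .. 1.41.
    print([round(z, 2) for z in z_score(clean)])
    # One outlier shifts the mean and inflates the standard deviation, so the
    # five original values are squeezed into a narrow band of negative scores.
    print([round(z, 2) for z in z_score(with_outlier)])

    categories, rows = one_hot(["red", "green", "blue", "green"])
    print(categories)    # ['blue', 'green', 'red']
    print(rows[0])       # "red" -> [0, 0, 1]
    print(len(rows[0]))  # one categorical column became 3 columns
```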
Part 2: Open Questions
Working time: 25 minutes.
The questions were selected from a question pool: two each from the Data Warehousing and the Data Mining part, i.e. four questions in total. That leaves an average of 6.25 minutes per answer, which is extremely short even for a TU exam.
Example questions (paraphrased):
Data Warehousing
- Name three requirements of operational systems using OLTP and describe each of them briefly
- Describe the three steps of the MapReduce process flow
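The MapReduce process flow is commonly described as map, shuffle (and sort), and reduce; the course slides may name or split these steps differently. The sketch below runs the classic word-count example in a single Python process to show what each step does with the data. The function names are hypothetical and nothing here uses the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit an intermediate (key, value) pair (word, 1) for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all intermediate values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: aggregate the grouped values of each key (here: sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In Hadoop, map tasks would roughly run in parallel on separate input splits;
    # here the three phases simply run one after another in one process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    docs = ["the cat sat", "The dog sat"]
    print(word_count(docs))  # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

The point of the shuffle step is that every value for a given key ends up at the same reducer, so each reduce call can work independently of the others.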
Data Mining
- Name four types of attributes used in data mining; give an example of each and describe their characteristics and the permitted mathematical operations
- Describe single linkage and complete linkage: how do the algorithms work, and what are their characteristics?
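Single and complete linkage are both variants of agglomerative (bottom-up) hierarchical clustering and differ only in how the distance between two clusters is defined: single linkage uses the closest pair of points (minimum distance), complete linkage the farthest pair (maximum distance). The sketch below is deliberately naive and only suitable for tiny inputs; the helper names are hypothetical, and it assumes one-dimensional points with the absolute difference as distance. On the same five points, single linkage chains points together along a sequence of small gaps, while complete linkage produces more compact clusters.

```python
from itertools import combinations


def cluster_distance(a, b, points, linkage):
    """Distance between two clusters (lists of point indices) under the given linkage."""
    pair_distances = [abs(points[i] - points[j]) for i in a for j in b]
    if linkage == "single":
        return min(pair_distances)  # closest pair of points
    if linkage == "complete":
        return max(pair_distances)  # farthest pair of points
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, k, linkage):
    """Start with singleton clusters and merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]], points, linkage),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return sorted(sorted(points[i] for i in c) for c in clusters)


if __name__ == "__main__":
    data = [0, 2, 6, 13, 22]
    print(agglomerative(data, k=2, linkage="single"))    # [[0, 2, 6, 13], [22]]
    print(agglomerative(data, k=2, linkage="complete"))  # [[0, 2, 6], [13, 22]]
```

Single linkage keeps absorbing 13 because its nearest neighbour (6) is close, even though 13 is far from 0; complete linkage measures the whole cluster's spread and instead pairs 13 with 22.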