TU Wien:Business Intelligence VU (Tjoa)/Exam 2021-01-11
The exam consisted of two parts and was conducted in TUWEL.
To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were let into the meeting one at a time. For 25 students, this process lasted about 1 hour and 10 minutes. Between the parts, there was a short break (bathroom allowed).
The exact grading scheme for the exam and scaling were unknown at the time of taking the exam. Results were expected at the end of January/beginning of February.
Part 1: SC (True-False)
20 true/false questions from a question pool covering all slides. Work time: 20 minutes. Correct answer: +2 points, wrong answer: -1 point, no answer: 0 points.
- Ordinal data allows distances
- MapReduce consists of the Map, Shuffle & Sort, and Reduce phases
- KNN with even value of K
- IMPALA allows UPDATE & DELETE (?)
- IMPALA symmetric node architecture?
- Drill-down moves from a detailed to an aggregated view
- Something about type of clustering
- Data Mining goal/success criteria cannot be subjective
- Hadoop is fault tolerant
- Fayyad KDD focuses on Data Mining aspects
- Fayyad KDD focuses on Deployment
- Hadoop is for large streaming reads
- A DWH is a subject-oriented, integrated, time-variant, nonvolatile collection of data
- TDWI working definition of the DWH focuses on data integration and analysis and not on business decision support (something like that)
- In a mediator integration approach everybody talks directly to everybody else
- ETL is only needed for an integrated DWH
- Single Linkage identifies contiguous clusters well
- Complete Linkage identifies globular clusters well
- Change of a single data point can change Decision Trees dramatically
- A carefully pruned Decision tree usually has lower validation set error than an un-pruned one
- Model for deployment is chosen by lowest training set error
- Cross-validation is used for a more robust estimate for error
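The Part 1 scoring rule (+2 for a correct answer, -1 for a wrong answer, 0 for no answer) can be worked through with a small sketch. It assumes that the points are simply summed over the 20 questions, with no floor at zero and no additional scaling; the actual grading scheme was not known at the time of the exam. The function names are hypothetical and only serve as illustration.

```python
def true_false_score(correct: int, wrong: int, total: int = 20) -> int:
    """Raw Part 1 score: +2 per correct, -1 per wrong, 0 per unanswered.

    Assumption: points are summed without a lower bound or scaling.
    """
    if correct < 0 or wrong < 0 or correct + wrong > total:
        raise ValueError("correct + wrong must be between 0 and total")
    return 2 * correct - wrong


def expected_answer_value(p_correct: float) -> float:
    """Expected points for answering one question that is right with
    probability p_correct (leaving it blank is always worth 0)."""
    return 2 * p_correct - (1 - p_correct)


if __name__ == "__main__":
    # 14 correct, 4 wrong, 2 left blank
    print(true_false_score(14, 4))       # 24
    # A pure coin-flip guess still has a positive expectation
    print(expected_answer_value(0.5))    # 0.5
```

Under these assumptions, answering pays off on average whenever the chance of being right exceeds 1/3, so even a pure guess (probability 1/2) is worth 0.5 points in expectation.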
Part 2: Open Questions
Work time: 25 minutes.
The questions were selected from a question pool: two questions each from the Data Warehousing and the Data Mining part, 4 questions in total. That leaves an average of 6.25 minutes per question, which is extremely short even for a TU exam.
Example questions:
Data Warehousing
- Describe OLAP and OLTP and their characteristics
- Describe the ETL process
- Heterogeneity challenges in integration
- Star vs Snowflake
- DW vs Lake
- Inmon Approach pros/cons
- Inmon vs. Kimball pros/cons
- Inmon/Kimball
- Describe two scaling approaches, their advantages, and their differences
Data Mining
- What is a lazy learner? Example? When to use?
- K-means algorithm, discuss robustness w.r.t. initialization
- 1-n encoding and Binning (Explain, when to use, example)
- KNN
- Describe the following and explain their differences: a) define business criteria, b) define business goals, c) data mining success criteria, d) data mining goals