TU Wien:Business Intelligence VU (Tjoa)/Exam 2021-01-11

The exam consisted of two parts and was conducted in TUWEL (actually in 2021, title is wrong).

To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were let into the meeting one at a time. For 25 students, this process lasted about 1 hour and 10 minutes. Between the parts, there was a short break (bathroom allowed).

The exact grading scheme for the exam and scaling were unknown at the time of taking the exam. Results were expected at the end of January/beginning of February.

Part 1: SC (True-False)

20 true/false questions from a question pool covering all slides. Working time: 20 minutes. Correct answer: +2 points, wrong answer: -1 point, no answer: 0 points.

  • Ordinal data allows distances
  • MapReduce consists of Map/Shuffle & Sort/Reduce phases
  • KNN with even value of K
  • IMPALA allows UPDATE & DELETE (?)
  • IMPALA symmetric node architecture?
  • Drill Down leads from a detailed to an aggregated view
  • Something about type of clustering
  • Data Mining goal success criteria cannot be subjective
  • Hadoop is fault tolerant
  • Fayyad KDD focuses on Data Mining aspects
  • Fayyad KDD focuses on Deployment
  • Hadoop is for large streaming reads
  • A DWH is a subject-oriented, integrated, time-variant, nonvolatile collection of data
  • TDWI working definition of the DWH focuses on data integration and analysis and not on business decision support (something like that)
  • In a mediator integration approach everybody talks directly to everybody else
  • ETL is only needed for an integrated DWH
  • Single Linkage identifies contiguous clusters well
  • Complete Linkage identifies globular clusters well
  • Change of a single data point can change Decision Trees dramatically
  • A carefully pruned Decision tree usually has lower validation set error than an un-pruned one
  • Model for deployment is chosen by lowest training set error
  • Cross-validation is used for a more robust estimate for error
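
The scoring rule stated above (+2 / -1 / 0) can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are made up for this example, and the actual grading scale was not known. It also shows that, under this rule alone, a pure 50/50 guess has a positive expected value.

```python
def true_false_score(correct: int, wrong: int, unanswered: int = 0) -> int:
    """Score under the stated rule: +2 per correct, -1 per wrong, 0 per unanswered."""
    return 2 * correct - 1 * wrong + 0 * unanswered


def expected_guess_value(p_correct: float = 0.5) -> float:
    """Expected points for answering one question with success probability p_correct."""
    return 2 * p_correct - 1 * (1 - p_correct)


# Hypothetical attempt: 14 correct, 4 wrong, 2 left blank (20 questions in total)
print(true_false_score(14, 4, 2))   # 24 (out of a maximum of 40)
print(expected_guess_value(0.5))    # 0.5
```

Guessing only becomes unfavourable on average if the chance of being right drops below 1/3, since 2p - (1 - p) < 0 exactly when p < 1/3.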

Part 2: Open Questions

Work time: 25 minutes.

The questions were selected from a question pool. Two questions each came from the Data Warehousing and the Data Mining part, for 4 questions in total. That leaves 6.25 minutes per question on average, which is extremely short even for a TU exam.

Example questions:

Data Warehousing

  • Describe OLAP and OLTP and their characteristics
  • Describe the ETL process
  • Heterogeneity challenges in integration
  • Star vs Snowflake
  • DW vs Lake
  • Inmon Approach pros/cons
  • Inmon Kimball pros/cons
  • Inmon/Kimball
  • Describe 2 scaling approaches, their advantages and differences

Data Mining

  • What is a lazy learner? Example? When to use?
  • K-means algorithm, discuss robustness w.r.t. initialization
  • 1-n encoding and Binning (Explain, when to use, example)
  • KNN
  • Describe the following and their differences: a) define business criteria, b) define business goals, c) data mining success criteria, d) data mining goals
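
For the question on 1-n encoding and binning, a minimal sketch of what the two terms refer to may help when preparing. This is not an official solution: the helper names and the sample data are invented for illustration, and equal-width binning is just one of several possible binning strategies.

```python
def one_hot(values):
    """1-n (one-hot) encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def equal_width_bins(values, n_bins):
    """Binning: map each numeric value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


colors = ["red", "green", "blue", "green"]
print(one_hot(colors))            # columns ordered as blue, green, red

ages = [18, 25, 40, 63, 78]
print(equal_width_bins(ages, 3))  # [0, 0, 1, 2, 2]
```

One-hot encoding turns a nominal attribute into numeric columns without implying an order, while binning goes the other way and turns a numeric attribute into a small number of ordered categories.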