TU Wien:Business Intelligence VU (Tjoa)/Exam 2021-01-11

The exam consisted of two parts and was conducted in TUWEL.

To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were let into the meeting one at a time. For 25 students, this process lasted about 1 hour and 10 minutes. Between the parts, there was a short break (bathroom allowed).

The exact grading scheme for the exam and scaling were unknown at the time of taking the exam. Results were expected at the end of January/beginning of February.

Part 1: SC (True-False)

20 true/false questions drawn from a question pool covering all slides. Work time: 20 minutes. Correct answer: +2 points, wrong answer: -1 point, no answer: 0 points.
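To make the marking scheme concrete, here is a minimal sketch that computes a Part 1 score from the number of correct and wrong answers, plus the expected value of a blind guess. The point values are the ones reported above; the function name and the example counts are illustrative assumptions, not part of the exam.

```python
# Point values as reported for Part 1 of this exam.
POINTS_CORRECT = 2
POINTS_WRONG = -1
POINTS_BLANK = 0
NUM_QUESTIONS = 20


def part1_score(correct: int, wrong: int) -> int:
    """Score for Part 1; unanswered questions are the remainder (hypothetical helper)."""
    blank = NUM_QUESTIONS - correct - wrong
    if correct < 0 or wrong < 0 or blank < 0:
        raise ValueError("counts must be non-negative and sum to at most 20")
    return correct * POINTS_CORRECT + wrong * POINTS_WRONG + blank * POINTS_BLANK


# Expected points for a 50/50 guess on a single true/false question.
expected_guess = 0.5 * POINTS_CORRECT + 0.5 * POINTS_WRONG

print(part1_score(20, 0))   # maximum achievable score
print(part1_score(14, 4))   # made-up example: 14 correct, 4 wrong, 2 blank
print(expected_guess)
```

Assuming a genuine 50/50 chance, a blind guess is worth +0.5 points on average, so leaving a question blank is not automatically the safer choice.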

  • Ordinal data allows distances
  • MapReduce consists of the Map/Shuffle & Sort/Reduce phases
  • KNN with even value of K
  • IMPALA allows UPDATE & DELETE (?)
  • IMPALA symmetric node architecture?
  • Drill-down takes you from a detailed to an aggregated view
  • Something about type of clustering
  • Data Mining goal success criteria cannot be subjective
  • Hadoop is fault tolerant
  • Fayyad KDD focuses on Data Mining aspects
  • Fayyad KDD focuses on Deployment
  • Hadoop is for large streaming reads
  • A DWH is a subject-oriented, integrated, time-variant, nonvolatile collection of data
  • TDWI working definition of the DWH focuses on data integration and analysis and not on business decision support (something like that)
  • In a mediator integration approach everybody talks directly to everybody else
  • ETL is only needed for an integrated DWH
  • Single Linkage identifies contiguous clusters well
  • Complete Linkage identifies globular clusters well
  • Changing a single data point can change a decision tree dramatically
  • A carefully pruned decision tree usually has a lower validation set error than an unpruned one
  • The model for deployment is chosen by lowest training set error
  • Cross-validation is used to get a more robust estimate of the error
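As a study aid for the MapReduce statement in the list above, the following is a minimal single-process sketch of the three phases (Map, Shuffle & Sort, Reduce) using a word count. It is plain Python and not the Hadoop API; all function names and the input documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped}


docs = ["big data big clusters", "data warehouse"]  # made-up input
word_counts = reduce_phase(shuffle_and_sort(map_phase(docs)))
print(word_counts)
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the grouping step; here everything runs sequentially just to show how the data flows.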

Part 2: Open Questions

Work time: 25 minutes.

The questions were selected from a question pool: two each from the Data Warehousing and Data Mining parts, so 4 questions in total. That leaves an average of 6.25 minutes per question, which is extremely short even for a TU exam.

Example questions:

Data Warehousing

  • Describe OLAP and OLTP and their characteristics
  • Describe the ETL process
  • Heterogeneity challenges in integration
  • Star vs. snowflake schema
  • Data warehouse vs. data lake
  • Inmon Approach pros/cons
  • Inmon Kimball pros/cons
  • Inmon/Kimball
  • Describe 2 scaling approaches, their advantages, and their differences

Data Mining

  • What is a lazy learner? Example? When to use?
  • K-means algorithm, discuss robustness w.r.t. initialization
  • 1-n encoding and Binning (Explain, when to use, example)
  • KNN
  • Describe the following and explain their differences: a) define business criteria, b) define business goals, c) data mining success criteria, d) data mining goals
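For the 1-n encoding and binning question in the list above, a small worked example can help when preparing an answer. The sketch below assumes equal-width binning (one of several binning strategies) and uses made-up data; the function names are hypothetical and not taken from the lecture.

```python
def one_hot(values):
    """1-of-n encoding: one binary indicator per distinct category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def equal_width_bins(values, n_bins):
    """Equal-width binning: map each number to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


colors = ["red", "green", "red", "blue"]   # made-up nominal attribute
ages = [18, 22, 35, 47, 60]                # made-up numeric attribute

categories, encoded = one_hot(colors)
print(categories)
print(encoded)
print(equal_width_bins(ages, 3))
```

1-of-n encoding turns a nominal attribute into numeric indicators without implying an order, while binning goes the other way and discretizes a numeric attribute into a small number of intervals.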