TU Wien:Business Intelligence VU (Tjoa)/Exam 2022-01-13

Exam 2022-01-13

The exam consisted of two parts and was conducted in TUWEL because of the COVID-19 pandemic.

To attend the exam, you had to join a Zoom meeting with two devices, one filming from the front and one from the side/back. Students were let into the meeting one at a time.

Part 1: 20 single-choice (true/false) questions

20 true/false questions drawn from a question pool covering all slides. Working time: 20 minutes. Correct answer: +2 points; wrong answer: -1 point; no answer: 0 points.

  1. The soft margin parameter of SVM controls the error on the training set.
  2. In the DWH reference architecture, the Staging Area is a database that stores a single data extract of a source database.
  3. The Staging Area in the DWH reference architecture is a database that stores a single data extract of a source database.
  4. The age of business intelligence started in 2010.
  5. Business intelligence is the processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business action.
  6. Approaches for Information Integration - Mediator means everybody talks directly to everyone else.
  7. Approaches for Information Integration - Federation connects multiple (heterogeneous) data sources.
  8. A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data.
  9. You choose the model with minimal training error for deployment.
  10. Hive allows real-time queries and has low latency.
  11. The Kappa architecture is more complicated than the Lambda architecture.
  12. For data from a DWH you do not need data preparation and data exploration.
  13. F-score is a weighted score of Precision and Recall.
  14. ETL extraction monitoring strategies are trigger-based, replication-based, timestamp-based, and snapshot-based.
  15. Even a fully-grown decision tree can have impure leaf nodes.
  16. An advantage of Master-Slave Replication is high read-performance.
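
The marking scheme for Part 1 (+2 / -1 / 0) can be made concrete with a few lines of arithmetic. The following is only a sketch: the function names `part1_score` and `expected_guess_value` are made up for illustration, and the maximum of 40 points is derived from 20 questions at +2 each, not stated on the exam.

```python
def part1_score(correct: int, wrong: int) -> int:
    """Points for Part 1: +2 per correct answer, -1 per wrong answer.

    Unanswered questions score 0, so they do not appear in the formula.
    """
    return 2 * correct - 1 * wrong


def expected_guess_value(p_correct: float) -> float:
    """Expected points for answering one question that you get right
    with probability p_correct (hypothetical helper, not from the exam)."""
    return 2 * p_correct - 1 * (1 - p_correct)


# All 20 questions correct: derived maximum of 40 points.
print(part1_score(correct=20, wrong=0))
# 12 correct, 5 wrong, 3 left blank.
print(part1_score(correct=12, wrong=5))
# A pure coin flip on a true/false question.
print(expected_guess_value(0.5))
```

Under this scheme a blind guess has a positive expected value (0.5 points), and answering only loses points on average when the chance of being right drops below one third.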

Part 2: 4 open questions

Working time: 25 minutes for 4 open questions, selected from a question pool. That leaves an average of 6.25 minutes per question, which is extremely short even for a TU exam.

  1. Explain the main differences between the star schema and the snowflake schema.
  2. Explain Binning and 1-to-n encoding. Describe (a) when it is applied, (b) how it is applied, (c) give an example.
  3. Explain the steps of the k-means algorithm. Explain the problem with initial centroid selection.
  4. Explain the three types of analytics.
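
For question 2, a small example may help when preparing part (c). The sketch below is not taken from the lecture slides: it assumes that "binning" means equal-width binning of a numeric attribute and that "1-to-n encoding" is what is commonly called one-hot encoding of a categorical attribute. The function names and sample values are made up.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_to_n(values):
    """Turn a categorical column into n binary columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


ages = [18, 25, 40, 63, 78]           # hypothetical numeric attribute
print(equal_width_bins(ages, 3))      # -> [0, 0, 1, 2, 2]

colours = ["red", "green", "red", "blue"]  # hypothetical categorical attribute
print(one_to_n(colours))
# -> (['blue', 'green', 'red'], [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
```

Binning goes from numeric to categorical (useful when a method needs discrete inputs), while 1-to-n encoding goes from categorical to numeric (useful when a method needs numeric inputs and the categories have no order).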

Other possible questions (a brain dump from other students, so some questions repeat):

  1. defining precision, recall, micro/macro and explaining their differences
  2. defining business goal / success, mining goal / success and providing an example
  3. 3 analytics from DWH
  4. agile business intelligence
  1. defining precision, recall, micro/macro and explaining their differences
  2. What is a Lazy Learner? Give an example of a lazy learner. When is a lazy learner useful?
  3. What are the characteristics/requirements of OLTP Operational Systems?
  4. What is the difference between OLTP and OLAP?
  1. Discuss 2 different forms of heterogeneity
  2. 2 different scaling approaches, their characteristics and benefits
  1. two ways of doing integration in data warehousing
  2. k-means: algo and robustness wrt initialization
  3. precision recall micro macro, formula and comparison
  4. FASMI
  1. discuss 2 types of scaling in data preprocessing
  2. describe train test validation data, for what they are used
  3. describe three operations in MOLAP
  4. pros and cons of the Kimball approach ("bottom-down", this is how the professor wrote it)
  1. defining precision, recall, micro/macro and explaining their differences
  2. 1-to-n and binning
  3. heterogeneity concepts
  4. definition for DWH and 3 examples
  1. explain Lazy learning + example algorithm + example scenario
  2. 2 scaling approaches explained + when they are useful and when they are not
  3. 3 analytics from DWH
  4. advantages & Disadvantages of Kimball approach
  1. Difference between OLTP and OLAP and their main requirements
  2. Difference between Kimball model and Inmon model
  3. Describe two types of scaling
  1. types of scaling
  2. binning and 1-n encoding
  3. Solutions for information integration heterogeneity
  4. Snowflake vs star schema
  1. describe training, validation, test data
  2. describe 3 operations on multi dimensional data
  3. what is a lazy learner + example + when to use it
  4. definition of a DWH
  1. Describe metadata component in a DWH
  1. Inmon / Kimball
  2. explain MapReduce
  3. 1 to n binning and when to apply
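
Since the precision/recall micro/macro question comes up several times in the list above, here is a small sketch of the difference for a single-label multi-class problem. It is a study aid, not the lecture's reference solution: the function name `micro_macro`, the sample labels, and the convention of scoring an undefined ratio (zero denominator) as 0.0 are all assumptions.

```python
from collections import Counter


def micro_macro(y_true, y_pred):
    """Micro- and macro-averaged precision and recall (single-label multi-class)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def ratio(num, den):
        # Assumption: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    # Macro: compute the metric per class, then average the per-class values.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute the metric once.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))
    return {"micro_p": micro_p, "micro_r": micro_r,
            "macro_p": macro_p, "macro_r": macro_r}


y_true = ["a", "a", "a", "a", "b", "c"]  # made-up labels, class "a" dominates
y_pred = ["a", "a", "a", "b", "b", "a"]
print(micro_macro(y_true, y_pred))
```

In this made-up example the rare class "c" is never predicted correctly, which pulls the macro averages down (precision about 0.42, recall about 0.58), while the micro averages (both about 0.67) are dominated by the frequent class "a". Macro averaging weights every class equally; micro averaging weights every instance equally.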