TU Wien:Experiment Design for Data Science VU (Knees)/Exam 23.01.2019 - Group B


Exam Group B, 23.01.2019

Reproducibility
- What is PRIMAD? Name the components and explain what "priming" each of them achieves.
- Which issues are there in reproducibility (from a programming perspective)?
Trust in AI/ML
- What are the benefits of the automation of decision making?
- What are the issues of the automation of decision making?
- What are the benefits of "black-box" algorithms?
- What are the drawbacks of "black-box" algorithms?
WEKA / CV, statistical significance
The data is split into train and test sets in exactly the same way for two algorithms (classification of patents), and this is repeated 20 times. The setup was a simple WEKA workflow: loading the data, standardizing, CV, KNN, averaging the results, outputting the results.
- What is CV? Explain it with a figure for k=4.
- What is leave-one-out CV? Explain the benefits/drawbacks.
- What is wrong with the WEKA workflow? (standardization before CV)
- Which performance measures can be used to measure the performance of the algorithms? Which one makes the most sense? Explain the measure (formula).
- Which statistical test can be used to compare the algorithms / test the significance? Explain why it can be used here.
- Which types of errors are there? What are they? How can they be reduced/prevented?
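
Note on the WEKA / CV questions: the following is a minimal Python sketch, not the exam's WEKA workflow and not an official solution. It illustrates k-fold CV for k=4 (setting k to the number of samples gives leave-one-out), why the standardization parameters should be estimated on the training folds only, and a paired t statistic as one possible way to compare two algorithms evaluated on identical splits. All function names and the toy data are hypothetical and chosen for illustration.

```python
import math
import statistics


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_splits(n_samples, k):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def fit_standardizer(values):
    """Estimate mean and standard deviation from the training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted parameters; never refit on test data."""
    return [(v - mean) / std for v in values]


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired differences (degrees of freedom: n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical single-feature data, for illustration only.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
for train_idx, test_idx in cv_splits(len(feature), k=4):
    # Fit the scaler inside the fold so no test information leaks into training.
    mean, std = fit_standardizer([feature[i] for i in train_idx])
    test_scaled = standardize([feature[i] for i in test_idx], mean, std)
    print(test_idx, [round(v, 3) for v in test_scaled])
```

Standardizing the whole data set before splitting, as in the described workflow, lets the test folds influence the mean and standard deviation used for training, which can make the reported performance overly optimistic. The paired statistic fits the setup because both algorithms see exactly the same splits, though scores from repeated CV runs are not fully independent, so the result should be read with that caveat.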