a) Consider the following confusion matrix, obtained by comparing the outputs of a machine learning classifier with the ground truth:

                     Classified positive   Classified negative
    Actual positive          611                    89
    Actual negative          194                   106

Calculate Accuracy, Sensitivity, Specificity, Precision, and Recall. Write the equation for each and fill in the numbers.

Notes: Presenting a final real number is not necessary; a fraction is sufficient. You can use LaTeX math notation in your answers, but it should not even be necessary.

b) Now consider the following cost matrix, showing the costs associated with making certain errors:

    Error cost           Classified positive   Classified negative
    Actual positive              0                     12
    Actual negative              2                      0

Given these costs, would you optimize the machine learning classifier towards Precision or Recall? Explain your answer!

------------------

Reproducibility

a) List and describe common sources of irreproducibility from a computing perspective.

b) Describe the PRIMAD model of reproducibility types and explain the insights gained by priming (modifying) the respective elements.

-----------------

Below is one statement from the Modelers' Hippocratic Oath (Derman, 2012) and two rules for responsible big data research (Zook et al., 2017). For EACH of the three statements/rules, answer the following questions:

(i) Explain the statement/rule and state what aspect of Data Science it is meant to warn about.
(ii) Give one concrete example of a situation in Data Science in which this statement/rule is applicable.
(iii) Explain which measures should be taken by Data Scientists to satisfy the statement/rule.

Make sure that you number your answers clearly as A(i),(ii),(iii), B(i),(ii),(iii), C(i),(ii),(iii).

A. I will remember that I didn't make the world, and it doesn't satisfy my equations.
B. Recognize that privacy is more than a binary value.
C. Design your data and systems for auditability.
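As background for the confusion-matrix question above, the five metrics all derive directly from the four cells of the table. A minimal illustrative sketch (variable names `tp`, `fn`, `fp`, `tn` follow the usual true/false positive/negative convention and are not part of the exam text):

```python
# Cells taken from the confusion matrix above:
tp, fn = 611, 89    # actual positive row
fp, tn = 194, 106   # actual negative row

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
sensitivity = tp / (tp + fn)                 # true positive rate
specificity = tn / (tn + fp)                 # true negative rate
precision = tp / (tp + fp)                   # positive predictive value
recall = tp / (tp + fn)                      # same quantity as sensitivity
```

Note that recall and sensitivity are two names for the same ratio, which is itself a useful observation when answering part a).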
---------------------------------------------------

Consider an experimental setup in which you want to compare two machine learning algorithms, A (SVM) and B (kNN), for cancer detection. I.e., for each instance you want to predict whether it is benign or malignant (malignant is the target class). In total, the data set consists of 20,000 instances. Cross validation with k=20 folds is used for performance estimation.

a) Explain the concept of cross validation and how it differs from other evaluation setups (briefly compare to at least two other strategies). What are its advantages and disadvantages?

b) To test whether one classifier is statistically significantly better than the other w.r.t. the chosen evaluation measure, which test would be applicable here, and why? (If multiple tests are applicable, pick the one with the highest power.) How many sampled values underlie your test, i.e., what value does N have?

c) Which types of errors can you make in statistical hypothesis testing? Give a brief definition of each. How can you minimize the chances of making these errors?
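The k-fold splitting underlying the cross-validation question can be sketched in a few lines of plain Python. This is only an illustration of the concept, not a required answer; `fit_and_score` is a hypothetical placeholder for any train-and-evaluate routine:

```python
def kfold_indices(n, k):
    """Partition the indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k, fit_and_score):
    """Each fold serves once as the test set; the rest form the training set."""
    folds = kfold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores  # one performance score per fold
```

With 20,000 instances and k=20, each fold holds 1,000 test instances, and the procedure yields one score per fold; these per-fold scores are the paired samples a significance test would compare.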