TU Wien:Experiment Design for Data Science VU (Knees)
|Lecturers||Peter Knees, Andreas Rauber, Allan Hanbury, Alexander Schindler|
|Department||Information Systems Engineering|
|Master Data Science||Compulsory module FDS/FD - Fundamentals of Data Science - Foundations|
|Master Computational Science and Engineering||Elective module Data Management|
|Master Business Informatics||Elective module DA/EXT - Data Analytics Extension|
Data privacy and ethics (Hanbury), statistical testing (Knees), and productivity and data management (Rauber). Essentially, one of the "bread and butter" courses of data science.
Different lecturers cover different parts of the material. Lectures do not take place every week. Attendance is not mandatory (not checked). There are 2 practical assignments to be done during the semester. The first assignment is individual and requires essentially no coding except for very basic dataset exploration and visualisation. The second assignment is group work and is far more demanding in terms of coding and time needed.
Basic knowledge of statistical testing is certainly helpful, as is basic knowledge of how to set up a machine learning experiment (but this is also briefly presented in the lectures). For reproducing the chosen paper's experiment, some programming skills come in handy. Here, some knowledge of Python is necessary (scikit-learn, pandas, numpy; matplotlib and seaborn can be helpful for visualization), or possibly R if you prefer that, depending on the paper chosen for reproduction (see assignments). Weka could also be needed.
- A. Hanbury: clear and concise, on-topic, presentation of relevant thoughts and questions surrounding Data Science
- P. Knees: lectures about experiment design, forming hypotheses and statistical evaluation.
- A. Rauber: lectures about reproducibility of papers and experiments, problems with code, libraries, etc.
- Forming hypotheses for (machine learning) experiments: given a data set, explore it and describe some interesting details. Then, from the insights gained, form three different hypotheses that can be tested in a machine learning experiment. Describe the dependent and independent variables, and how you would conduct the respective experiments. No actual programming required. To be done individually (not group work).
- From a list of 3 short papers (around 3-4 pages), select one and reproduce the experiments and results. Check how well that is possible, i.e. whether all data, descriptions of methods, etc., are available to reproduce the work. This is to be done in groups of 3 students. Groups give a short presentation on their progress during the semester, which is helpful for getting feedback. Deliverables are code and a report.
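To make the idea of a testable hypothesis with dependent and independent variables more concrete, here is a minimal sketch in Python. It is not taken from the course materials. It assumes a hypothetical experiment in which the independent variable is the classifier (A or B) and the dependent variable is the accuracy per cross-validation fold, and it uses an exact paired sign-flip permutation test as one possible way to evaluate the hypothesis "A and B perform equally well". The function name and all fold scores are made up for illustration.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Returns the mean paired difference and the p-value under the null
    hypothesis that the sign of each paired difference is arbitrary.
    Enumerates all 2**n sign assignments, so keep n small (e.g. 10 folds).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance guards against floating-point round-off.
        if flipped >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return mean(diffs), n_extreme / n_total


# Hypothetical per-fold accuracies from a 10-fold cross-validation
# (invented numbers, not results from any paper or assignment).
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.80, 0.81, 0.79, 0.78]

mean_diff, p_value = paired_sign_flip_test(acc_a, acc_b)
print(f"mean difference: {mean_diff:.3f}, p-value: {p_value:.5f}")
```

With these invented numbers every fold favors classifier A, so only 2 of the 1024 sign assignments are as extreme as the observed one, giving p = 2/1024. Note that fold scores from the same cross-validation are not fully independent, so such a p-value should be read with caution; which test is appropriate depends on the experiment and is worth discussing in the report.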
The exam contains 4 relatively open-ended questions, each worth 25% of the points (see materials).
Time until the certificate is issued
2019WS: To simplify studying for the exam, I created a minimal set of slides containing just the "relevant" parts: Minimal_2019_all_blocks.pdf
- Assignment 2: Form a group of motivated people, start early, check which paper option suits you best (which one seems easy / aligned with your skills), and read some extra literature around the chosen paper if necessary.
Suggestions for improvement / Criticism