TU Wien:Datenorientierte Programmierparadigmen VU (Hanbury)
|Lecturers||Sebastian Böck, Allan Hanbury, Florina Mihaela Piroi, Elmar Kiesling|
|Department||Information Systems Engineering|
|Links||tiss:188995 , Mattermost-Channel|
The first part of the course covers data-oriented programming paradigms (DOPP) in the strict sense. The second and third parts are exercises in data science: one on a given topic, and one on a topic you select yourself, where all steps of the data science process have to be carried out (ask a question, get the data, ...).
Some lectures, 3 assignments to be solved.
Some programming knowledge, especially Python, is definitely necessary. Also helpful: data visualization, some statistics (basic testing, correlation, regression models), a little data handling (e.g. Pandas), and the basics of pre-processing and cleaning data. (These things are also explained in the lectures, to some extent.)
A lecture on each topic
- Data Science process (Hanbury)
- Python intro (Böck)
- Numpy intro (Böck)
- Pandas intro (Piroi)
- Machine Learning intro/scikit-learn (Hanbury)
- Network Analysis (Hanbury)
Note that the lectures are only an introduction to the topics. The tasks in the exercises are more complicated than the presented examples.
- 1st assignment (individual): compare different programming paradigms (OOP, data-oriented). To be done in a Jupyter notebook on the TU infrastructure (on the web).
- 2nd assignment (individual): loading data, pre-processing (filling missing values, outlier detection and removal), aggregating and merging data, analyzing data (correlations, etc.), building a model for prediction. Also in Jupyter, as above.
- 3rd assignment (groups of 4): larger project; select from list of topics, collect, organize, process your own data to answer the given problem. Written report and final presentation.
- 1st assignment (individual): read, manipulate and plot some data without using Pandas (we were allowed to use xlrd). Not very difficult, but I don't know what the point of this exercise was: Pandas is an absolute staple in many data science tasks, it is heavily used in Exercise 2, and we didn't have to do anything that could not be done with less effort in Pandas. There were also some hidden tests that checked for things that were only vaguely mentioned in the task description.
- 2nd & 3rd assignment as above
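The steps listed for the 2nd assignment (filling missing values, removing outliers, merging, aggregating, correlations, a simple prediction model) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual exercise: the data, the column names (`station`, `temp`, `sales`, `region`) and the choice of the IQR rule are all made up here, and a least-squares line via `np.polyfit` stands in for whatever scikit-learn model the exercise asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real exercise data sets are not reproduced here.
measurements = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B", "B"],
    "temp":    [10.0, np.nan, 12.0, 20.0, 21.0, 500.0],  # one gap, one outlier
    "sales":   [100.0, 112.0, 118.0, 205.0, 208.0, 220.0],
})
stations = pd.DataFrame({"station": ["A", "B"], "region": ["north", "south"]})

# 1. Fill missing values with the median of the same station.
measurements["temp"] = measurements.groupby("station")["temp"].transform(
    lambda s: s.fillna(s.median())
)

# 2. Remove outliers with the 1.5 * IQR rule (one common choice among many).
q1, q3 = measurements["temp"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = measurements[measurements["temp"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3. Merge with a second table.
merged = clean.merge(stations, on="station", how="left")

# 4. Aggregate per group.
per_region = merged.groupby("region")["sales"].mean()

# 5. Analyze: Pearson correlation between two columns.
corr = merged["temp"].corr(merged["sales"])

# 6. Very simple prediction model: least-squares line sales ~ temp.
slope, intercept = np.polyfit(merged["temp"], merged["sales"], deg=1)

print(per_region)
print(f"correlation: {corr:.3f}, model: sales = {slope:.2f} * temp + {intercept:.2f}")
```

In the real assignment each step needs more care (for example, deciding per column how to fill gaps), but the overall shape of the notebook tends to look like this.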
There is no exam, only the 3 assignments. Points earned for these determine your grade. Grading of the exercises takes a couple of weeks.
Time until the certificate is issued
WS19: Exercise 3 presentations 27.01.2020; total points 24.02.2020; certificate a few days after that
If you have no prior experience with Pandas or machine learning, your effort will be a lot higher than 3 ECTS.
- Final presentation: there is a 10-minute time limit that is strictly enforced. Make sure your presentation fits into this slot; less is more here.
- Start early with the final project (exercise 3); it can be quite some work.
- In WS19 there were a lot of organizational issues with the exercises: a non-working JupyterHub and multiple errors in the exercise data. Also, check the descriptions for exercise 1 carefully so that you don't lose points in the hidden tests (e.g. for a function that is never used in the visible tests; in WS19 it was __len__ of an object). So start early with the exercises, and if you notice an issue, report it immediately in TUWEL with sufficient evidence of the problem.
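As a reminder of what the `__len__` remark refers to: Python calls this method whenever `len(obj)` is used, so a hidden test can exercise it even if no visible test does. The class below is purely hypothetical (the actual class from the WS19 exercise is not reproduced here) and only shows the mechanism.

```python
class Dataset:
    """Hypothetical container, not the class from the actual exercise."""

    def __init__(self, rows):
        self._rows = list(rows)

    def add(self, row):
        self._rows.append(row)

    def __len__(self):
        # Invoked by len(obj); must return a non-negative integer.
        return len(self._rows)


data = Dataset([(1, "a"), (2, "b")])
data.add((3, "c"))
print(len(data))
```

Note that `__len__` also determines truthiness when no `__bool__` is defined, so an empty `Dataset` evaluates to `False` in an `if` statement.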
Suggestions for improvement / criticism