TU Wien:Advanced Methods for Regression and Classification VU (Filzmoser)

Data

Lecturer: Peter Filzmoser
ECTS: 4.5
Last held: 2023W
Language: English
Mattermost: advanced-methods-for-regression-and-classification
Links: tiss:105707
Curriculum assignments:
Master's programme Data Science, module MLS/FD - Machine Learning and Statistics - Foundations (compulsory course)
Master's programme Computational Science and Engineering, module subgroup Data Management and Analytics
Master's programme Logic and Computation, module Knowledge Representation and Artificial Intelligence (restricted elective)


Content

This lecture is essentially the English (new?) version of "Klassifikation und Diskriminanzanalyse".

Theory of models and methods for regression and classification and how to use them in R.

  • Linear Regression
  • Model comparison and model selection: Information criteria, etc.
  • Linear Methods: OLS (ordinary least squares), PCR (principal component regression), PLS (partial least squares), Shrinkage methods (Ridge, Lasso Regression)
  • Linear Methods for classification: Linear and Quadratic Discriminant Analysis (LDA, QDA), Logistic Regression
  • Non-linear Methods for regression: Basis expansions (splines), Generalized Additive Models (GAM)
  • Tree-based methods for regression and classification, random forests
  • Support Vector Machines (SVM) for regression and classification (not much theory here, more applications)

Course format

Lectures are held in English. The material is presented from the course notes, with additional sketches on the blackboard. It covers theory, empirical methods, and implementation in R (relevant packages, functions, code examples). Exercises are to be solved using R.

Required/recommended prior knowledge

Some knowledge of regression is definitely nice to have, but regression is explored in depth in all kinds of variants anyway. In general, linear algebra, matrix and vector calculus, and basic probability theory are needed to follow the lecture notes. A certain level of R is definitely required (however, imho, R can be picked up quite fast if you have sufficient programming skills in some other language, say, Python).

Lecture

The lecture consists of explanations of the lecture notes, which are used as slides and shown with a projector. Sometimes there is extra material or a more in-depth explanation with sketches on the blackboard. As usual, (imho) a good mix: first theory and models, then more hands-on material on how to use the methods in R with some concrete data.

Attending the lectures is a good idea, as the methods for the exercises are usually discussed, as well as some of the R methods needed. In the exercise classes, students' solutions are shown and discussed, and learning from these solutions and/or mistakes can be really helpful. Students can volunteer to show their solutions.

Exercises

WS 2018: 11 exercises, to be programmed in / solved with R. Topics range from regression models to classification.

WS 2019: Exercises are no longer mandatory; they are done in class together with Prof. Filzmoser. The exercises do not count toward the grade, only the oral exam does.

WS 2020: 10 out of 12 exercises have to be handed in and have to score more than 50% of the points.

WS 2021: 9 exercises in total. Minimum requirements: 7 exercises need to be graded with > 0%, and the arithmetic mean of all exercises has to be >= 50%.

Exam, grading

[WS 18/19] Oral exam, usually 2-3 people together. You get a sheet of paper and 2-3 questions. Everybody gets a question; for one participant the questions start right away, while the other(s) have a little time to prepare.

You should write down the 1-2 most important formulas, such as the cost function / criterion to be minimized, and possibly what the solution looks like. Be able to explain the quantities and variables in the formulas. You should be able to explain how to arrive at the solution. Sketches of what is going on are also usually helpful.

[Please add your experience, if different!]

It has to be emphasized that not being able to write down the relevant formulas will result in a significantly worse grade. For every model, the optimization problem, side conditions (if applicable), solution and other relevant formulas should be known, e.g. the degrees of freedom for smoothing splines.
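
As an illustration of the level of detail meant here, ridge regression could be written down roughly as follows. This is only a sketch in the notation of The Elements of Statistical Learning, which is not necessarily the notation of the lecture notes. It assumes centered predictors, so that the intercept can be left out, and writes d_j for the singular values of X.

```latex
% Optimization problem (penalized form), with tuning parameter \lambda \ge 0
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}

% Equivalent form with a side condition (s corresponds one-to-one to \lambda)
\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s

% Closed-form solution
\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y

% Effective degrees of freedom
\mathrm{df}(\lambda) = \operatorname{tr}\!\big[ X (X^\top X + \lambda I)^{-1} X^\top \big]
  = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda}
```

For smoothing splines the analogous quantity is the trace of the smoother matrix: with fitted values \hat{f} = S_\lambda y, the effective degrees of freedom are df_\lambda = tr(S_\lambda).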

[WS 22/23] The oral exam was organized in 30-minute slots with 4-5 students each. Everyone sat in the seminar room and received a blank page, and Prof. Filzmoser gave each student a generic topic (e.g. what are splines, Random Forest, SVM, Classification Tree, GAM). You could then prepare for about 5-10 minutes, after which the professor went from student to student, let you explain your topic and asked questions in between. All questions we received in my exam slot were mentioned in the topic/question catalogue below. Grading seems fair, but you should understand the concepts and be able to explain them to the professor (including side questions). Formulas are good to write down. In my slot one colleague failed the exam, but he had not had much time to prepare and had bad luck with the question. If you come prepared, the exam shouldn't be a problem.

Time until the certificate is issued

  • 19.12.2019: same day
  • 27.02.2020: same day
  • 15.02.2023: same day

Time required

A few hours for the (almost) weekly exercise assignments. The amount depends on your previous knowledge of the theory, your experience with R, and whether you attend the lecture or not (which is imho very helpful!).

WS 2019: During the semester, only the time for the lectures (since exercises were not mandatory). Lecture attendance is recommended, as Prof. Filzmoser explains the ideas behind the methods informally as well to make it easier to understand. IMO this is especially helpful for the nonlinear part of the lecture (basis expansions, GAM, trees, SVM). For the exam, all the relevant formulas should be known (see Materialien). Learning all this might take a week or two, depending on your affinity for linear algebra and amount of work per day.

Course materials

Lecture notes are available in German and English.

Recommended books:

The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)

An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani)

Tips

Still open

Highlights / praise

Still open

Suggestions for improvement / criticism

Still open