TU Wien: Grundlagen des Information Retrieval VU (Rauber) / Exam 2021-01-14
The online exam on the 14.01.2021 in English
The exam lasted 1 hour and was split into 3 parts. The first part was limited to 15 minutes; the other two were unlocked 15 minutes after the start of the exam and could be completed in any order.
Part 1 (Multiple Choice)
- PageRank is:
  - calculated using Random Walks following web links
  - the most important component of modern web search engines
  - calculated using Slater Determinants
  - based on an algorithm developed for analysing citations of scientific articles
- Which of these features can be extracted directly from symbolic audio representations such as MIDI:
  - chords
  - melodic contour
  - pitch histograms
  - timbre
- Which of the following statements about Document Surrogates are true?
  - It is considered bad practice to highlight query terms in Document Surrogates
  - A Document Surrogate has no effect on the perceived relevance of a document
  - A Document Surrogate contains information on why the document was retrieved
  - Document Surrogates are no longer used in most search engines
- In order to perform music genre classification
  - we cannot use features such as SSDs or MFCCs
  - we need to make sure we use a lossless audio representation as input for our analysis
  - we may encounter music genres that are not defined and thus impossible to detect by acoustic information
  - we need to consider features that capture timbre and rhythmic aspects of music
- trec_eval is a utility for
  - calculating the similarity between two files
  - calculating evaluation metrics for information retrieval
  - designing search user interfaces
  - indexing a small collection of documents
- Which of the following aspects are taken into account by the BM25 model?
  - Term Synonyms
  - Inverse Document Frequency
  - Term Frequency
  - Document Length
- Which of these statements about audio features is true:
  - Spectral flux measures the dynamics / changes in consecutive time frames of the spectrum
  - The spectral roll-off measures how much energy is present in the lower frequencies
  - Zero crossing rate is a good separator for speech vs. music
  - The spectral centroid is computed in the time domain
- In an index, the dictionary data structure can be used to store:
  - document frequency
  - BM25 values
  - term vocabulary
  - pointers to postings lists
- Query Logs can be used for:
  - query spelling correction
  - training Learning to Rank algorithms
  - cooling servers
  - optimising search engine cache replacement policies
- Which techniques can be used to pre-process documents before indexing?
  - lemmatization
  - trunking
  - stop word addition
  - tokenisation
- Classic Boolean Retrieval supports:
  - exact match between query and document
  - operators such as AND, OR and NOT
  - PageRank
  - ranking
- Which of the following are present in an Information Retrieval Test Collection?
  - Queries
  - Relevance Judgements
  - Document Collection
  - Search Engines
Part 2 (Evaluation Metrics)
A query result like this one was given:
Rank | Relevancy Score |
---|---|
1 | 2 |
2 | 0 |
3 | 3 |
4 | 1 |
5 | 0 |
6 | 0 |
Scores range from 0 (non-relevant) to 3 (very relevant). For binary metrics, treat scores 1-3 as relevant and 0 as non-relevant. There are 5 relevant documents for this query in total. Calculate the following metrics:
Precision@1, Precision@2, ..., Precision@6, Recall@1, Recall@6, F-measure@6, Average Precision, R-Precision, Cumulative Gain, Discounted Cumulative Gain, Normalised Discounted Cumulative Gain.
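The requested metrics can be checked with a short script. The following is a minimal sketch, not an official solution. It rests on several assumptions that the exam text does not fix: the F-measure is taken as the balanced F1, DCG uses the linear-gain form with a log2(rank + 1) discount (other variants, such as an exponential gain or no discount at rank 2, give different numbers), and the ideal ranking for NDCG is built only from the six grades in the table, because the grades of the two relevant documents that were not retrieved are unknown. All variable and function names are illustrative.

```python
import math

# Graded relevance per rank (ranks 1..6), taken from the table in this part.
grades = [2, 0, 3, 1, 0, 0]
total_relevant = 5  # stated in the task description

# Binary view: grades 1-3 count as relevant, 0 as non-relevant.
binary = [1 if g > 0 else 0 for g in grades]


def precision_at(k):
    """Fraction of the top-k results that are relevant."""
    return sum(binary[:k]) / k


def recall_at(k):
    """Fraction of all relevant documents found in the top k."""
    return sum(binary[:k]) / total_relevant


# Assumption: F-measure means the balanced F1 (harmonic mean of P and R).
p6, r6 = precision_at(6), recall_at(6)
f1_at_6 = 2 * p6 * r6 / (p6 + r6)

# Average Precision: precision at each rank holding a relevant document,
# summed and divided by the total number of relevant documents, so the
# relevant documents that were never retrieved contribute zero.
average_precision = sum(
    precision_at(k) for k in range(1, len(binary) + 1) if binary[k - 1]
) / total_relevant

# R-Precision: precision at rank R, where R is the number of relevant documents.
r_precision = precision_at(total_relevant)

# Cumulative Gain: plain sum of the graded scores.
cumulative_gain = sum(grades)


def dcg(gs):
    """Assumed DCG variant: grade / log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gs, start=1))


dcg_6 = dcg(grades)

# Assumption: the ideal ranking reorders only the six retrieved grades.
ideal_grades = sorted(grades, reverse=True)
ndcg_6 = dcg_6 / dcg(ideal_grades)

if __name__ == "__main__":
    for k in range(1, 7):
        print(f"Precision@{k} = {precision_at(k):.4f}")
    print(f"Recall@1 = {recall_at(1):.4f}, Recall@6 = {recall_at(6):.4f}")
    print(f"F1@6 = {f1_at_6:.4f}")
    print(f"Average Precision = {average_precision:.4f}")
    print(f"R-Precision = {r_precision:.4f}")
    print(f"CG = {cumulative_gain}, DCG = {dcg_6:.4f}, NDCG = {ndcg_6:.4f}")
```

If the lecture defines DCG or the ideal ranking differently, only `dcg` and `ideal_grades` need to change; the binary metrics are unaffected.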
Part 3 (Search Engine Design Guidelines)
A search engine was given for the student to examine before answering the following questions:
- Name and explain two design guidelines which the website fulfills.
- Name and explain two design guidelines which the website doesn't fulfill.
- How could the two guideline violations named in the second question be fixed?