TU Wien:Advanced Information Retrieval VU (Knees)/Exam 2025-06-12
Format
Printed, automatically graded multiple-choice test containing only true/false questions. Points are deducted for incorrect answers. There were 12 questions (24 points) on the lecture content and 16 questions (16 points) on a paper that was shared 48 hours before the exam. There were at least two groups with slightly different questions.
Exam Paper
Ivica Kostric and Krisztian Balog. 2024. A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24). Association for Computing Machinery, New York, NY, USA, 2271–2275. https://doi.org/10.1145/3626772.3657933
Questions
On the lecture (2 points each)
IR evaluation
- Datasets with sparse judgments trade reduced document coverage for increased question coverage.
- The M in MAP refers to calculating the mean over all questions evaluated.
- Mean Reciprocal Rank calculates the average over the reciprocal ranks of all relevant documents.
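For revision, the two metrics named in the statements above can be sketched as follows. This is a minimal sketch using the common textbook definitions, not code from the lecture; the function names, the toy rankings (`runs`), and the toy relevance judgments (`qrels`) are all made up for illustration. "Queries" here corresponds to what the statements call questions.

```python
def reciprocal_rank(ranking, relevant):
    """Return 1 / rank of the FIRST relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranking, relevant):
    """Average precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalise by the number of relevant documents, retrieved or not.
    return precision_sum / len(relevant)


def mean_over_queries(metric, runs, qrels):
    """Compute a per-query metric, then take the mean over all queries."""
    return sum(metric(runs[q], qrels[q]) for q in runs) / len(runs)


# Hypothetical toy data: two queries with a ranked list and a set of relevant ids each.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d9"]}
qrels = {"q1": {"d1", "d7"}, "q2": {"d2"}}

mrr = mean_over_queries(reciprocal_rank, runs, qrels)
map_score = mean_over_queries(average_precision, runs, qrels)
```

With this toy data, `mrr` is 0.75 (reciprocal ranks 1/2 and 1) and `map_score` is 19/24, about 0.79 (average precisions 7/12 and 1).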
Tokenization and text representation
- To reduce vocabulary sizes, BERT relies on lemmatization.
- When using word embeddings, n-grams can be modelled by projecting word representations to join representations via sliding window CNNs.
- In BERT, special tokens control the segment embeddings, which are added to the token embeddings and position embeddings.
- BERT operates on a word level. Tokens found in the input text are never split up or modified.
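As background for the BERT statements above, here is a minimal sketch of greedy longest-match-first WordPiece tokenization of a single word. The vocabulary `toy_vocab` is invented for illustration and is far smaller than a real BERT vocabulary; the `##` prefix for word-internal pieces and the `[UNK]` fallback follow the convention of the original BERT tokenizer.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word becomes unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"play", "##ing", "##ed", "un", "##play", "##able"}

playing = wordpiece_tokenize("playing", toy_vocab)
unplayable = wordpiece_tokenize("unplayable", toy_vocab)
unknown = wordpiece_tokenize("xyz", toy_vocab)
```

Here `playing` is `["play", "##ing"]`, `unplayable` is `["un", "##play", "##able"]`, and `unknown` is `["[UNK]"]`.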
Retrieval augmented generation
- Naïve RAG mainly consists of three parts: indexing, retrieval and generation.
- For optimizing LLMs through the strategy of fine-tuning, the aspect of requiring external knowledge outweighs the aspect of requiring model adaption [sic!].
- Retrieval Augmented Generation is used as a strategy to avoid hallucinations of LLMs.
- Advanced RAG proposes multiple optimization strategies around pre-retrieval and post-retrieval, following a chain-like structure.
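The three stages named in the first statement (indexing, retrieval, generation) can be sketched as a toy pipeline. This is only an illustration under strong assumptions: the index is a list of bag-of-words term counts, retrieval uses cosine similarity, and the generation stage is reduced to assembling the prompt that would be passed to an LLM. All function names and documents are hypothetical, and no real LLM or retrieval library is called.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def build_index(documents):
    """Indexing: store each document together with its term counts."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, query, k=2):
    """Retrieval: return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, passages):
    """Generation (stub): assemble the prompt an LLM would receive."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical toy collection.
documents = [
    "bm25 is a sparse retrieval model",
    "bert produces contextual embeddings",
    "vienna is the capital of austria",
]
index = build_index(documents)
top = retrieve(index, "what is bm25", k=1)
prompt = build_prompt("what is bm25", top)
```

In this sketch, pre-retrieval steps such as query rewriting would go before `retrieve`, and post-retrieval steps such as re-ranking would go between `retrieve` and `build_prompt`.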
On the provided paper (1 point each)
- The proposed multi-query rewriting strategy utilizes the internal sequence generation and scoring process to obtain query rewrites basically for free.
- For the majority of experiments, CMQR produces significantly better results than its single-query counterparts.
- The authors claim that – to the best of their knowledge – this is the first work to address neural query rewriting.
More questions on the paper followed, including one about the presence of typos in the paper.