# GAI Test 1 - 2024WS

_Note: Answers were taken from the TUWEL test review._

_Note 2: Because the test results were worse than expected, the grading scheme was changed from all-or-nothing to the following:_

- _For each correct option you chose, you'll receive 1/(number of correct answers to the question) points._
- _For each incorrect option you chose, you'll lose 1/(number of incorrect answers to that question) points._
- _The minimum score for each question remains 0 points, so negative scores are not propagated._

_For example: if a question has 2 correct options and 3 incorrect ones, selecting a correct option gives you 0.5 points, while selecting an incorrect option deducts 0.33 points._

---

## Frage 1

**What is the purpose of the "clip" function in PPO's objective function?**

*J(θ) := Eₜ[min(rₜ(θ)Aₜ, clip(rₜ(θ), 1−ϵ, 1+ϵ)Aₜ)]*

- [x] a. To limit the size of policy updates by keeping probability ratios within bounds.
- [ ] b. To speed up the convergence of the optimization process by keeping the action distribution within bounds.
- [ ] c. To reduce memory requirements during training by trimming the policy updates to a fixed scale.
- [ ] d. To prevent numerical overflow during training by keeping probability ratios within bounds.
- [ ] e. To trim the advantage estimates, avoiding overly aggressive advantage estimations.

---

## Frage 2

**Which of these statement(s) about Knowledge Graphs (KGs) and LLMs is (are) CORRECT?**

- [ ] a. KGs help bring generalizability to LLMs.
- [x] b. KGs help bring domain-specific knowledge to LLMs.
- [x] c. LLMs help bring language processing capabilities to KGs.
- [ ] d. KGs can reduce the hardware requirements for training LLMs.
- [ ] e. LLMs help avoid hallucinations in KGs.

---

## Frage 3

**What is a (are) key characteristic(s) of list refinement prompts (list digging prompts)?**

- [ ] a. They remove duplicate entries from lists.
- [ ] b. They only work with alphabetically ordered lists.
- [ ] c. They randomly reorder existing list elements.
- [ ] d. They only work if one provides a quantitative sorting function.
- [x] e. They search between consecutive list elements for better matching entities.

---

## Frage 4

**Which statement(s) about InstructGPT's training process is (are) CORRECT?**

- [ ] a. The training process aims to incorporate human feedback to improve the model's time to first token.
- [x] b. The training process consists of three distinct steps: collecting demonstration data, training a reward model, and optimizing a policy using PPO.
- [ ] c. Human feedback is strategically integrated in the final optimization phase to prevent early overfitting to human preferences.
- [ ] d. The training process follows a four-stage pipeline: initial pretraining, supervised instruction tuning, reward model calibration, and adversarial instruction tuning.
- [ ] e. The reward model is trained using a combination of human feedback and automated metrics to ensure consistent quality assessment.

---

## Frage 5

**The attention weights matrix in scaled dot-product attention has dimensions nseq × nseq, where nseq is the sequence length. Which property must it satisfy mathematically?**

- [ ] a. The matrix is symmetric along the diagonal.
- [x] b. Each row sums up to 1.
- [x] c. All values are non-negative.
- [ ] d. The matrix is always invertible.
- [ ] e. All diagonal entries are always equal to 1.

---

## Frage 6

**What is the purpose of the advantage function Aₜ(s, a) in PPO?**

- [ ] a. To act as a threshold for early stopping if the variance in the benefits of taking different actions is too small.
- [x] b. To compare the value of an action against the average value of all actions in a state.
- [ ] c. To measure the overall performance of the policy at a given state.
- [ ] d. To calculate the learning rate for policy updates at each step, avoiding overly aggressive updates.
- [ ] e. It is a normalized reward compared to all previous action rewards.
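
The clipped objective from Frage 1 and the advantage from Frage 6 can be sketched numerically. Below is a minimal plain-Python sketch for a single sample; the function names, the epsilon of 0.2, and the ratio/advantage values are illustrative choices, not taken from any particular PPO implementation:

```python
def clip(x, lo, hi):
    """Constrain x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

def ppo_objective(ratio, advantage, epsilon=0.2):
    """min(r*A, clip(r, 1-eps, 1+eps)*A) for one sample.

    ratio:     r_t(theta), new/old action-probability ratio.
    advantage: A_t, how much better the action was than the
               average action value in that state.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return min(unclipped, clipped)

# With a positive advantage, a ratio far above 1+epsilon gains nothing
# extra: the clipped term caps the incentive, limiting the update size.
print(ppo_objective(1.5, 2.0))   # capped at (1 + 0.2) * 2.0 = 2.4
print(ppo_objective(1.05, 2.0))  # inside the bounds: 1.05 * 2.0 = 2.1
```

Taking the minimum of the clipped and unclipped terms makes the objective pessimistic: the policy never profits from pushing the probability ratio outside [1−ϵ, 1+ϵ], which is exactly the "limit the size of policy updates" behaviour that makes option a of Frage 1 correct.
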
---

## Frage 7

**What is a (are) key finding(s) regarding implicit user unfairness in LLM-based recommender systems according to the research presented?**

- [ ] a. As LLMs can reveal previously unseen connections and generate recommendations outside the constraints of traditional recommender systems, they are less likely to lead users into a filter bubble.
- [ ] b. Due to their non-deterministic nature, LLMs produce far more diverse recommendations over time than traditional recommender engines.
- [ ] c. Implicit user unfairness means producing discriminatory recommendations based only on sensitive user characteristics.
- [x] d. Implicit user unfairness means producing discriminatory recommendations based only on non-sensitive user characteristics.
- [x] e. LLMs can infer sensitive attributes from the user's non-sensitive attributes according to their global knowledge, which arguably contributes to implicit unfairness.

---

## Frage 8

**Which statement(s) about traditional recommender systems' limitations is (are) INCORRECT?**

- [ ] a. They are often limited to a single application domain.
- [x] b. They excel at incorporating context-dependent explicit user needs.
- [ ] c. They require user-system interaction data for training.
- [ ] d. They are prone to various types of biases.
- [x] e. Compared to LLM-based recommender systems, they can easily address the cold start problem.

---

## Frage 9

**What does the term "food-for-thought" prompt mean in the context of LLMs?**

- [ ] a. A prompt designed to generate creative ideas unrelated to the task.
- [x] b. A multi-stage process leading to several pairs of questions and answers designed to help the LLM improve its reasoning and understanding.
- [ ] c. A prompt that tests the LLM's ability to generate multiple diverse outputs for a single question.
- [ ] d. The use of a variety of independent instructions to force the model to provide multiple diverse responses for the initial query.
- [ ] e.
The act of providing multiple examples to avoid the propagation of statistical biases in a model's answer.

---

## Frage 10

**Which of the following statements is (are) CORRECT about masked attention in decoders?**

- [ ] a. It enables learning bidirectional dependencies by masking out individual tokens.
- [x] b. A triangular mask zeros out the attention weights of future tokens.
- [ ] c. It uses different embedding dimensions than non-masked self-attention.
- [x] d. The act of masking ensures that each token is generated based only on the tokens that precede it in the sequence.
- [ ] e. Similar to dropout, it zeros out individual parts of embeddings to prevent overfitting to specific dimension-dependent relationships between tokens.

---

## Frage 11

**Which statement(s) about positional encoding in transformers is (are) INCORRECT?**

- [ ] a. Positional encodings help to overcome the inherent permutation invariance of the attention mechanism.
- [ ] b. Positional encodings enrich embedding vectors with information about the position of a token within its sequence.
- [ ] c. The dimension of the positional encoding vectors scales with the dimension of the embedding layer.
- [ ] d. It can use sinusoidal functions to encode position information.
- [x] e. Positional encodings have to be learned during the training process.

---

## Frage 12

**What defines a token in the context of LLMs?**

- [ ] a. The concept of tokens does not have a specific meaning in the context of LLMs.
- [ ] b. A higher-dimensional numerical identifier for the whole sequence.
- [x] c. The atomic unit of language that a model processes. This can be a word, part of a word, or a character.
- [ ] d. The dense vector representation of the atomic unit of language processed by a transformer.
- [ ] e. The smallest semantic unit of a language, which is always a single character.

---

## Frage 13

**What makes the transformer architecture well-suited to sequential data modelling?**

- [ ] a.
Unlike previous architectures, transformers can generalize across previously unseen languages without further pretraining.
- [ ] b. Their attention mechanism makes them highly interpretable and transparent, unlike previous "black box" architectures such as LSTMs.
- [ ] c. They need little to no training data to learn the highly contextual semantics of sequences such as language.
- [x] d. They can learn patterns from multiple parts/subsequences of a sequence simultaneously, rather than processing it sequentially.
- [ ] e. Transformers are specifically designed to handle fixed-length sequences, enhancing their performance with sequential data.

---

## Frage 14

**Google's KELM approach is:**

- [ ] a. a strategy of dynamically querying a knowledge graph during inference.
- [ ] b. a strategy of replacing traditional knowledge graph storage with language model embeddings.
- [x] c. a strategy of using KGs to improve LLMs.
- [ ] d. a synonym for Google's PageRank algorithm.
- [ ] e. a strategy of using LLMs to improve KGs.

---

## Frage 15

**What is the idea behind scaling the dot product QKᵀ in the attention mechanism?**

*Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V*

- [ ] a. To allow for bidirectional attention flow between tokens by normalizing the distance between sequence positions.
- [ ] b. To reduce the model's memory footprint during training and inference.
- [ ] c. To dynamically adjust the attention mechanism's output based on its input sequence length, allowing more evenly spread attention scores.
- [x] d. To prevent attention scores from having extreme values in the softmax, which could lead to vanishing gradients.
- [x] e. To normalize attention scores by counteracting the potential "explosion" of the dot products in high dimensions, ensuring a smoother softmax distribution.

---

## Frage 16

**Which statement(s) about the presented approach "Tuning LLMs via KG Reasoning" is (are) CORRECT?**

- [ ] a.
The approach requires continuous access to the underlying knowledge graph during inference to maintain reasoning capabilities.
- [x] b. It describes a way of using more than the original facts stored in the KG.
- [x] c. The system incorporates a domain glossary to ensure accurate verbalization.
- [x] d. It incorporates an LLM generating tokenized Q&A pairs from the verbalized templates.
- [ ] e. It describes a way of incorporating graph structures into the attention layers.

---

## Frage 17

**In terms of combining Knowledge Graphs (KGs) and LLMs, it is possible to:**

- [x] a. use both LLMs and KGs synergetically.
- [x] b. use LLMs to improve KGs.
- [x] c. use KGs to support the reasoning capabilities of LLMs.
- [x] d. use KGs to improve LLMs.
- [x] e. use KGs to support the explainability of LLMs.

---

## Frage 18

**What is (are) a common reason(s) why LLMs may hallucinate or make mistakes?**

- [x] a. An overgeneralization of statistical patterns in the training data.
- [ ] b. Insufficient computational resources during inference require the model to cut corners.
- [ ] c. Syntax errors in the inference code.
- [x] d. Adoption of incorrect information within the training data.
- [x] e. Insufficient representation of the task to solve in the training data/training corpora.

---

## Frage 19

**In multi-head attention with h heads and model dimension dₘₒdₑₗ, which statement(s) is (are) INCORRECT?**

- [x] a. The embeddings are separated across the sequence dimension of the input. Different embeddings of the sequence are fed into different heads.
- [ ] b. The embeddings are separated across the model dimension dₘₒdₑₗ of the input. Different parts of each embedding are fed into different heads.
- [ ] c. Each head can learn different attention patterns.
- [ ] d. The final output dimension of each embedding after concatenation and projection matches dₘₒdₑₗ.
- [ ] e. The total number of learnable parameters scales linearly with h.
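
The split described in option b of Frage 19 can be made concrete. The following is a toy plain-Python sketch (no learned Q/K/V or output projections, plain self-attention per head; all dimensions and input values are invented for illustration). It also exhibits the row properties from Frage 5: each attention row is non-negative and sums to 1.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q_rows, k_rows, v_rows):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(k_rows[0])
    out = []
    for q in q_rows:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in k_rows]
        weights = softmax(scores)  # non-negative, sums to 1 per row
        out.append([sum(w * v[j] for w, v in zip(weights, v_rows))
                    for j in range(len(v_rows[0]))])
    return out

def multi_head(x, h):
    """Split each embedding into h chunks along d_model (NOT along the
    sequence), attend per head, then concatenate back to d_model."""
    d_model = len(x[0])
    d_head = d_model // h
    heads = []
    for i in range(h):
        chunk = [row[i * d_head:(i + 1) * d_head] for row in x]
        heads.append(attention(chunk, chunk, chunk))  # self-attention
    # concatenate the per-head outputs along the feature dimension
    return [sum((heads[i][t] for i in range(h)), []) for t in range(len(x))]

# 3 tokens, d_model = 4, h = 2 heads of size 2 each
x = [[0.1, 0.4, -0.2, 0.3], [0.5, -0.1, 0.2, 0.0], [0.3, 0.2, 0.1, -0.4]]
y = multi_head(x, h=2)
print(len(y), len(y[0]))  # still 3 tokens of dimension d_model = 4
```

Every token keeps its position in the sequence; only its feature vector is divided among the heads, which is why option a of Frage 19 (splitting across the sequence dimension) is the incorrect statement.
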
---

## Frage 20

**What is (are) NOT a characteristic(s) of the P5 framework?**

- [x] a. It operates through multiple unified sequence-to-sequence frameworks.
- [ ] b. It integrates user-item information with personalized prompts.
- [x] c. It combines multiple tasks into a single prompt-based natural language task.
- [ ] d. It leverages personalized prompts for recommendations.
- [ ] e. It learns multiple recommendation tasks.

---
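
As a closing executable illustration of the triangular mask from Frage 10: a plain-Python sketch in which future positions are set to −∞ before the softmax, so their weights come out as exactly zero. The sequence length and the uniform raw scores are made up for illustration.

```python
import math

n_seq = 4
NEG_INF = float("-inf")

# scores[i][j] = raw attention score of query i toward key j (dummy values)
scores = [[1.0 for _ in range(n_seq)] for _ in range(n_seq)]

# apply the triangular mask: j > i means "future token" for position i
masked = [[s if j <= i else NEG_INF for j, s in enumerate(row)]
          for i, row in enumerate(scores)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

weights = [softmax(row) for row in masked]
for row in weights:
    print([round(w, 2) for w in row])
# Each row still sums to 1; entries above the diagonal are exactly 0,
# so token i attends only to tokens 0..i, as option d of Frage 10 states.
```

Note that the masking happens on the pre-softmax scores, not on the embeddings, which is why the dropout analogy in option e of Frage 10 does not hold.
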