# GAI Test 1 - 2024WS

_Note: Answers were taken from the TUWEL test review._

_Note 2: Because the test results were worse than expected, the grading scheme was changed from all-or-nothing to the following:_

- _For each correct option you chose, you'll receive 1/(number of correct answers to the question) points._
- _For each incorrect option you chose, you'll lose 1/(number of incorrect answers to that question) points._
- _The minimum score for each question remains 0 points, so negative scores are not propagated._

_For example: if a question has 2 correct options and 3 incorrect ones, selecting a correct option gives you 0.5 points, while selecting an incorrect option deducts 0.33 points._

---

## Frage 1

**What is the purpose of the "clip" function in PPO's objective function?**

*J(θ) := Eₜ[min(rₜ(θ)Aₜ, clip(rₜ(θ), 1−ϵ, 1+ϵ)Aₜ)]*

- [x] a. To limit the size of policy updates by keeping probability ratios within bounds.
- [ ] b. To speed up the convergence of the optimization process by keeping the action distribution within bounds.
- [ ] c. To reduce memory requirements during training by trimming the policy updates to a fixed scale.
- [ ] d. To prevent numerical overflow during training by keeping probability ratios within bounds.
- [ ] e. To trim the advantage estimates, avoiding overly aggressive advantage estimations.

---

## Frage 2

**Which of these statement(s) about Knowledge Graphs (KGs) and LLMs is (are) CORRECT?**

- [ ] a. KGs help bring generalizability to LLMs.
- [x] b. KGs help bring domain-specific knowledge to LLMs.
- [x] c. LLMs help bring language processing capabilities to KGs.
- [ ] d. KGs can reduce the hardware requirements for training LLMs.
- [ ] e. LLMs help avoid hallucinations in KGs.

---

## Frage 3

**What is a (are) key characteristic(s) of list refinement prompts (list digging prompts)?**

- [ ] a. They remove duplicate entries from lists.
- [ ] b. They only work with alphabetically ordered lists.
- [ ] c. They randomly reorder existing list elements.
- [ ] d. They only work if one provides a quantitative sorting function.
- [x] e. They search between consecutive list elements for better matching entities.

---

## Frage 4

**Which statement(s) about InstructGPT's training process is (are) CORRECT?**

- [ ] a. The training process aims to incorporate human feedback to improve the model's time to first token.
- [x] b. The training process consists of three distinct steps: collecting demonstration data, training a reward model, and optimizing a policy using PPO.
- [ ] c. Human feedback is strategically integrated in the final optimization phase to prevent early overfitting to human preferences.
- [ ] d. The training process follows a four-stage pipeline: initial pretraining, supervised instruction tuning, reward model calibration, and adversarial instruction tuning.
- [ ] e. The reward model is trained using a combination of human feedback and automated metrics to ensure consistent quality assessment.

---

## Frage 5

**The attention weights matrix in scaled dot-product attention has dimensions nseq × nseq, where nseq is the sequence length. Which property must it satisfy mathematically?**

- [ ] a. The matrix is symmetric along the diagonal.
- [x] b. Each row sums up to 1.
- [x] c. All values are non-negative.
- [ ] d. The matrix is always invertible.
- [ ] e. All diagonal entries are always equal to 1.

---

## Frage 6

**What is the purpose of the advantage function Aₜ(s, a) in PPO?**

- [ ] a. To act as a threshold for early stopping if the variance in the benefits of taking different actions is too small.
- [x] b. To compare the value of an action against the average value of all actions in a state.
- [ ] c. To measure the overall performance of the policy at a given state.
- [ ] d. To calculate the learning rate for policy updates at each step, avoiding overly aggressive updates.
- [ ] e. It is a normalized reward compared to all previous action rewards.
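
The clipped objective from Frage 1 and the advantage from Frage 6 can be sketched numerically. Below is a minimal plain-Python sketch for a single sample; the function names, the epsilon of 0.2, and the ratio/advantage values are illustrative choices, not taken from any particular PPO implementation:

```python
def clip(x, lo, hi):
    """Constrain x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

def ppo_objective(ratio, advantage, epsilon=0.2):
    """min(r*A, clip(r, 1-eps, 1+eps)*A) for one sample.

    ratio:     r_t(theta), new/old action-probability ratio.
    advantage: A_t, how much better the action was than the
               average action value in that state.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return min(unclipped, clipped)

# With a positive advantage, a ratio far above 1+epsilon gains nothing
# extra: the clipped term caps the incentive, limiting the update size.
print(ppo_objective(1.5, 2.0))   # capped at (1 + 0.2) * 2.0 = 2.4
print(ppo_objective(1.05, 2.0))  # inside the bounds: 1.05 * 2.0 = 2.1
```

Taking the minimum of the clipped and unclipped terms makes the objective pessimistic: the policy never profits from pushing the probability ratio outside [1−ϵ, 1+ϵ], which is exactly the "limit the size of policy updates" behaviour that makes option a of Frage 1 correct.
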
---

## Frage 7

**What is a (are) key finding(s) regarding implicit user unfairness in LLM-based recommender systems according to the research presented?**

- [ ] a. As LLMs can reveal previously unseen connections and generate recommendations outside the constraints of traditional recommender systems, they are less likely to lead users into a filter bubble.
- [ ] b. Due to their non-deterministic nature, LLMs produce far more diverse recommendations over time than traditional recommender engines.
- [ ] c. Implicit user unfairness means producing discriminatory recommendations based only on sensitive user characteristics.
- [x] d. Implicit user unfairness means producing discriminatory recommendations based only on non-sensitive user characteristics.
- [x] e. LLMs can infer sensitive attributes from the user's non-sensitive attributes according to their global knowledge, which arguably contributes to implicit unfairness.

---

## Frage 8

**Which statement(s) about traditional recommender systems' limitations is (are) INCORRECT?**

- [ ] a. They are often limited to a single application domain.
- [x] b. They excel at incorporating context-dependent explicit user needs.
- [ ] c. They require user-system interaction data for training.
- [ ] d. They are prone to various types of biases.
- [x] e. Compared to LLM-based recommender systems, they can easily address the cold start problem.

---

## Frage 9

**What does the term "food-for-thought" prompt mean in the context of LLMs?**

- [ ] a. A prompt designed to generate creative ideas unrelated to the task.
- [x] b. A multi-stage process leading to several pairs of questions and answers designed to help the LLM improve its reasoning and understanding.
- [ ] c. A prompt that tests the LLM's ability to generate multiple diverse outputs for a single question.
- [ ] d. The use of a variety of independent instructions to force the model to provide multiple diverse responses for the initial query.
- [ ] e.
The act of providing multiple examples to avoid the propagation of statistical biases in a model's answer.

---

## Frage 10

**Which of the following statements is (are) CORRECT about masked attention in decoders?**

- [ ] a. It enables learning bidirectional dependencies by masking out individual tokens.
- [x] b. A triangular mask zeros out the attention weights of future tokens.
- [ ] c. It uses different embedding dimensions than non-masked self-attention.
- [x] d. The act of masking ensures that each token is generated based only on the tokens that precede it in the sequence.
- [ ] e. Similar to dropout, it zeros out individual parts of embeddings to prevent overfitting to specific dimension-dependent relationships between tokens.

---

## Frage 11

**Which statement(s) about positional encoding in transformers is (are) INCORRECT?**

- [ ] a. Positional encodings help to overcome the inherent permutation invariance of the attention mechanism.
- [ ] b. Positional encodings enrich embedding vectors with information about the position of a token within its sequence.
- [ ] c. The dimension of the positional encoding vectors scales with the dimension of the embedding layer.
- [ ] d. It can use sinusoidal functions to encode position information.
- [x] e. Positional encodings have to be learned during the training process.

---

## Frage 12

**What defines a token in the context of LLMs?**

- [ ] a. The concept of tokens does not have a specific meaning in the context of LLMs.
- [ ] b. A higher-dimensional numerical identifier for the whole sequence.
- [x] c. The atomic unit of language that a model processes. This can be a word, part of a word, or a character.
- [ ] d. The dense vector representation of the atomic unit of language processed by a transformer.
- [ ] e. The smallest semantic unit of a language, which is always a single character.

---

## Frage 13

**What makes the transformer architecture well-suited to sequential data modelling?**

- [ ] a.
Unlike previous architectures, transformers can generalize across previously unseen languages without further pretraining.
- [ ] b. Their attention mechanism makes them highly interpretable and transparent, unlike previous "black box" architectures such as LSTMs.
- [ ] c. They need little to no training data to learn the highly contextual semantics of sequences such as language.
- [x] d. They can learn patterns from multiple parts/subsequences of a sequence simultaneously, rather than processing it sequentially.
- [ ] e. Transformers are specifically designed to handle fixed-length sequences, enhancing their performance with sequential data.

---

## Frage 14

**Google's KELM approach is:**

- [ ] a. a strategy of dynamically querying a knowledge graph during inference.
- [ ] b. a strategy of replacing traditional knowledge graph storage with language model embeddings.
- [x] c. a strategy of using KGs to improve LLMs.
- [ ] d. a synonym for Google's PageRank algorithm.
- [ ] e. a strategy of using LLMs to improve KGs.

---

## Frage 15

**What is the idea behind scaling the dot product QKᵀ in the attention mechanism?**

*Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V*

- [ ] a. To allow for bidirectional attention flow between tokens by normalizing the distance between sequence positions.
- [ ] b. To reduce the model's memory footprint during training and inference.
- [ ] c. To dynamically adjust the attention mechanism's output based on its input sequence length, allowing more evenly spread attention scores.
- [x] d. To prevent attention scores from having extreme values in the softmax, which could lead to vanishing gradients.
- [x] e. To normalize attention scores by counteracting the potential "explosion" of the dot products in high dimensions, ensuring a smoother softmax distribution.

---

## Frage 16

**Which statement(s) about the presented approach "Tuning LLMs via KG Reasoning" is (are) CORRECT?**

- [ ] a.
The approach requires continuous access to the underlying knowledge graph during inference to maintain reasoning capabilities.
- [x] b. It describes a way of using more than the original facts stored in the KG.
- [x] c. The system incorporates a domain glossary to ensure accurate verbalization.
- [x] d. It incorporates an LLM generating tokenized Q&A pairs from the verbalized templates.
- [ ] e. It describes a way of incorporating graph structures into the attention layers.

---

## Frage 17

**In terms of combining Knowledge Graphs (KGs) and LLMs, it is possible to:**

- [x] a. use both LLMs and KGs synergetically.
- [x] b. use LLMs to improve KGs.
- [x] c. use KGs to support the reasoning capabilities of LLMs.
- [x] d. use KGs to improve LLMs.
- [x] e. use KGs to support the explainability of LLMs.

---

## Frage 18

**What is (are) a common reason(s) why LLMs may hallucinate or make mistakes?**

- [x] a. An overgeneralization of statistical patterns in the training data.
- [ ] b. Insufficient computational resources during inference require the model to cut corners.
- [ ] c. Syntax errors in the inference code.
- [x] d. Adoption of incorrect information within the training data.
- [x] e. Insufficient representation of the task to solve in the training data/training corpora.

---

## Frage 19

**In multi-head attention with h heads and model dimension dₘₒdₑₗ, which statement(s) is (are) INCORRECT?**

- [x] a. The embeddings are separated across the sequence dimension of the input. Different embeddings of the sequence are fed into different heads.
- [ ] b. The embeddings are separated across the model dimension dₘₒdₑₗ of the input. Different parts of each embedding are fed into different heads.
- [ ] c. Each head can learn different attention patterns.
- [ ] d. The final output dimension of each embedding after concatenation and projection matches dₘₒdₑₗ.
- [ ] e. The total number of learnable parameters scales linearly with h.
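
The split described in option b of Frage 19 can be made concrete. The following is a toy plain-Python sketch (no learned Q/K/V or output projections, plain self-attention per head; all dimensions and input values are invented for illustration). It also exhibits the row properties from Frage 5: each attention row is non-negative and sums to 1.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q_rows, k_rows, v_rows):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(k_rows[0])
    out = []
    for q in q_rows:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in k_rows]
        weights = softmax(scores)  # non-negative, sums to 1 per row
        out.append([sum(w * v[j] for w, v in zip(weights, v_rows))
                    for j in range(len(v_rows[0]))])
    return out

def multi_head(x, h):
    """Split each embedding into h chunks along d_model (NOT along the
    sequence), attend per head, then concatenate back to d_model."""
    d_model = len(x[0])
    d_head = d_model // h
    heads = []
    for i in range(h):
        chunk = [row[i * d_head:(i + 1) * d_head] for row in x]
        heads.append(attention(chunk, chunk, chunk))  # self-attention
    # concatenate the per-head outputs along the feature dimension
    return [sum((heads[i][t] for i in range(h)), []) for t in range(len(x))]

# 3 tokens, d_model = 4, h = 2 heads of size 2 each
x = [[0.1, 0.4, -0.2, 0.3], [0.5, -0.1, 0.2, 0.0], [0.3, 0.2, 0.1, -0.4]]
y = multi_head(x, h=2)
print(len(y), len(y[0]))  # still 3 tokens of dimension d_model = 4
```

Every token keeps its position in the sequence; only its feature vector is divided among the heads, which is why option a of Frage 19 (splitting across the sequence dimension) is the incorrect statement.
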
---

## Frage 20

**What is (are) NOT a characteristic(s) of the P5 framework?**

- [x] a. It operates through multiple unified sequence-to-sequence frameworks.
- [ ] b. It integrates user-item information with personalized prompts.
- [x] c. It combines multiple tasks into a single prompt-based natural language task.
- [ ] d. It leverages personalized prompts for recommendations.
- [ ] e. It learns multiple recommendation tasks.

---
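
As a closing executable illustration of the triangular mask from Frage 10: a plain-Python sketch in which future positions are set to −∞ before the softmax, so their weights come out as exactly zero. The sequence length and the uniform raw scores are made up for illustration.

```python
import math

n_seq = 4
NEG_INF = float("-inf")

# scores[i][j] = raw attention score of query i toward key j (dummy values)
scores = [[1.0 for _ in range(n_seq)] for _ in range(n_seq)]

# apply the triangular mask: j > i means "future token" for position i
masked = [[s if j <= i else NEG_INF for j, s in enumerate(row)]
          for i, row in enumerate(scores)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

weights = [softmax(row) for row in masked]
for row in weights:
    print([round(w, 2) for w in row])
# Each row still sums to 1; entries above the diagonal are exactly 0,
# so token i attends only to tokens 0..i, as option d of Frage 10 states.
```

Note that the masking happens on the pre-softmax scores, not on the embeddings, which is why the dropout analogy in option e of Frage 10 does not hold.
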