TU Wien:High Performance Computing VU (Träff)/HPC Exam July 2025 summary, True/False

HPC EXAM 17.06.2025

Block 01. Matrix-vector multiplication
→ full complexity of algorithm 3 (fully blocked)
→ complexity of the reduce-scatter for algorithm 1 (row-wise)
→ whether a reduce-scatter is needed for algorithm 2

Block 02.
→ A hypercube with d dimensions has a diameter of d = log_2(p).
→ The diameter of a ring is equal to p.

Block 03.
→ On a d-dimensional butterfly, an allgather algorithm could not do better than a reduce-scatter followed by an allgather.

Block 04.
→ Whether the bisection width can be found with BRS and DRS in polynomial time.
→ (second statement not remembered)

Block 05.
→ In a fully connected network with diameter 1, an MPI_Alltoall can be done in O(1) communication rounds.

Block 06. (?)
MPI code with MPI_Reduce_scatter_block and two processes: input[i] = myrank, so [0,0] for rank 0 and [1,1] for rank 1, followed by a Scan (a sketch of this setup is given below).
→ For p = 2, whether the output of the reduce-scatter is 1 on both processes.
→ Whether the output of the reduce-scatter is always the same as that of the scan.
→ Whether the reduce-scatter could be replaced by a scan.
→ MPI_Reduce for any given root can be emulated by a single MPI_Reduce_scatter(_block?) call (see the second sketch below).
→ MPI_Allgather can be emulated by an MPI_Reduce_scatter_block(?) call without additional copies or communication.

Block 07.
→ The collectives MPI_Allgather and MPI_Bcast exist in both blocking and non-blocking variants in MPI. (True; see the third sketch below.)
→ A collective MPI_xxx requires the other processes to call it with the same derived datatypes. (False; I do not remember which collective was used in this question.)

Block 08.
1. On a fully connected network with 1-ported bidirectional communication, the alltoall operation can be executed in p-1 communication rounds.
2. On a fully connected network with 1-ported bidirectional communication, the alltoall operation can be executed in ceil(log_2(p)) communication rounds.

Block 09.
1. HPL uses LU factorization with partial pivoting to solve the matrix system.
2. HPL uses a block partitioning of the matrix with (possibly) noncontiguous matrix entries.
3. The HPL benchmark measures MIPS (million instructions per second).

Other questions (not sure which block):
- If the network is sufficiently strong, a broadcast is possible in M-1 + log(p) communication rounds.
- Distributing n blocks of data in a ring network needs n(p-1) communication rounds.
- reduce_scatter receives a displ[] argument (don't remember the complete question).
- reduce_scatter can replace a reduce [..if done on all processes or something?] (not sure about this one).
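
First sketch (Block 06): a minimal reconstruction of the code described above, assuming p = 2, int elements and MPI_SUM as the operation (the exact datatype and operation in the exam are not recalled). It is also not clear whether the scan in the exam operated on the original input or on the reduce-scatter result; here it is applied to the reduce-scatter result.

 /* Sketch of the Block 06 setup: p = 2, input[i] = myrank,
  * MPI_Reduce_scatter_block followed by MPI_Scan.
  * Element type (int) and operation (MPI_SUM) are assumptions. */
 #include <mpi.h>
 #include <stdio.h>
 
 int main(int argc, char *argv[])
 {
     int rank;
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 
     int input[2] = { rank, rank };  /* rank 0: [0,0], rank 1: [1,1] */
     int rs_out;                     /* one block of one element per process */
 
     /* Block i is reduced over all processes and delivered to process i:
      * block 0 = 0+1 = 1, block 1 = 0+1 = 1, so both ranks receive 1. */
     MPI_Reduce_scatter_block(input, &rs_out, 1, MPI_INT, MPI_SUM,
                              MPI_COMM_WORLD);
 
     /* Inclusive prefix sum over the reduce-scatter result:
      * rank 0 gets 1, rank 1 gets 1+1 = 2. */
     int scan_out;
     MPI_Scan(&rs_out, &scan_out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
 
     printf("rank %d: reduce_scatter_block = %d, scan = %d\n",
            rank, rs_out, scan_out);
 
     MPI_Finalize();
     return 0;
 }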
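
Second sketch (Block 06, reduce emulation): one way to emulate MPI_Reduce for an arbitrary root with a single call is the irregular MPI_Reduce_scatter, which takes per-process receive counts: all result elements are assigned to the root, every other process receives zero elements. Datatype and operation (MPI_INT, MPI_SUM) are assumptions. Note that MPI_Reduce_scatter_block requires the same count on all processes, so this construction does not carry over to it directly.

 /* Sketch: emulating MPI_Reduce(root) with a single MPI_Reduce_scatter call
  * by concentrating all receive counts at the root.
  * recvbuf must hold n elements at the root; it is not touched elsewhere. */
 #include <mpi.h>
 #include <stdlib.h>
 
 void reduce_via_reduce_scatter(const int *sendbuf, int *recvbuf, int n,
                                int root, MPI_Comm comm)
 {
     int size;
     MPI_Comm_size(comm, &size);
 
     /* All n result elements go to the root; every other process gets 0. */
     int *recvcounts = calloc(size, sizeof(int));
     recvcounts[root] = n;
 
     MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_INT, MPI_SUM, comm);
 
     free(recvcounts);
 }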
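
Third sketch (Block 07): both collectives indeed have non-blocking counterparts, MPI_Ibcast and MPI_Iallgather, in the MPI standard since version 3.0. Buffer contents and sizes below are made up for illustration.

 /* Sketch for Block 07: non-blocking counterparts MPI_Ibcast and
  * MPI_Iallgather of MPI_Bcast and MPI_Allgather (MPI 3.0 and later). */
 #include <mpi.h>
 #include <stdlib.h>
 
 int main(int argc, char *argv[])
 {
     int rank, size;
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
 
     int value = rank;
     int *gathered = malloc(size * sizeof(int));
     MPI_Request req;
 
     /* Non-blocking broadcast of rank 0's value, completed with MPI_Wait. */
     MPI_Ibcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
     /* Non-blocking allgather of one int per process. */
     MPI_Iallgather(&value, 1, MPI_INT, gathered, 1, MPI_INT,
                    MPI_COMM_WORLD, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
     free(gathered);
     MPI_Finalize();
     return 0;
 }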