<aside>
🚨 Quizzes 6, 7, and 8 are significantly more difficult than the first 5 quizzes.
The quizzes assume that you have understood the lectures and thoroughly read the required papers.
</aside>
Introduction
- This module covers advanced topics in GPU programming, including handling divergent branches, optimizing GPU registers, and understanding shared memory and register files.
- Students will also develop the skills needed to effectively read and analyze GPU architecture research papers.
Objectives
- Describe the performance issues related to divergent branches and the basic mechanisms for handling them.
- Describe the opportunities for GPU register optimization.
- Explain the differences between shared memory and register files.
- Develop the ability to read GPU architecture papers.
Readings
Required
Optional
Notes / Things that stand out
Lesson 1 - Handling Divergent Branches
Divergent Branches
- Warp:
- A group of threads that are executed together.
- So far, we have assumed that all threads in a warp execute the same program, following the SPMD model.
- But within a warp, not all threads necessarily follow the same execution path.
- Some threads might work on different parts of the program.
- E.g., the two sides of an if-else statement.
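
To make the idea of a divergent branch concrete, here is a minimal host-side sketch in Python (not GPU code). It models one warp as a list of lane indices and checks whether a per-thread condition sends the lanes down both sides of an if-else. The names `WARP_SIZE`, `split_warp`, and `is_divergent` are hypothetical helpers introduced only for this illustration, and the warp size of 32 is an assumption matching NVIDIA GPUs.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp, as on NVIDIA GPUs


def split_warp(condition, warp_size=WARP_SIZE):
    """Group the lanes of one warp by which side of an if-else they take.

    `condition` is a hypothetical per-thread predicate: it receives the
    lane index (0 .. warp_size - 1) and returns True for the if-side.
    """
    if_lanes = [lane for lane in range(warp_size) if condition(lane)]
    else_lanes = [lane for lane in range(warp_size) if not condition(lane)]
    return if_lanes, else_lanes


def is_divergent(condition, warp_size=WARP_SIZE):
    """A branch diverges when both sides have at least one lane."""
    if_lanes, else_lanes = split_warp(condition, warp_size)
    return bool(if_lanes) and bool(else_lanes)


# Even lanes take the if-side, odd lanes take the else-side: divergent.
print(is_divergent(lambda lane: lane % 2 == 0))  # True

# Every lane satisfies the condition, so the whole warp takes one path.
print(is_divergent(lambda lane: lane < WARP_SIZE))  # False
```

The second call shows that an if-else statement alone does not cause divergence; the warp diverges only when the condition evaluates differently for threads within the same warp.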