Introduction
This module provides guidance on CUDA programming and on using performance counters.
Objectives
- Be able to write and optimize CUDA programs.
Readings
Required
Optional
Notes / Things that stand out
Applications Suitable for GPUs
GPUs are good for applications that:
- Operate on massive amounts of data in parallel, so they can exploit the GPU's massive number of threads.
- Are not dominated by host-device communication costs.
- Access global memory with coalesced access patterns (see the kernel sketch after this list).
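To make these points concrete, here is a minimal sketch (a hypothetical vector-add example, not taken from the module materials) of a kernel with these properties: one thread per element, a single copy in and copy out across the host-device link, and consecutive threads touching consecutive addresses so that global memory accesses are coalesced.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// One thread per element: massive data parallelism.
// Thread i reads a[i] and b[i]; consecutive threads touch consecutive
// addresses, so the global memory accesses are coalesced.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Host-device transfers happen once before and once after the kernel,
    // so for large n they do not dominate the runtime.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```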
Profiling
- Profiling enables us to find performance bottlenecks by locating hotspots in our applications.
- It provides key performance metrics such as:
- Reported throughput,
- Number of divergent branches,
- Number of memory accesses, both coalesced and uncoalesced,
- Occupancy,
- Memory bandwidth utilization.
- To collect these metrics, use the profilers provided by the GPU vendors to analyze and optimize your kernel functions (a small example of a pattern a profiler will flag follows this list).
- Profiling is also useful before you port an application to the GPU, to identify its key hotspots on the CPU.
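As a hypothetical illustration of one of these metrics, the kernel below contains a warp-divergent branch: odd and even lanes of the same warp take different paths, which a vendor profiler reports as divergent branches. This is a sketch for illustration only, not part of the module materials.

```cuda
#include <cuda_runtime.h>

// Odd and even threads in the same warp take different paths, so the
// warp executes both branches one after the other -- a profiler reports
// this as branch divergence (reduced branch efficiency).
__global__ void divergent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (i % 2 == 0)
            out[i] = out[i] * 2.0f;   // taken by even lanes only
        else
            out[i] = out[i] + 1.0f;   // taken by odd lanes only
    }
}

int main(void) {
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_out, 0, n * sizeof(float));

    divergent<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

Compiling this with `nvcc` and running it under a vendor profiler (for example NVIDIA's `ncu ./a.out` for per-kernel metrics, or `nsys profile ./a.out` for a timeline) yields figures such as occupancy, throughput, and branch divergence; the exact tool names and options depend on the toolkit version installed.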