Introduction
This module provides guidance on CUDA programming and on using performance counters.
Objectives
- Be able to write and optimize CUDA programs.
Readings
Required
Optional
Notes / Things that stand out
Applications Suitable for GPUs
GPUs are good for applications that:
- Operate on massive amounts of data in parallel, so they can exploit the GPU's massive number of threads.
- Are not dominated by host-device communication costs.
- Access global memory with coalesced access patterns (see the kernel sketch after this list).
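To make these points concrete, here is a minimal sketch (a hypothetical vector-add example, not taken from the module materials) of a kernel with these properties: one thread per element, a single copy in and copy out across the host-device link, and consecutive threads touching consecutive addresses so that global memory accesses are coalesced.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// One thread per element: massive data parallelism.
// Thread i reads a[i] and b[i]; consecutive threads touch consecutive
// addresses, so the global memory accesses are coalesced.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Host-device transfers happen once before and once after the kernel,
    // so for large n they do not dominate the runtime.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```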
Profiling
- Profiling enables us to find performance bottlenecks by locating hotspots in our applications.
- It provides key performance metrics such as:
- Reported throughput,
- Number of divergent branches,
- Number of memory accesses, both coalesced and uncoalesced,
- Occupancy,
- Memory bandwidth utilization.
- To collect these metrics, use the profilers provided by the GPU vendors to analyze and optimize your kernel functions (a small example of a pattern a profiler will flag follows this list).
- Profiling is also useful before you port an application to the GPU, to identify its key hotspots on the CPU.
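As a hypothetical illustration of one of these metrics, the kernel below contains a warp-divergent branch: odd and even lanes of the same warp take different paths, which a vendor profiler reports as divergent branches. This is a sketch for illustration only, not part of the module materials.

```cuda
#include <cuda_runtime.h>

// Odd and even threads in the same warp take different paths, so the
// warp executes both branches one after the other -- a profiler reports
// this as branch divergence (reduced branch efficiency).
__global__ void divergent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (i % 2 == 0)
            out[i] = out[i] * 2.0f;   // taken by even lanes only
        else
            out[i] = out[i] + 1.0f;   // taken by odd lanes only
    }
}

int main(void) {
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_out, 0, n * sizeof(float));

    divergent<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

Compiling this with `nvcc` and running it under a vendor profiler (for example NVIDIA's `ncu ./a.out` for per-kernel metrics, or `nsys profile ./a.out` for a timeline) yields figures such as occupancy, throughput, and branch divergence; the exact tool names and options depend on the toolkit version installed.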