Back to home

Tag

shared memory

Shared memory is a key CUDA performance lever on GPUs, shaping how warps exchange data, avoid bank conflicts, and overlap HBM latency with compute through features like cp.async and pipelined loading.

2 articles