Cache

Coalescing / contiguous access

  • A coalesced memory transaction is one in which the hardware merges the global-memory accesses of the threads in a warp (a half-warp on the oldest CUDA hardware) into as few transactions as possible. The simplest way to get this is to have consecutive threads access consecutive memory addresses (see the sketch after this list).

  • GPUs execute threads in fixed-size groups (warps on NVIDIA, wavefronts on AMD).

  • If threads in a group load adjacent addresses, the hardware can merge requests into fewer memory transactions (coalescing).

  • Non-sequential or strided accesses increase transactions and reduce effective bandwidth.

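A minimal CUDA sketch of the contrast above (kernel and variable names are illustrative, not from any particular codebase): in `copy_coalesced`, consecutive threads of a warp read consecutive floats, so each warp's loads merge into a few wide transactions; in `copy_strided`, the same loads are scattered across many cache lines.

```cuda
#include <cuda_runtime.h>

__global__ void copy_coalesced(const float* __restrict__ in, float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread i touches element i
    if (i < n) out[i] = in[i];                       // adjacent threads -> adjacent addresses
}

__global__ void copy_strided(const float* __restrict__ in, float* __restrict__ out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = (i * stride) % n;                        // adjacent threads -> addresses `stride` elements apart
    if (i < n) out[i] = in[j];                       // each thread lands on a different cache line
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    dim3 block(256), grid((n + 255) / 256);
    copy_coalesced<<<grid, block>>>(in, out, n);
    copy_strided<<<grid, block>>>(in, out, n, 32);   // 32-float stride: 128 bytes between neighbours
    cudaDeviceSynchronize();
    cudaFree(in); cudaFree(out);
    return 0;
}
```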

Cache lines and alignment

  • Accesses are serviced at cache-line granularity; unaligned or small scattered loads can force full-line fetches or touch multiple lines, increasing bandwidth pressure. Designing buffer layouts for aligned, contiguous reads reduces misses.
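
One place this shows up is array-of-structures versus structure-of-arrays layout. A hedged sketch, with made-up `Particle`/`ParticlesSoA` types and host allocation/launch code omitted: when only one field is needed, the AoS kernel drags in whole cache lines of unused fields, while the SoA kernel reads aligned, contiguous floats.

```cuda
#include <cuda_runtime.h>

struct Particle {               // AoS: 32 bytes per element
    float x, y, z, w;
    float vx, vy, vz, vw;
};

__global__ void read_x_aos(const Particle* p, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = p[i].x;  // 4 useful bytes out of every 32 bytes fetched
}

struct ParticlesSoA {            // SoA: each field is a contiguous, aligned array
    float *x, *y, *z, *w;
    float *vx, *vy, *vz, *vw;
};

__global__ void read_x_soa(ParticlesSoA p, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = p.x[i];  // consecutive threads read consecutive floats: full lines used
}
```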

Bank conflicts (shared memory)

  • When several threads in a warp access different addresses that map to the same shared-memory bank, the accesses serialize (all threads reading the same address is a broadcast and does not conflict). Layout transforms (padding/transpose) can avoid conflicts.
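
The canonical fix is to pad the shared-memory tile so that column accesses no longer map to a single bank. A sketch of a tiled transpose, assuming 32 banks of 4-byte words (the common configuration on recent NVIDIA GPUs); kernel name and sizes are illustrative.

```cuda
#include <cuda_runtime.h>

#define TILE 32

__global__ void transpose(const float* in, float* out, int width, int height) {
    __shared__ float tile[TILE][TILE + 1];   // +1 padding column shifts each row by one bank

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];      // coalesced global read

    __syncthreads();

    int tx = blockIdx.y * TILE + threadIdx.x;                    // transposed block origin
    int ty = blockIdx.x * TILE + threadIdx.y;
    if (tx < height && ty < width)
        out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];  // column read, conflict-free thanks to padding
}

int main() {
    const int w = 1024, h = 512;
    float *in, *out;
    cudaMalloc(&in, w * h * sizeof(float));
    cudaMalloc(&out, w * h * sizeof(float));
    dim3 block(TILE, TILE), grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    transpose<<<grid, block>>>(in, out, w, h);
    cudaDeviceSynchronize();
    cudaFree(in); cudaFree(out);
    return 0;
}
```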

Textures / texture caches

  • Sampled image access can use specialized caches with different locality assumptions versus raw buffer loads; memory layout (tiling) influences cache efficiency.
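
A hedged sketch of routing 2D reads through the texture path with CUDA texture objects; the buffer names, the `blur3x1` kernel, and the image size are made up for illustration. Neighbouring fetches in a stencil like this share texture-cache lines, which is the kind of 2D locality the texture cache is designed around.

```cuda
#include <cuda_runtime.h>

__global__ void blur3x1(cudaTextureObject_t tex, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    // Neighbouring fetches hit the texture cache; out-of-range coordinates are clamped.
    float v = tex2D<float>(tex, x - 1, y) + tex2D<float>(tex, x, y) + tex2D<float>(tex, x + 1, y);
    out[y * w + x] = v / 3.0f;
}

int main() {
    const int w = 1024, h = 1024;
    float* d_img; size_t pitch;
    cudaMallocPitch(&d_img, &pitch, w * sizeof(float), h);

    // Describe the pitched buffer as a 2D texture of floats.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = d_img;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = w;
    res.res.pitch2D.height = h;
    res.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // clamp out-of-range coordinates
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;        // raw texel reads, no interpolation
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    blur3x1<<<grid, block>>>(tex, d_out, w, h);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFree(d_img); cudaFree(d_out);
    return 0;
}
```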