Latency
-
Is how many cycles it takes for a result of one instruction to be available for the next.
Throughput
-
Is how often the instruction can issue.
Cold data
-
Not in cache.
Tips
-
You should always count cache lines, not bytes, for bandwidth computations.
-
Out Of Order execution hides latency and pipelines everything, but we can help it by disentangling dependency chains.
-
RAM is fast, but far from GPU. We can help it by laying out data for hardware prefetcher.