Thread
Everybody wants their models to run faster. However, researchers often cargo-cult performance optimizations without a solid understanding of the underlying principles.
To address that, I wrote a post called "Making Deep Learning Go Brrrr From First Principles". (1/3)
horace.io/brrr_intro.html
For single-GPU performance, there are 3 main areas your model might be bottlenecked by. Those are: 1. Compute, 2. Memory-Bandwidth, and 3. Overhead. Correspondingly, the optimizations that matter *also* depend on which regime you're in. (2/3)
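The compute vs. memory-bandwidth distinction can be made concrete with a roofline-style back-of-the-envelope check: compare an op's arithmetic intensity (FLOPs per byte moved) against the hardware's break-even point. A minimal sketch, using illustrative (roughly A100-class) peak numbers that are assumptions, not measurements:

```python
# Rough roofline-style classifier: given an op's FLOP count and the bytes
# it moves to/from main memory, decide which regime it falls in.
# Hardware peaks below are illustrative (roughly A100-class), not exact.
PEAK_FLOPS = 312e12            # fp16 tensor-core peak, FLOP/s (assumed)
PEAK_BW = 1.5e12               # HBM bandwidth, bytes/s (assumed)
RIDGE = PEAK_FLOPS / PEAK_BW   # break-even arithmetic intensity, FLOP/byte

def regime(flops, bytes_moved):
    intensity = flops / bytes_moved  # FLOPs performed per byte of traffic
    return "compute-bound" if intensity > RIDGE else "memory-bound"

# Square fp16 matmul, N=4096: 2*N^3 FLOPs; reads/writes 3 N*N fp16 matrices.
N = 4096
print(regime(2 * N**3, 3 * N * N * 2))  # high intensity -> compute-bound

# Elementwise ReLU on the same matrix: ~1 FLOP per element, read + write.
print(regime(N * N, 2 * N * N * 2))     # low intensity  -> memory-bound
```

(Overhead, the third regime, doesn't show up in a model like this: it's the time spent in the framework and launch machinery before any GPU work runs, so it dominates when individual ops are too small for either peak to matter.)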
Andrej Karpathy @karpathy · Mar 15, 2022
Excellent and unintuitive read on GPUs. The chip doing the compute has a tiny amount of memory & is connected to the main memory literally through a straw. Most of the energy goes to data movement too. Many repercussions. E.g. latency better predicted by # activations than # flops
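Karpathy's last point (latency tracks activation count, not FLOPs) falls straight out of the same bandwidth reasoning: for memory-bound ops, time is set by bytes moved, so extra FLOPs per element are free. A hedged sketch with the same assumed (roughly A100-class) peak numbers:

```python
# Sketch: for memory-bound ops, a simple latency model depends on bytes
# moved (i.e. # activations), not on FLOP count.
# Hardware peaks are assumed, roughly A100-class.
PEAK_FLOPS = 312e12  # FLOP/s (assumed)
PEAK_BW = 1.5e12     # bytes/s (assumed)

def est_latency(flops, bytes_moved):
    # An op is limited by the slower of its compute and its memory traffic.
    return max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

N = 4096 * 4096  # activations in a 4096x4096 fp16 tensor
# ReLU: ~1 FLOP/element. A GELU-like op: ~10x the FLOPs, same traffic
# (read + write 2 bytes per element either way).
relu_t = est_latency(1 * N, 2 * N * 2)
fancy_t = est_latency(10 * N, 2 * N * 2)
print(relu_t == fancy_t)  # True: 10x the FLOPs, identical predicted latency
```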