Prerequisites
To successfully implement the kernels in Mini-YAIE, you should be familiar with:
Programming Languages
- Python (Intermediate): Understanding of classes, inheritance, type hinting, and PyTorch tensors.
- C++ (Basic): Enough to read and write the CUDA kernels; much of the boilerplate is provided.
- CUDA (Basic): Understanding of the GPU execution and memory model (blocks, threads, shared memory).
Machine Learning Concepts
- Transformer Architecture: Queries, keys, values, and the attention mechanism.
- Tensors: Shapes, dimensions, matrix multiplication.
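
As a rough self-check for the concepts above, the following sketch computes scaled dot-product attention for a single head. It is an illustration only, not code from Mini-YAIE: the `softmax` and `attention` helpers are hypothetical names, and it uses plain Python lists instead of PyTorch tensors so that it runs without any dependencies. If the shapes and each step make sense to you, you likely have the background needed.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for one head (illustrative helper).

    q: [n_q, d], k: [n_k, d], v: [n_k, d_v], all as nested lists.
    Returns a nested list of shape [n_q, d_v].
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Score of this query against every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Normalize the scores into weights that sum to 1.
        weights = softmax(scores)
        # Output row is the weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out

# One query, two keys/values, head dimension 2.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
result = attention(q, k, v)
print(result)
```

The query matches the first key more closely, so the first value row receives the larger weight. In a tensor library the same computation is typically written as a matrix multiplication of queries and keys, a softmax over the key dimension, and a second matrix multiplication with the values.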
Tools
- Git: For version control.
- Linux/Unix Shell: For running commands.