Triton 1.0: GPU programming language for neural networks

OpenAI has introduced Triton 1.0, an open-source GPU programming language. Triton is Python-like and allows users with no CUDA experience to write highly efficient GPU code.

Triton, first presented in 2019 at the Machine Learning and Programming Languages (MAPL) workshop, simplifies the development of specialized kernels that can run much faster than those in general-purpose libraries. The Triton compiler automatically optimizes and parallelizes the code, converting it into code that executes on recent Nvidia GPUs (CPUs and AMD GPUs, as well as platforms other than Linux, are not currently supported).

A key feature of Triton is the ability to quickly write code that approaches the peak performance of the GPU. For example, it can be used to write FP16 matrix-multiplication kernels that match the performance of cuBLAS in under 25 lines of code. According to OpenAI, Triton makes it possible to create kernels that are up to twice as efficient as equivalent Torch implementations.
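
For a sense of what such a kernel looks like, below is a simplified sketch in the spirit of the matrix-multiplication example from the Triton tutorials; the block sizes, pointer arithmetic, and masking here are illustrative, not the exact published listing:

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # Each program instance computes one BLOCK_M x BLOCK_N tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    rk = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + rm[:, None] * stride_am + rk[None, :] * stride_ak
    b_ptrs = b_ptr + rk[:, None] * stride_bk + rn[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)  # accumulate in FP32
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs, mask=(rm[:, None] < M) & (rk[None, :] + k < K), other=0.0)
        b = tl.load(b_ptrs, mask=(rk[:, None] + k < K) & (rn[None, :] < N), other=0.0)
        acc += tl.dot(a, b)  # block-level matmul on the current tiles
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + rm[:, None] * stride_cm + rn[None, :] * stride_cn
    tl.store(c_ptrs, acc.to(tl.float16), mask=(rm[:, None] < M) & (rn[None, :] < N))
```

The compiler handles the low-level details (memory coalescing, shared-memory management, scheduling) that CUDA programmers would otherwise tune by hand.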

Of the existing domain-specific languages and JIT compilers, Triton is most similar to Numba: kernels are defined as decorated Python functions and launched concurrently with different program IDs on a grid of instances. The first stable version of Triton, along with its documentation, is available in the project's GitHub repository.
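
A minimal sketch of that programming model, following the vector-addition example from the Triton tutorials (the kernel and tensor names here are illustrative):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each instance on the launch grid gets its own program id,
    # much like a block index in Numba/CUDA.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(98432, device='cuda')
y = torch.rand(98432, device='cuda')
out = torch.empty_like(x)
# One-dimensional grid: enough instances to cover every element.
grid = lambda meta: (triton.cdiv(x.numel(), meta['BLOCK_SIZE']),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```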
