Version: 1.0


get_fragment_size

get_fragment_size[mma_shape: IndexList[3]]() -> IndexList[3]

Calculates the fragment size per thread for a given MMA shape.

For tensor core operations, each thread in a warp handles a portion of the computation. This function determines how many elements each thread needs to process for the A, B, and C/D matrices based on the MMA shape.

Parameters:

mma_shape (IndexList[3]): The MMA dimensions, in the order [M, N, K].

Returns:

IndexList[3]: The fragment sizes per thread for matrices A, B, and C/D, in that order, calculated as [M*K/WARP_SIZE, N*K/WARP_SIZE, M*N/WARP_SIZE].
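
The formula in the return description can be checked by hand with a small standalone sketch. The Python snippet below is not the Mojo implementation; it only mirrors the arithmetic. The helper name `fragment_sizes`, the `warp_size` argument, and the use of integer division are assumptions made for illustration. A warp size of 32 corresponds to NVIDIA GPUs and 64 to AMD wavefronts on CDNA hardware.

```python
# Illustrative only: mirrors the documented formula
# [M*K/WARP_SIZE, N*K/WARP_SIZE, M*N/WARP_SIZE].
# `fragment_sizes` and `warp_size` are hypothetical names, not part of the Mojo API.

def fragment_sizes(mma_shape, warp_size=32):
    """Return per-thread fragment sizes for A, B, and C/D.

    mma_shape is (M, N, K). Integer division is assumed, which is exact
    whenever each matrix's element count is a multiple of warp_size.
    """
    m, n, k = mma_shape
    a_frag = m * k // warp_size  # A is M x K
    b_frag = n * k // warp_size  # B is K x N (same element count as N x K)
    c_frag = m * n // warp_size  # C/D is M x N
    return (a_frag, b_frag, c_frag)


if __name__ == "__main__":
    # A 16x8x8 MMA shape on a 32-thread warp.
    print(fragment_sizes((16, 8, 8)))                  # (4, 2, 4)
    # A 16x16x16 MMA shape on a 64-thread wavefront.
    print(fragment_sizes((16, 16, 16), warp_size=64))  # (4, 4, 4)
```

For the 16x8x8 shape, matrix A has 16*8 = 128 elements spread over 32 threads, so each thread holds 4; B has 8*8 = 64 elements, so each thread holds 2; C/D has 16*8 = 128 elements, so each thread holds 4.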