Version: 1.0

block_reduce

block_reduce[BLOCK_SIZE: Int, reduce_fn: def[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width], dtype: DType, simd_width: Int](val: SIMD[dtype, simd_width], init: Scalar[dtype]) -> Scalar[dtype]

Performs a block-level reduction of a single SIMD value across all threads in a GPU thread block using warp-level primitives and shared memory.

Parameters:

  • BLOCK_SIZE (Int): The number of threads per block.
  • reduce_fn (def[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): The binary reduction function.
  • dtype (DType): The data type of the elements.
  • simd_width (Int): The SIMD vector width.

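Args:

  • val (SIMD[dtype, simd_width]): The SIMD value contributed by the calling thread.
  • init (Scalar[dtype]): The initial value for the reduction.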

Returns:

Scalar[dtype]: The reduced scalar result (valid on thread 0).
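To make the two-stage structure described above concrete, here is a minimal CPU-side sketch in Python. It is not the Mojo implementation and does not call the Mojo API: the function name `simulate_block_reduce`, the 32-lane warp size, and the order in which lanes, warps, and the block are combined are all assumptions made for illustration. The sketch also assumes `init` is the identity element of `reduce_fn` (0 for a sum, negative infinity for a max), since it is folded in at every stage.

```python
from functools import reduce

WARP_SIZE = 32  # assumption: 32 lanes per warp


def simulate_block_reduce(thread_vals, reduce_fn, init, warp_size=WARP_SIZE):
    """Model a block-level reduction on the CPU (hypothetical helper).

    thread_vals holds one list of simd_width numbers per thread, so
    len(thread_vals) plays the role of BLOCK_SIZE.
    """
    # Stage 0: each thread folds its own SIMD lanes into one scalar.
    per_thread = [reduce(reduce_fn, lanes, init) for lanes in thread_vals]

    # Stage 1: warp-level reduction. On a GPU this would use warp
    # primitives; here each slice of warp_size threads is folded directly.
    warp_partials = [
        reduce(reduce_fn, per_thread[start:start + warp_size], init)
        for start in range(0, len(per_thread), warp_size)
    ]

    # Stage 2: the per-warp partials (standing in for shared memory)
    # are combined into the single result that thread 0 would hold.
    return reduce(reduce_fn, warp_partials, init)


# 64 threads, SIMD width 4; thread t contributes [t, t, t, t].
vals = [[t] * 4 for t in range(64)]
block_sum = simulate_block_reduce(vals, lambda a, b: a + b, 0)
block_max = simulate_block_reduce(vals, max, float("-inf"))
print(block_sum, block_max)  # prints "8064 63"
```

In this model only one value is returned, which mirrors the note that the reduced result is valid on thread 0; other threads should not be assumed to hold it.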

block_reduce[BLOCK_SIZE: Int, num_reductions: Int, reduce_fn: def[dtype: DType, width: Int, reduction_idx: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width], dtype: DType, simd_width: Int](val: StaticTuple[SIMD[dtype, simd_width], num_reductions], init: StaticTuple[Scalar[dtype], num_reductions]) -> StaticTuple[Scalar[dtype], num_reductions]

Performs a block-level reduction of multiple fused SIMD values across all threads in a GPU thread block using warp shuffles and shared memory.

Parameters:

  • BLOCK_SIZE (Int): The number of threads per block.
  • num_reductions (Int): The number of fused reductions to perform.
  • reduce_fn (def[dtype: DType, width: Int, reduction_idx: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): The binary reduction function, parameterized by reduction index.
  • dtype (DType): The data type of the elements.
  • simd_width (Int): The SIMD vector width.

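Args:

  • val (StaticTuple[SIMD[dtype, simd_width], num_reductions]): The SIMD values contributed by the calling thread, one per reduction.
  • init (StaticTuple[Scalar[dtype], num_reductions]): The initial values, one per reduction.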

Returns:

StaticTuple[Scalar[dtype], num_reductions]: The reduced scalar results (valid on thread 0).
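The fused overload can be modeled the same way. The Python sketch below is a CPU-side illustration only, not the Mojo implementation: `simulate_fused_block_reduce` is a hypothetical name, the 32-lane warp size is an assumption, and `reduce_fn` receives the reduction index as an ordinary first argument rather than as a compile-time parameter. Each entry of `init` is assumed to be the identity for its reduction. The model loops over the reductions one after another for clarity; a fused GPU kernel would presumably carry them through the same warp and shared-memory steps together.

```python
from functools import reduce


def simulate_fused_block_reduce(thread_vals, reduce_fn, init, warp_size=32):
    """Model several block-level reductions at once (hypothetical helper).

    thread_vals[t][i] is the list of SIMD lanes that thread t contributes
    to reduction i; init[i] is the starting value for reduction i.
    """
    results = []
    for idx in range(len(init)):
        # Bind the reduction index, mirroring the reduction_idx parameter.
        def fn(a, b, idx=idx):
            return reduce_fn(idx, a, b)

        # Fold each thread's SIMD lanes for this reduction into a scalar.
        per_thread = [reduce(fn, vals[idx], init[idx]) for vals in thread_vals]

        # Warp-level partials, then a final pass over those partials.
        warp_partials = [
            reduce(fn, per_thread[start:start + warp_size], init[idx])
            for start in range(0, len(per_thread), warp_size)
        ]
        results.append(reduce(fn, warp_partials, init[idx]))
    return tuple(results)


def sum_or_max(idx, a, b):
    # Reduction 0 is a sum, reduction 1 is a max.
    return a + b if idx == 0 else max(a, b)


# 64 threads, SIMD width 2: thread t gives [t, t] to the sum, [t, -t] to the max.
vals = [([t, t], [t, -t]) for t in range(64)]
print(simulate_fused_block_reduce(vals, sum_or_max, (0, float("-inf"))))
# prints "(4032, 63)"
```

Dispatching on the index inside one function is what lets a single call compute, for example, a sum and a max over the same block without launching two separate reductions.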