Version: 1.0

block

GPU block-level operations and utilities.

This module provides block-level operations for NVIDIA and AMD GPUs, including:

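  • Block-wide reductions:
    • sum: Computes the sum across the block
    • max: Finds the maximum value across the block
    • min: Finds the minimum value across the block
  • Block-wide communication:
    • broadcast: Broadcasts a value from a source thread to all threads in the block
  • Block-wide scans:
    • prefix_sum: Performs a prefix sum (scan) across all threads in a 1D block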

The module builds on warp-level operations from the warp module, extending them to work across a full thread block (potentially multiple warps). It handles both NVIDIA and AMD GPU architectures and supports various data types with SIMD vectorization.
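The layering described above (warp-level operations extended to a full block) can be modeled on the host. The following is a minimal Python sketch, not the module's implementation: the function name block_sum_model and the default warp size of 32 are assumptions made for illustration (32 is typical of NVIDIA GPUs; AMD hardware commonly uses 64).

```python
def block_sum_model(values, warp_size=32):
    """Host-side model of a two-stage block reduction.

    `values` holds one entry per thread, ordered by linear thread id.
    `warp_size` is an assumption for illustration, not a module parameter.
    """
    # Stage 1: each warp reduces the values held by its own threads.
    warps = [values[i:i + warp_size] for i in range(0, len(values), warp_size)]
    warp_partials = [sum(warp) for warp in warps]

    # Stage 2: the per-warp partial results are combined into one block result.
    return sum(warp_partials)


# A hypothetical block of 128 threads, each contributing its own thread id.
per_thread_values = list(range(128))
block_total = block_sum_model(per_thread_values)
```

On a real GPU the second stage typically exchanges the per-warp partials through shared memory; this model only captures the arithmetic structure, and the same shape applies to max and min with a different combining operation.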

Operations support 1D blocks via the block_size parameter, and 2D and 3D blocks via the block_dim_x, block_dim_y, and block_dim_z parameters. The exception is prefix_sum, which is documented for 1D blocks only. For multi-dimensional blocks, thread linearization follows the standard row-major order, with x varying fastest: linear_id = x + y * dim_x + z * dim_x * dim_y.
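To make the linearization formula concrete, here is a short Python sketch that evaluates it on the host. The helper name linear_thread_id and the 4 x 2 x 2 block shape are hypothetical and chosen only for this example.

```python
def linear_thread_id(x, y, z, dim_x, dim_y):
    """Linearize a 3D thread index; x varies fastest, then y, then z."""
    return x + y * dim_x + z * dim_x * dim_y


# Hypothetical block shape used only for this illustration.
dim_x, dim_y, dim_z = 4, 2, 2

# Visit threads with x in the innermost loop and collect their linear ids.
linear_ids = [
    linear_thread_id(x, y, z, dim_x, dim_y)
    for z in range(dim_z)
    for y in range(dim_y)
    for x in range(dim_x)
]
```

With x iterated in the innermost loop, the ids come out as the consecutive integers 0 through 15, so every thread in the block maps to a unique slot with no gaps.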

Functions

  • broadcast: Broadcasts a value from a source thread to all threads in a block.
  • compute_offset: Computes the offset, with padding applied if needed.
  • max: Computes the maximum value across all threads in a block.
  • min: Computes the minimum value across all threads in a block.
  • prefix_sum: Performs a prefix sum (scan) operation across all threads in a 1D block.
  • sum: Computes the sum of values across all threads in a block.
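As a reference for what a prefix sum (scan) computes, the Python sketch below models the result per thread on the host. This page does not say whether prefix_sum is inclusive or exclusive, so both variants are shown; the function names are hypothetical and are not part of the module.

```python
from itertools import accumulate


def inclusive_scan(values):
    """Thread i receives the sum of values[0..i], including its own value."""
    return list(accumulate(values))


def exclusive_scan(values):
    """Thread i receives the sum of values[0..i-1], excluding its own value."""
    # Subtracting each thread's own value from the inclusive result
    # yields the exclusive result (valid for ordinary numeric addition).
    return [total - v for total, v in zip(inclusive_scan(values), values)]


# One value per thread in a hypothetical 1D block of four threads.
per_thread = [1, 2, 3, 4]
inclusive = inclusive_scan(per_thread)  # [1, 3, 6, 10]
exclusive = exclusive_scan(per_thread)  # [0, 1, 3, 6]
```

In both variants the block total equals the last inclusive entry, which is the value a block-wide sum over the same inputs would return.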