IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: Nightly

shuffle_xor

shuffle_xor[dtype: DType, simd_width: Int, //](val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width]

Exchanges values between threads in a warp using a butterfly pattern.

Each thread swaps values with the thread whose lane ID equals its own lane ID XORed with the given offset. The resulting butterfly communication pattern is useful for parallel reductions and scans.
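The partner selection can be illustrated with a small host-side model in plain Python. This is a sketch of the lane arithmetic only, not GPU code; the helper name `xor_partner` is hypothetical and not part of the Mojo API.

```python
def xor_partner(lane: int, offset: int) -> int:
    """Return the lane that `lane` exchanges with for a given XOR offset."""
    return lane ^ offset

# With offset 4, lanes pair up as 0<->4, 1<->5, 2<->6, 3<->7.
partners = [xor_partner(lane, 4) for lane in range(8)]
print(partners)
```

Because XOR is its own inverse, the pairing is symmetric: if lane A reads from lane B, then lane B reads from lane A.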

Parameters:

  • dtype (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • val (SIMD[dtype, simd_width]): The SIMD value to be exchanged with another thread.
  • offset (UInt32): The lane offset to XOR with the current thread's lane ID to determine the exchange partner. Common values are powers of 2 for butterfly patterns.

Returns:

SIMD[dtype, simd_width]: The SIMD value from the thread at lane (current_lane XOR offset).
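As an illustration of why powers of 2 are the usual offsets, the following plain-Python sketch models a butterfly sum reduction across one warp. It assumes a 32-lane warp and represents each lane's value as a list entry; `simulate_shuffle_xor` and `butterfly_sum` are hypothetical helpers that only mimic the documented return value, not the actual GPU primitive.

```python
WARP_SIZE = 32  # assumed warp width for this sketch

def simulate_shuffle_xor(values, offset):
    """Model of the unmasked exchange: lane i receives the value of lane i ^ offset."""
    return [values[lane ^ offset] for lane in range(len(values))]

def butterfly_sum(values):
    """All-reduce over a power-of-two number of lanes; every lane ends with the total."""
    vals = list(values)
    offset = len(vals) // 2
    while offset > 0:
        received = simulate_shuffle_xor(vals, offset)
        vals = [own + other for own, other in zip(vals, received)]
        offset //= 2
    return vals

totals = butterfly_sum(range(WARP_SIZE))
print(totals[0], totals[-1])
```

Each round halves the offset, so a 32-lane warp needs five exchange rounds, after which all lanes hold the same sum.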

shuffle_xor[dtype: DType, simd_width: Int, //](mask: UInt, val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width]

Exchanges values between threads in a warp using a butterfly pattern with masking.

Each thread swaps values with the thread whose lane ID equals its own lane ID XORed with the given offset. The mask argument controls which threads participate in the exchange.

Example:

from std.gpu.primitives.warp import shuffle_xor

# Exchange values between even-numbered threads 4 lanes apart
var mask = UInt(0x55555555) # Bits 0, 2, 4, ... set: even lanes only
var val = SIMD[DType.float32, 16](42.0) # Example value
var result = shuffle_xor(mask, val, 4) # offset is a UInt32 lane offset, not a float

Parameters:

  • dtype (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • mask (UInt): A bit mask specifying which threads participate in the exchange. Only threads with their corresponding bit set in the mask will exchange values.
  • val (SIMD[dtype, simd_width]): The SIMD value to be exchanged with another thread.
  • offset (UInt32): The lane offset to XOR with the current thread's lane ID to determine the exchange partner. Common values are powers of 2 for butterfly patterns.

Returns:

SIMD[dtype, simd_width]: The SIMD value from the thread at lane (current_lane XOR offset) if both threads are enabled by the mask, otherwise the original value is preserved.
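The masked return behavior described above can be modeled on the host in plain Python. This is a sketch based only on the wording of this page (bit i of the mask corresponds to lane i, and a lane keeps its own value unless both it and its partner are enabled); `simulate_masked_shuffle_xor` is a hypothetical helper, and actual hardware behavior for lanes outside the mask may differ.

```python
def simulate_masked_shuffle_xor(mask, values, offset):
    """Model the masked exchange for a list of per-lane values."""
    result = []
    for lane, own in enumerate(values):
        partner = lane ^ offset
        lane_enabled = (mask >> lane) & 1 == 1
        partner_enabled = (mask >> partner) & 1 == 1
        # Exchange only when both ends of the pair are enabled by the mask.
        result.append(values[partner] if lane_enabled and partner_enabled else own)
    return result

values = list(range(8))   # lane i holds the value i
even_lanes = 0x55         # bits 0, 2, 4, 6 set
swapped = simulate_masked_shuffle_xor(even_lanes, values, 4)
unchanged = simulate_masked_shuffle_xor(even_lanes, values, 1)
print(swapped, unchanged)
```

An offset of 4 preserves lane parity, so even lanes exchange with even lanes. An offset of 1 would pair each even lane with an odd, disabled lane, and under this model every lane keeps its original value.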