IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: Nightly

ds_read_tr8_b64

ds_read_tr8_b64[dtype: DType, //](shared_ptr: UnsafePointer[Scalar[dtype], address_space=AddressSpace.SHARED]) -> SIMD[dtype, 8]

Reads a 64-bit transposed block from LDS using the TR8 layout and returns it as a SIMD[dtype, 8] vector of 8-bit elements.

Each 16-lane row reads 16x8 bytes from LDS and performs two interleaved 8x8 byte transposes, producing 8 transposed bytes per lane.
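
The data movement described above can be sketched on the host as a conceptual model. This is plain Python, not Mojo or GPU code, and it does not reproduce the hardware's actual lane-to-byte assignment. In particular, the source says the two 8x8 transposes are interleaved, whereas this sketch assumes the simpler split of rows 0-7 and rows 8-15. The names `transpose_8x8` and `model_tr8_row` are hypothetical helpers introduced only for illustration.

```python
# Conceptual model only: NOT the exact hardware lane/byte mapping.
LANES_PER_ROW = 16   # lanes that cooperate on one read
BYTES_PER_LANE = 8   # 64 bits returned per lane


def transpose_8x8(block):
    """Transpose an 8x8 block given as a list of 8 rows of 8 bytes."""
    return [list(col) for col in zip(*block)]


def model_tr8_row(lds_rows):
    """Model one 16-lane row: 16x8 bytes in, 8 transposed bytes per lane out.

    Assumption: the two 8x8 blocks are rows 0-7 and rows 8-15. The real
    instruction interleaves them in a way this sketch does not model.
    """
    assert len(lds_rows) == LANES_PER_ROW
    assert all(len(row) == BYTES_PER_LANE for row in lds_rows)
    top = transpose_8x8(lds_rows[:8])
    bottom = transpose_8x8(lds_rows[8:])
    return top + bottom  # one list of 8 bytes per lane


# A 16x8 tile whose byte values encode their own position.
tile = [[r * 8 + c for c in range(8)] for r in range(16)]
out = model_tr8_row(tile)
```

In this model each lane ends up holding one column of an 8x8 block rather than one row, which is the effect the transposing read provides; 16 lanes times 8 bytes accounts for all 128 bytes of the 16x8 tile.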

Notes:

  • Only supported on AMD GPUs (CDNA4+).
  • Maps directly to the llvm.amdgcn.ds.read.tr8.b64 intrinsic.
  • The return value goes through a v2i32 intermediate to avoid a crash in the LLVM type legalizer.

Parameters:

  • dtype (DType): Data type of the elements (must be an 8-bit type).

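Args:

  • shared_ptr (UnsafePointer[Scalar[dtype], address_space=AddressSpace.SHARED]): Pointer into LDS (shared memory) to read from.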

Returns:

SIMD[dtype, 8]: The 8 transposed 8-bit elements read for the calling lane.