ds_read_tr8_b64
ds_read_tr8_b64[dtype: DType, //](shared_ptr: UnsafePointer[Scalar[dtype], address_space=AddressSpace.SHARED]) -> SIMD[dtype, 8]
Reads a 64-bit LDS transpose block using the TR8 layout and returns it as a SIMD[dtype, 8] vector of 8-bit elements.
Each 16-lane row reads 16x8 bytes from LDS and performs two interleaved 8x8 byte transposes, producing 8 transposed bytes per lane.
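To make the transpose step easier to picture, here is a minimal sketch in plain Python of a single 8x8 byte transpose, together with the size arithmetic implied by the description. It is a conceptual model only: the helper `transpose_8x8` and the sample tile are hypothetical, and the actual lane-to-address mapping and the way the hardware interleaves the two 8x8 tiles are not modeled.

```python
# Conceptual model only: a plain 8x8 byte transpose in pure Python.
# The real lane-to-address mapping and the interleaving of the two tiles
# are hardware-defined and are NOT modeled here.

LANES_PER_ROW = 16   # from the description: each 16-lane row
BYTES_PER_LANE = 8   # 64 bits per lane = 8 elements of an 8-bit type


def transpose_8x8(tile):
    """Return the transpose of an 8x8 tile given as 8 rows of 8 byte values."""
    assert len(tile) == 8 and all(len(row) == 8 for row in tile)
    return [[tile[r][c] for r in range(8)] for c in range(8)]


# Hypothetical input tile: row r holds the byte values 8*r .. 8*r + 7.
tile = [[8 * r + c for c in range(8)] for r in range(8)]
transposed = transpose_8x8(tile)

# Output row c collects column c of the input, that is, one byte from
# each of the 8 input rows.
row_bytes = LANES_PER_ROW * BYTES_PER_LANE   # bytes read per 16-lane row
bits_per_lane = BYTES_PER_LANE * 8           # width of each lane's result
```

In this model, each output row gathers one byte from every input row, which is why a lane ends up holding 8 transposed bytes. A 16-lane row covers 16 x 8 = 128 bytes, which matches the two 8x8 tiles mentioned above.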
Notes:
- Only supported on AMD GPUs (CDNA4+).
- Maps directly to the llvm.amdgcn.ds.read.tr8.b64 intrinsic.
- The intrinsic's return type must be a v2i32 intermediate (two 32-bit integers, 64 bits in total) to avoid a crash in the LLVM type legalizer.
Parameters:
- dtype (DType): Data type of the elements (must be an 8-bit type).
Args:
- shared_ptr (UnsafePointer[Scalar[dtype], address_space=AddressSpace.SHARED]): Pointer to the LDS transpose block.
Returns:
SIMD[dtype, 8]: The 8 transposed 8-bit elements produced for the calling lane.