Version: Nightly

ldg

ldg[dtype: DType, //, width: Int = 1, *, alignment: Int = align_of[SIMD[dtype, width]]()](x: UnsafePointer[Scalar[dtype]]) -> SIMD[dtype, width] where dtype.is_numeric()

Load data from global memory through the non-coherent cache.

This function performs a global memory load through the GPU's non-coherent (read-only) cache, equivalent to CUDA's __ldg intrinsic. It is intended for data that is only read, never written, during kernel execution.

Note:

Uses invariant loads, which tell the compiler that the memory will not change during kernel execution. The loaded data must therefore not be written while the kernel runs.
Particularly beneficial for read-only, texture-like access patterns.
May improve performance in memory-bound kernels.
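
As a rough illustration of the read-only pattern described above, the sketch below reads an input buffer through ldg and writes the result to a separate output buffer. The kernel name scale_kernel, its arguments, and the import paths (gpu, gpu.intrinsics, memory) are assumptions made for illustration and may differ between nightly builds. Only the ldg call itself follows the signature shown on this page.

```mojo
from gpu import block_dim, block_idx, thread_idx
from gpu.intrinsics import ldg  # assumed module path
from memory import UnsafePointer


# Hypothetical kernel: dst[i] = 2 * src[i].
# src is only read while the kernel runs, so it is a candidate for ldg.
fn scale_kernel(
    src: UnsafePointer[Float32],
    dst: UnsafePointer[Float32],
    n: Int,
):
    # One element per thread.
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)
    if i < n:
        # width defaults to 1, so this returns a single Float32.
        var value = ldg(src + i)
        dst[i] = value * 2.0
```

The output buffer dst is written with an ordinary store. Only the buffer that stays unchanged for the whole kernel goes through ldg.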

Parameters:

dtype (DType): The data type to load (must be numeric).
width (Int): The SIMD vector width for vectorized loads. Defaults to 1.
alignment (Int): Memory alignment in bytes. Defaults to the natural alignment of SIMD[dtype, width].

Args:

x (UnsafePointer[Scalar[dtype]]): Pointer to the global memory location to load from.

Returns:

SIMD[dtype, width]: SIMD vector containing the loaded data.
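
The width and alignment parameters can be illustrated with a vectorized variant. This is a minimal sketch under stated assumptions: the kernel name copy4_kernel is hypothetical, the import paths may differ between nightly builds, n is assumed to be a multiple of 4, and both buffers are assumed to be aligned to the natural alignment of SIMD[DType.float32, 4] (16 bytes on typical targets, since 4 elements of 4 bytes each).

```mojo
from gpu import block_dim, block_idx, thread_idx
from gpu.intrinsics import ldg  # assumed module path
from memory import UnsafePointer


# Hypothetical kernel: each thread copies 4 consecutive Float32 values.
# Assumes n is a multiple of 4 and both buffers are suitably aligned.
fn copy4_kernel(
    src: UnsafePointer[Float32],
    dst: UnsafePointer[Float32],
    n: Int,
):
    # Index of the first element handled by this thread.
    var base = Int(block_idx.x * block_dim.x + thread_idx.x) * 4
    if base + 4 <= n:
        # width=4 returns SIMD[DType.float32, 4]; alignment keeps its
        # default, the natural alignment of that vector type.
        var v = ldg[width=4](src + base)
        (dst + base).store(v)
```

If the pointer is only guaranteed to be element-aligned, passing a smaller value explicitly, for example ldg[width=4, alignment=4](src + base), is presumably the safer choice, at the possible cost of a less efficient load.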