ldg
ldg[dtype: DType, //, width: Int = 1, *, alignment: Int = align_of[SIMD[dtype, width]]()](x: UnsafePointer[Scalar[dtype]]) -> SIMD[dtype, width] where dtype.is_numeric()
Load data from global memory through the non-coherent cache.
This function performs a global memory load through the GPU's non-coherent
(read-only) cache, comparable to CUDA's __ldg intrinsic.
It is intended for data that is read but not modified while the kernel runs.
Note:
- Uses invariant loads, which indicate that the memory will not change during kernel execution.
- Particularly beneficial for read-only texture-like access patterns.
- May improve performance on memory-bound kernels.
Parameters:
- dtype (DType): The data type to load (must be numeric).
- width (Int): The SIMD vector width for vectorized loads.
- alignment (Int): Memory alignment in bytes. Defaults to natural alignment of the SIMD vector dtype.
Args:
- x (UnsafePointer[Scalar[dtype]]): Pointer to global memory location to load from.
Returns:
SIMD[dtype, width]: SIMD vector containing the loaded data.
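Example:
The following is a minimal usage sketch based only on the signature above, not an excerpt from the Mojo sources. The helper names load_scaled and load4 are hypothetical, and the import paths are assumptions that may differ between Mojo releases. It shows a scalar load with the default width of 1 and a vectorized load with width=4.

```mojo
# Assumed import paths; check them against your Mojo release.
from gpu.intrinsics import ldg
from memory import UnsafePointer


# Hypothetical device-side helper: read one Float32 through the
# non-coherent cache and scale it. `dtype` is inferred from the pointer
# and `width` keeps its default of 1, so the result is a Float32 scalar.
fn load_scaled(src: UnsafePointer[Float32], i: Int, factor: Float32) -> Float32:
    return ldg(src + i) * factor


# Hypothetical vectorized variant: load four consecutive Float32 values.
# With the default `alignment`, the address `src + 4 * i` is expected to
# satisfy the natural alignment of SIMD[DType.float32, 4].
fn load4(src: UnsafePointer[Float32], i: Int) -> SIMD[DType.float32, 4]:
    return ldg[width=4](src + 4 * i)
```

Both helpers are meant to be called from GPU kernel code, and only on buffers that no thread writes to while the kernel is running, in line with the invariant-load note above. If the pointer cannot be guaranteed to meet the vector's natural alignment, a smaller value can be passed through the alignment parameter.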