SharedToGenericTileCopier
struct SharedToGenericTileCopier[thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], *, swizzle: Optional[Swizzle] = None, num_threads: Int = thread_layout.size()]
A TileCopier that moves a tile from shared memory into generic memory.
The swizzle parameter is a property of the shared-memory tile being
read and must match the swizzle used when that tile was written;
passing a different swizzle, or None for a tile that was written with
one, produces incorrect data.
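The matching requirement can be illustrated with a small model. This is a sketch in plain Python, not the Mojo Swizzle API: the XOR-based `xor_swizzle` function, the tile shape, and the helper names are all assumptions chosen for illustration. It writes a tile through a swizzle and then reads it back with and without the same swizzle.

```python
# Conceptual model only: plain Python, not the Mojo Swizzle API.
ROWS, COLS = 4, 8  # hypothetical tile shape


def xor_swizzle(row, col, enabled=True):
    """Hypothetical swizzle: XOR the column index with the row index."""
    return col ^ row if enabled else col


def write_tile(tile, swizzled):
    """Store logical tile[r][c] at its (possibly swizzled) physical column."""
    shared = [[None] * COLS for _ in range(ROWS)]
    for r in range(ROWS):
        for c in range(COLS):
            shared[r][xor_swizzle(r, c, swizzled)] = tile[r][c]
    return shared


def read_tile(shared, swizzled):
    """Load each logical element from its (possibly swizzled) physical column."""
    return [
        [shared[r][xor_swizzle(r, c, swizzled)] for c in range(COLS)]
        for r in range(ROWS)
    ]


logical = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
shared = write_tile(logical, swizzled=True)

matched = read_tile(shared, swizzled=True)      # same swizzle as the writer
mismatched = read_tile(shared, swizzled=False)  # models passing None
```

In this model `matched` reproduces the logical tile, while `mismatched` holds the same values in permuted positions. No error is raised, which is why a mismatch shows up only as incorrect data.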
Parameters
- thread_layout (Layout[thread_layout.shape_types, thread_layout.stride_types]): Layout describing how threads are organized over the copy.
- swizzle (Optional[Swizzle]): Swizzle the shared-memory tile was populated with.
- num_threads (Int): Total number of threads in the thread block. Threads beyond thread_layout.size() do not participate.
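The relationship between num_threads and thread_layout.size() can be sketched as a simple guard. The following is a plain-Python model, not library code; `participating_threads` is a hypothetical helper, and the layout is reduced to a tuple of dimensions.

```python
# Conceptual model only: plain Python, not the Mojo API.
def participating_threads(thread_layout_shape, num_threads=None):
    """Return the thread ids that take part in the copy.

    thread_layout_shape is a hypothetical stand-in for the layout: a tuple
    of dimensions whose product plays the role of thread_layout.size().
    """
    size = 1
    for dim in thread_layout_shape:
        size *= dim
    if num_threads is None:
        num_threads = size  # mirrors the documented default
    # Threads with an id at or beyond the layout size sit the copy out.
    # The case num_threads < size is not described in the source and is
    # not modeled here.
    return [tid for tid in range(num_threads) if tid < size]


# A 4x8 thread layout inside a 64-thread block: only threads 0..31 copy.
active = participating_threads((4, 8), num_threads=64)
```

Here `active` contains 32 thread ids even though the block has 64 threads.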
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, TileCopier
comptime members
dst_address_space
comptime dst_address_space = AddressSpace.GENERIC
Destination AddressSpace this copier writes to.
src_address_space
comptime src_address_space = AddressSpace.SHARED
Source AddressSpace this copier reads from.
Methods
copy
copy[element_size: Int](self, dst: TileTensor[linear_idx_type=dst.linear_idx_type, element_size=element_size], src: TileTensor[address_space=SharedToGenericTileCopier[thread_layout, swizzle=swizzle, num_threads=num_threads].src_address_space, linear_idx_type=src.linear_idx_type, element_size=element_size])
Copies src in shared memory into dst in generic memory.
The non-swizzled path uses TileTensor.copy, which widens to SIMD
stores when the layouts permit. The swizzled path walks per-thread
elements explicitly and applies the swizzle to the source fragment
offsets.
Masked bounds checking, fp32 -> half precision downcast, and
binary_op fusion are not supported.
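The swizzled path described above can be modeled as a per-thread walk. This plain-Python sketch is not the library implementation: the strided thread-to-element distribution, the XOR swizzle, and the names `thread_fragment` and `copy_swizzled` are all assumptions. The point it illustrates is that the swizzle is applied only to the source offset, while the destination is written at the logical offset.

```python
# Conceptual model only: plain Python, not the Mojo implementation.
ROWS, COLS = 4, 8                 # hypothetical tile shape (row-major)
THREAD_ROWS, THREAD_COLS = 2, 4   # hypothetical thread layout: 8 threads


def swizzle(offset):
    """Hypothetical swizzle: XOR the column with the row of a linear offset."""
    r, c = divmod(offset, COLS)
    return r * COLS + (c ^ r)


def thread_fragment(tid):
    """Logical offsets owned by thread tid (assumed strided distribution)."""
    tr, tc = divmod(tid, THREAD_COLS)
    return [
        r * COLS + c
        for r in range(tr, ROWS, THREAD_ROWS)
        for c in range(tc, COLS, THREAD_COLS)
    ]


def copy_swizzled(shared, num_threads):
    """Each participating thread copies its fragment element by element."""
    dst = [None] * (ROWS * COLS)
    for tid in range(num_threads):
        if tid >= THREAD_ROWS * THREAD_COLS:
            continue  # threads beyond the layout size do not participate
        for off in thread_fragment(tid):
            # Swizzle the source offset; write at the logical offset.
            dst[off] = shared[swizzle(off)]
    return dst


# Populate "shared memory" through the same swizzle, then copy it out.
logical = list(range(ROWS * COLS))
shared = [None] * (ROWS * COLS)
for off in logical:
    shared[swizzle(off)] = logical[off]

result = copy_swizzled(shared, num_threads=8)
```

In this model `result` equals the logical tile, because every element is read from the same swizzled location it was stored at.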
Parameters:
- element_size (Int): Number of scalar elements per logical element.
Args:
- dst (TileTensor[linear_idx_type=dst.linear_idx_type, element_size=element_size]): Destination tile in generic memory.
- src (TileTensor[address_space=SharedToGenericTileCopier[thread_layout, swizzle=swizzle, num_threads=num_threads].src_address_space, linear_idx_type=src.linear_idx_type, element_size=element_size]): Source tile in shared memory.