IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: 1.0

create_tensor_tile_im2col

create_tensor_tile_im2col[dtype: DType, tile_shape: IndexList[2], swizzle_mode: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_NONE, *, __tile_shape: IndexList[2] = tile_shape, __desc_shape: IndexList[2] = _im2col_desc_shape[dtype, tile_shape, swizzle_mode]()](ctx: DeviceContext, tensor: LayoutTensor[dtype, address_space=tensor.address_space, element_layout=tensor.element_layout, layout_int_type=tensor.layout_int_type, linear_idx_type=tensor.linear_idx_type, masked=tensor.masked, alignment=tensor.alignment], lower_corner_h: Int, lower_corner_w: Int, upper_corner_h: Int, upper_corner_w: Int, out_height: Int, out_width: Int, filter_h: Int, filter_w: Int) -> TMATensorTileIm2col[dtype, 2, __tile_shape, __desc_shape]

Creates a TMA tensor tile with im2col transformation for 2D convolution.

This factory function creates a TMA descriptor that performs the im2col transformation in hardware during loads. The descriptor encodes the convolution geometry, and the TMA hardware computes the addresses on the fly.

For im2col TMA, each transaction loads one output pixel with multiple channels. This follows CUTLASS's approach where:

  • pixels_per_column = 1 (one pixel per TMA transaction)
  • channels_per_pixel = min(K_tile, swizzle_width) (contiguous channels)
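The chunking rule above can be sketched in a few lines of Python (used only for illustration; this is not the Mojo API). The helper name `channels_per_pixel` and its arguments are hypothetical, and the sketch assumes that swizzle_width is measured in elements and equals the swizzle span in bytes divided by the element size. It does not cover SWIZZLE_NONE, for which this page does not state a width.

```python
PIXELS_PER_COLUMN = 1  # one pixel per TMA transaction, as stated above


def channels_per_pixel(k_tile: int, swizzle_bytes: int, elem_bytes: int) -> int:
    """Hypothetical helper: contiguous channels loaded for each pixel."""
    # Assumption: swizzle width in elements = swizzle span in bytes / element size.
    swizzle_width = swizzle_bytes // elem_bytes
    return min(k_tile, swizzle_width)


# 2-byte elements with a 128-byte swizzle span give a width of 64 elements.
for k_tile in (32, 64, 128):
    print(k_tile, channels_per_pixel(k_tile, swizzle_bytes=128, elem_bytes=2))
```

Under these assumptions, a K_tile larger than the swizzle width is capped at the width, while a smaller K_tile is used unchanged.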

Note: For a stride=1, dilation=1 convolution with padding, the corner offsets follow the CUTLASS convention:

  • lower_corner_h = -pad_h
  • lower_corner_w = -pad_w
  • upper_corner_h = pad_h - (filter_h - 1)
  • upper_corner_w = pad_w - (filter_w - 1)

The filter offsets passed to the PTX instruction range from 0 to (filter_size - 1) and are added to lower_corner to compute actual input coordinates.
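As a rough illustration of the corner formulas and filter offsets described above, the following Python sketch computes the four corner values and the input rows touched by one output pixel. The function names are hypothetical. Adding the output index to the lower corner is an assumption that holds for the stride=1, dilation=1 case only, and the final consistency check assumes the standard output-size formula H_out = H_in + 2 * pad_h - (filter_h - 1).

```python
def im2col_corners(pad_h: int, pad_w: int, filter_h: int, filter_w: int):
    """Corner offsets for a stride=1, dilation=1 convolution (formulas from the note above)."""
    lower_corner_h = -pad_h
    lower_corner_w = -pad_w
    upper_corner_h = pad_h - (filter_h - 1)
    upper_corner_w = pad_w - (filter_w - 1)
    return lower_corner_h, lower_corner_w, upper_corner_h, upper_corner_w


def input_coord(out_idx: int, lower_corner: int, filter_offset: int) -> int:
    """Hypothetical helper: input coordinate read for one output index.

    Assumption: with stride=1 and dilation=1, the base pixel is the output
    index shifted by lower_corner, and the filter offset (0 .. filter_size - 1)
    is added on top.
    """
    return out_idx + lower_corner + filter_offset


# 3x3 filter with padding 1 ("same" convolution).
lower_h, lower_w, upper_h, upper_w = im2col_corners(1, 1, 3, 3)
print(lower_h, lower_w, upper_h, upper_w)

# Input rows touched by output row 0; a negative row lies in the padding region.
print([input_coord(0, lower_h, r) for r in range(3)])

# Consistency check under the assumed output-size formula.
in_h, pad_h, filter_h = 8, 1, 3
out_h = in_h + 2 * pad_h - (filter_h - 1)
assert out_h == in_h + upper_h - lower_h
```

With no padding, the lower corners are 0 and the upper corners become -(filter_size - 1), which shrinks the range of valid base pixels by the filter extent.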

Parameters:

  • dtype (DType): The data type of tensor elements.
  • tile_shape (IndexList[2]): Shape [M_tile, K_tile] for the GEMM tile.
    • M_tile: Number of output pixels (batch * H_out * W_out slice).
    • K_tile: Number of channels (C_in * R * S slice for filter).
  • swizzle_mode (TensorMapSwizzle): Memory swizzling pattern.
  • __tile_shape (IndexList[2]): Internal parameter for the tile shape.
  • __desc_shape (IndexList[2]): Internal parameter for the descriptor shape.

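Args:

  • ctx (DeviceContext): The device context used to create the TMA descriptor.
  • tensor (LayoutTensor): The source tensor for the im2col loads.
  • lower_corner_h (Int): Lower corner offset along the height dimension (-pad_h for stride=1, dilation=1).
  • lower_corner_w (Int): Lower corner offset along the width dimension (-pad_w for stride=1, dilation=1).
  • upper_corner_h (Int): Upper corner offset along the height dimension (pad_h - (filter_h - 1) for stride=1, dilation=1).
  • upper_corner_w (Int): Upper corner offset along the width dimension (pad_w - (filter_w - 1) for stride=1, dilation=1).
  • out_height (Int): Height of the convolution output.
  • out_width (Int): Width of the convolution output.
  • filter_h (Int): Filter height.
  • filter_w (Int): Filter width.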

Returns:

TMATensorTileIm2col[dtype, 2, __tile_shape, __desc_shape]: A TMATensorTileIm2col configured for im2col loads.

Raises:

Error if TMA descriptor creation fails.

create_tensor_tile_im2col[dtype: DType, tile_shape: IndexList[2], swizzle_mode: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_NONE, *, __tile_shape: IndexList[2] = tile_shape, __desc_shape: IndexList[2] = _im2col_desc_shape[dtype, tile_shape, swizzle_mode]()](ctx: DeviceContext, tensor: TileTensor[dtype, address_space=tensor.address_space, linear_idx_type=tensor.linear_idx_type, element_size=tensor.element_size], lower_corner_h: Int, lower_corner_w: Int, upper_corner_h: Int, upper_corner_w: Int, out_height: Int, out_width: Int, filter_h: Int, filter_w: Int) -> TMATensorTileIm2col[dtype, 2, __tile_shape, __desc_shape]

Creates a TMA tensor tile with im2col transformation for 2D convolution.

This is the TileTensor overload. It delegates to the shared _build_im2col_descriptor helper. See the LayoutTensor overload for full background.

Parameters:

  • dtype (DType): The data type of tensor elements.
  • tile_shape (IndexList[2]): Shape [M_tile, K_tile] for the GEMM tile.
  • swizzle_mode (TensorMapSwizzle): Memory swizzling pattern.
  • __tile_shape (IndexList[2]): Internal parameter for the tile shape.
  • __desc_shape (IndexList[2]): Internal parameter for the descriptor shape.

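Args:

  • ctx (DeviceContext): The device context used to create the TMA descriptor.
  • tensor (TileTensor): The source tensor for the im2col loads.
  • lower_corner_h (Int): Lower corner offset along the height dimension.
  • lower_corner_w (Int): Lower corner offset along the width dimension.
  • upper_corner_h (Int): Upper corner offset along the height dimension.
  • upper_corner_w (Int): Upper corner offset along the width dimension.
  • out_height (Int): Height of the convolution output.
  • out_width (Int): Width of the convolution output.
  • filter_h (Int): Filter height.
  • filter_w (Int): Filter width.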

Returns:

TMATensorTileIm2col[dtype, 2, __tile_shape, __desc_shape]: A TMATensorTileIm2col configured for im2col loads.

Raises:

Error if TMA descriptor creation fails.