IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: 1.0

struct GraphemeSliceIter[mut: Bool, //, origin: Origin[mut=mut], forward: Bool = True]

Iterator over grapheme clusters in a string, yielding each cluster as a StringSlice.

A grapheme cluster is what a user would typically think of as a single "character" on screen. This includes combining character sequences, emoji with modifiers, flag sequences, and other multi-codepoint grapheme clusters as defined by UAX #29.

The forward parameter only controls the behavior of the __next__() method used for normal iteration. Calls to next() always take an element from the front of the iterator, and calls to next_back() always take an element from the end.

Mixing next() and next_back() on the same iterator is supported: they share the remaining byte range but use independent state (forward iteration keeps incremental UAX #29 state; reverse iteration caches a safe restart boundary). This is safe because forward priming only consults the codepoint at the start of the remaining range, and next_back() shrinks the range from the end without moving that start. A forward next() advances the front and invalidates the cached reverse safe-start, so the next reverse call recomputes it.
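The interleaving described above can be sketched as follows. This is a minimal illustration, not taken from the library's own examples; it assumes that `Optional.value()` and `StringSlice.byte_length()` behave as in the Mojo standard library, and it calls `value()` directly only because the input is known to hold three clusters.

```mojo
def main():
    # "a" + U+0301 (combining acute) is one grapheme; "b" and "c" follow.
    var text = String("a\u{0301}bc")
    var it = text.graphemes()

    var front = it.next()       # takes the accented "a" from the front
    var back = it.next_back()   # takes "c" from the end
    var middle = it.next()      # takes "b", the only cluster left
    var done = it.next()        # the remaining range is now empty

    # Byte lengths make the cluster boundaries visible.
    print(front.value().byte_length())   # 3 ("a" is 1 byte, U+0301 is 2)
    print(back.value().byte_length())    # 1
    print(middle.value().byte_length())  # 1
    if not done:
        print("exhausted")
```

Both ends consume from the same remaining range, so the third call sees only what the first two left behind.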

Note: len() is an O(n) operation that must scan all remaining bytes to count grapheme boundaries. Avoid calling it in a loop; prefer iterating with for g in s.graphemes() or calling next() until None.
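As a rough sketch of the advice in this note, the snippet below counts clusters once with len() and then walks the same text in a single pass with next(). It assumes only the methods documented on this page plus the truthiness of `Optional`.

```mojo
def main():
    var text = String("cafe\u{0301}")

    # One O(n) scan: fine when the count is needed once, outside a loop.
    var total = len(text.graphemes())
    print(total)  # 4

    # Single pass: each cluster is visited once and len() is never
    # called inside the loop.
    var it = text.graphemes()
    var seen = 0
    while True:
        var g = it.next()
        if not g:
            break
        seen += 1
    print(seen)  # 4
```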

Note: Reverse iteration costs more per element than forward iteration. The UAX #29 state machine is forward-scanning, so next_back() backs up to a guaranteed grapheme boundary (typically a line break or the start of the string) and forward-scans from there. The safe boundary is cached across reverse calls (a forward next() invalidates the cache), so per-call cost is dominated by forward-scan length: small in text with frequent Control/CR/LF codepoints, growing with the distance back to such a codepoint in long runs without them.

TODO: Vectorize the existing scalar safe-ASCII fast path. Runs of safe-ASCII bytes (U+0020..U+007E) are already skipped one-by-one without entering the state machine; a SIMD check (e.g. `>= 0x20 & <= 0x7E`) could extend a run by a whole vector width per iteration.

Example:

var text = String("cafe\u{0301}")  # "café" with combining accent
var count = 0
for grapheme in text.graphemes():
    count += 1
# count == 4: c, a, f, e + combining acute (2 codepoints, 1 grapheme)
assert_equal(count, 4)

Parameters

  • mut (Bool): Whether the slice is mutable.
  • origin (Origin[mut=mut]): The origin of the underlying string data.
  • forward (Bool): The iteration direction. False is backwards.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Iterable, Iterator, Movable, Sized

comptime members

Element

comptime Element = StringSlice[origin]

The element type yielded by iteration.

IteratorType

comptime IteratorType[iterable_mut: Bool, //, iterable_origin: Origin[mut=iterable_mut]] = GraphemeSliceIter[origin, forward]

The iterator type.

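Parameters

  • iterable_mut (Bool): Whether the iterable is mutable.
  • iterable_origin (Origin[mut=iterable_mut]): The origin of the iterable.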

Methods

__iter__

__iter__(ref self) -> Self

Return an iterator over grapheme clusters.

Returns:

Self: A copy of this iterator.

__next__

__next__(mut self) -> StringSlice[origin]

Get the next grapheme cluster.

If forward is set to False, this will return the next grapheme cluster from the end of the string.

Returns:

StringSlice[origin]: The next grapheme cluster as a StringSlice.

Raises:

StopIteration if the iterator has been exhausted.

__len__

__len__(self) -> Int

Return the number of remaining grapheme clusters.

This is an O(n) operation that scans all remaining bytes to count grapheme cluster boundaries.

Returns:

Int: The number of grapheme clusters remaining.

remaining_byte_length

remaining_byte_length(self) -> Int

Returns the number of bytes not yet consumed by the iterator.

This is O(1): it reports the size of the remaining range without scanning grapheme boundaries. Combined with the original byte length of the source slice, callers can compute how many bytes the iterator has produced so far without summing per-grapheme byte lengths.

Returns:

Int: The byte length of the iterator's remaining range.
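A small sketch of the bookkeeping described above: the bytes consumed so far are the source's byte length minus remaining_byte_length(). `String.byte_length()` is assumed here from the Mojo standard library; it is not documented on this page.

```mojo
def main():
    # 6 bytes in total: c, a, f, e (1 byte each) plus 2 bytes for U+0301.
    var text = String("cafe\u{0301}")
    var total_bytes = text.byte_length()
    var it = text.graphemes()

    _ = it.next()  # consume "c"
    _ = it.next()  # consume "a"

    # O(1): no per-grapheme byte lengths are summed.
    var consumed = total_bytes - it.remaining_byte_length()
    print(consumed)  # 2
```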

next

next(mut self) -> Optional[StringSlice[origin]]

Get the next grapheme cluster, or None if exhausted.

Returns:

Optional[StringSlice[origin]]: The next grapheme cluster as a StringSlice, or None.

peek_back

peek_back(mut self) -> Optional[StringSlice[origin]]

Return the last grapheme cluster without advancing the iterator.

Repeated calls return the same value. The first reverse call (peek_back or next_back) does the backward scan to find a safe restart boundary and caches it; subsequent reverse calls reuse the cache and only pay for the forward scan from that boundary.

Returns:

Optional[StringSlice[origin]]: The last grapheme cluster as a StringSlice, or None if the iterator is empty.
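The difference between peeking and consuming can be sketched like this. The example assumes that `StringSlice` values can be compared with `==` and printed, as in the Mojo standard library; `value()` is called directly only because the input is known to be non-empty.

```mojo
def main():
    var text = String("abc")
    var it = text.graphemes()

    var first_peek = it.peek_back()   # "c", nothing consumed
    var second_peek = it.peek_back()  # "c" again
    var taken = it.next_back()        # "c", now consumed
    var next_peek = it.peek_back()    # "b" is the new last cluster

    print(first_peek.value() == second_peek.value())  # True
    print(first_peek.value() == taken.value())        # True
    print(next_peek.value())                          # b
```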

next_back

next_back(mut self) -> Optional[StringSlice[origin]]

Get the last grapheme cluster in the underlying string, or None if the iterator is empty.

This consumes one grapheme from the end of the remaining range. Reverse iteration keeps its own scanning state, separate from forward iteration (next()), so the two can be interleaved freely; they share only the remaining byte range.

The UAX #29 state machine is inherently forward-scanning, so next_back() backs up to a guaranteed grapheme boundary (a Control/CR/LF codepoint or the start of the string) and then forward-scans from that boundary. The safe boundary, once found, is cached and reused across subsequent reverse calls (a forward next() invalidates the cache because it moves the front pointer). Per-call cost is therefore dominated by the forward scan length: roughly proportional to the distance from the most recent Control/CR/LF codepoint to the cluster being returned. For text containing line breaks or other control characters (such as tabs) this is small; for long runs without Control/CR/LF the forward scan extends back toward the start of the string and the per-call cost grows accordingly.

Returns:

Optional[StringSlice[origin]]: The last grapheme cluster as a StringSlice, or None.