struct GraphemeSliceIter[mut: Bool, //, origin: Origin[mut=mut], forward: Bool = True]
Iterator over grapheme clusters in a string, yielding each cluster as a StringSlice.
A grapheme cluster is what a user would typically think of as a single "character" on screen. This includes combining character sequences, emoji with modifiers, flag sequences, and other multi-codepoint grapheme clusters as defined by UAX #29.
The forward parameter only controls the behavior of the __next__()
method used for normal iteration. Calls to next() will always take an
element from the front of the iterator, and calls to next_back() will
always take an element from the end. Mixing next() and next_back()
on the same iterator is supported: they share the remaining byte range
but use independent state (forward iteration keeps incremental UAX #29
state; reverse iteration caches a safe restart boundary). This is safe
because forward priming only consults the codepoint at the start of the
remaining range, and next_back() shrinks the range from the end without
moving that start. A forward next() advances the front and invalidates
the cached reverse safe-start so the next reverse call recomputes it.
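As an illustration of taking elements from both ends of one iterator, here is a minimal sketch. It assumes a recent Mojo toolchain in which String.graphemes() returns this iterator type; trim_both_ends is a hypothetical helper name, not part of the library.

```mojo
from testing import assert_equal


def trim_both_ends(text: String) -> Int:
    """Hypothetical helper: drop one grapheme from each end, count the rest."""
    var it = text.graphemes()
    _ = it.next()       # consumes the first cluster from the front
    _ = it.next_back()  # consumes the last cluster from the back
    return len(it)      # O(n) scan over whatever remains in between


def main() raises:
    # "cafe" + U+0301 has 4 clusters: c, a, f, e + combining acute.
    assert_equal(trim_both_ends(String("cafe\u{0301}")), 2)
```

Both calls narrow the same remaining range, so the final len() only sees the two middle clusters.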
Note: len() is an O(n) operation that must scan all remaining bytes
to count grapheme boundaries. Avoid calling it in a loop; prefer
iterating with for g in s.graphemes() or calling next() until
None.
Note: Reverse iteration costs more per element than forward iteration.
The UAX #29 state machine is forward-scanning, so next_back() backs
up to a guaranteed grapheme boundary (typically a line break or the
start of the string) and forward-scans from there. The safe boundary
is cached across reverse calls (a forward next() invalidates the
cache), so per-call cost is dominated by forward-scan length: small
in text with frequent Control/CR/LF codepoints, growing with the
distance back to such a codepoint in long runs without them.
TODO: Vectorize the existing scalar safe-ASCII fast path. Runs of
safe-ASCII bytes (U+0020..U+007E) are already skipped one-by-one
without entering the state machine; a SIMD check (e.g. `>= 0x20 &
<= 0x7E`) could extend a run by a whole vector width per iteration.
Example:
from testing import assert_equal

var text = String("cafe\u{0301}") # "café" with combining accent
var count = 0
for grapheme in text.graphemes():
    count += 1
# count == 4: c, a, f, e + combining acute (2 codepoints, 1 grapheme)
assert_equal(count, 4)
Parameters
- mut (Bool): Whether the slice is mutable.
- origin (Origin[mut=mut]): The origin of the underlying string data.
- forward (Bool): The iteration direction. False is backwards.
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Iterable,
Iterator,
Movable,
Sized
comptime members
Element
comptime Element = StringSlice[origin]
The element type yielded by iteration.
IteratorType
comptime IteratorType[iterable_mut: Bool, //, iterable_origin: Origin[mut=iterable_mut]] = GraphemeSliceIter[origin, forward]
The iterator type.
Parameters
- iterable_mut (Bool): Whether the iterable is mutable.
- iterable_origin (Origin[mut=iterable_mut]): The origin of the iterable.
Methods
__iter__
__iter__(ref self) -> Self
Return an iterator over grapheme clusters.
Returns:
Self: A copy of this iterator.
__next__
__next__(mut self) -> StringSlice[origin]
Get the next grapheme cluster.
If forward is set to False, this will return the next grapheme
cluster from the end of the string.
Returns:
StringSlice[origin]: The next grapheme cluster as a StringSlice.
Raises:
StopIteration if the iterator has been exhausted.
__len__
__len__(self) -> Int
Return the number of remaining grapheme clusters.
This is an O(n) operation that scans all remaining bytes to count grapheme cluster boundaries.
Returns:
Int: The number of grapheme clusters remaining.
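Because each len() call rescans the remaining bytes, one way to use it is to compute the count once and reuse it. The sketch below assumes String.graphemes() returns this iterator; print_with_progress is a hypothetical helper name.

```mojo
from testing import assert_equal


def print_with_progress(text: String) -> Int:
    """Hypothetical helper: print each grapheme with an i / total prefix."""
    # One O(n) scan, hoisted out of the loop instead of repeated per element.
    var total = len(text.graphemes())
    var seen = 0
    for g in text.graphemes():
        seen += 1
        print(seen, "/", total, ":", g)
    return total


def main() raises:
    assert_equal(print_with_progress(String("cafe\u{0301}")), 4)
```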
remaining_byte_length
remaining_byte_length(self) -> Int
Returns the number of bytes not yet consumed by the iterator.
This is O(1): it reports the size of the remaining range without scanning grapheme boundaries. Combined with the original byte length of the source slice, callers can compute how many bytes the iterator has produced so far without summing per-grapheme byte lengths.
Returns:
Int: The byte length of the iterator's remaining range.
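The byte accounting described above might look like the following sketch. It assumes String.byte_length() gives the original byte length of the source; bytes_consumed_after is a hypothetical helper name.

```mojo
from testing import assert_equal


def bytes_consumed_after(text: String, n: Int) -> Int:
    """Hypothetical helper: take up to n graphemes, report bytes consumed."""
    var it = text.graphemes()
    for _ in range(n):
        if not it.next():
            break  # iterator exhausted before n elements
    # O(1): original length minus the size of the remaining range.
    return text.byte_length() - it.remaining_byte_length()


def main() raises:
    # "cafe" + U+0301 is 6 bytes: four ASCII bytes plus a 2-byte accent.
    var text = String("cafe\u{0301}")
    assert_equal(bytes_consumed_after(text, 2), 2)  # "c" and "a"
    assert_equal(bytes_consumed_after(text, 4), 6)  # everything
```

If next_back() has also been called on the iterator, the difference counts bytes taken from both ends, not only from the front.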
next
next(mut self) -> Optional[StringSlice[origin]]
Get the next grapheme cluster, or None if exhausted.
Returns:
Optional[StringSlice[origin]]: The next grapheme cluster as a StringSlice, or None.
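A manual loop over next() can be written as in this sketch, which stops at the first None. widest_cluster_bytes is a hypothetical helper name, and the code assumes String.graphemes() returns this iterator.

```mojo
from testing import assert_equal


def widest_cluster_bytes(text: String) -> Int:
    """Hypothetical helper: byte length of the longest grapheme cluster."""
    var it = text.graphemes()
    var widest = 0
    while True:
        var g = it.next()
        if not g:
            break  # None: the iterator is exhausted
        widest = max(widest, g.value().byte_length())
    return widest


def main() raises:
    # The widest cluster is 'e' (1 byte) + U+0301 (2 bytes).
    assert_equal(widest_cluster_bytes(String("cafe\u{0301}")), 3)
    assert_equal(widest_cluster_bytes(String("")), 0)
```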
peek_back
peek_back(mut self) -> Optional[StringSlice[origin]]
Return the last grapheme cluster without advancing the iterator.
Repeated calls return the same value. The first reverse call (peek_back
or next_back) does the backward scan to find a safe restart boundary
and caches it; subsequent reverse calls reuse the cache and only pay
for the forward scan from that boundary.
Returns:
Optional[StringSlice[origin]]: The last grapheme cluster as a StringSlice, or None if the
iterator is empty.
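The non-consuming behavior can be checked with a sketch like the one below. count_after_peeks is a hypothetical helper name, and the code assumes String.graphemes() returns this iterator.

```mojo
from testing import assert_equal


def count_after_peeks(text: String) -> Int:
    """Hypothetical helper: peek at the back twice, then count what is left."""
    var it = text.graphemes()
    _ = it.peek_back()  # first reverse call: finds and caches the safe boundary
    _ = it.peek_back()  # reuses the cache and sees the same cluster
    return len(it)      # unchanged, because peeking consumes nothing


def main() raises:
    var text = String("cafe\u{0301}")
    assert_equal(count_after_peeks(text), 4)

    var it = text.graphemes()
    var last = it.peek_back()
    # 'e' (1 byte) + U+0301 (2 bytes) form the final cluster.
    assert_equal(last.value().byte_length(), 3)
    _ = it.next_back()  # consuming it is a separate, explicit step
    assert_equal(len(it), 3)
```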
next_back
next_back(mut self) -> Optional[StringSlice[origin]]
Get the last grapheme cluster in the remaining range, or None if the iterator is empty.
This consumes one grapheme from the end of the remaining range. It
shares that range with forward iteration (next()) but keeps separate
scanning state, so the two can be interleaved freely.
The UAX #29 state machine is inherently forward-scanning, so
next_back() backs up to a guaranteed grapheme boundary — a
Control/CR/LF codepoint or the start of the string — and then
forward-scans from that boundary. The safe boundary, once found,
is cached and reused across subsequent reverse calls (a forward
next() invalidates the cache because it moves the front pointer).
Per-call cost is therefore dominated by the forward scan length:
roughly proportional to the distance from the most recent
Control/CR/LF codepoint to the cluster being returned. For text
containing line breaks or other control characters (such as tabs)
this is small; for long runs without Control/CR/LF the forward scan
extends back toward the start of the string and the per-call cost
grows accordingly.
Returns:
Optional[StringSlice[origin]]: The last grapheme cluster as a StringSlice, or None.
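Draining the iterator with next_back() visits clusters in reverse order while keeping each cluster intact, as in this sketch. reverse_graphemes is a hypothetical helper name, and the code assumes String.graphemes() returns this iterator.

```mojo
from testing import assert_equal


def reverse_graphemes(text: String) -> String:
    """Hypothetical helper: reverse a string cluster by cluster."""
    var it = text.graphemes()
    var result = String()
    while True:
        var g = it.next_back()
        if not g:
            break  # None: nothing left in the remaining range
        result += String(g.value())
    return result


def main() raises:
    # The combining accent stays attached to its base letter 'e'.
    assert_equal(
        reverse_graphemes(String("cafe\u{0301}")),
        String("e\u{0301}fac"),
    )
```

A codepoint-by-codepoint reversal would instead move the accent onto a different letter, which is why reversing by grapheme cluster is the safer choice for user-visible text.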