# fast-chunk-string

> Chunk string into equal substrings with unicode support

Credits to [stackoverflow.com/a/29202760/2727317](https://stackoverflow.com/a/29202760/2727317).
## Install

```sh
$ yarn add @shelf/fast-chunk-string
```
## Usage

```js
import fastChunkString from '@shelf/fast-chunk-string';

// the fastest way
fastChunkString('unicorns', {size: 2, unicodeAware: false});
// => ['un', 'ic', 'or', 'ns']

// ignore unicode: still fast, but inaccurate
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: false});
// => ['😀', '😃', '😄', '😁']

// respect unicode: slower, but accurate
fastChunkString('😀😃😄😁', {size: 2, unicodeAware: true});
// => ['😀😃', '😄😁']
```

## Benchmark

Run via `yarn benchmark`. Measured on an M2 Max.
Running "Without Unicode" suite...
Progress: 100%
~33 kb split by 2 kb:
14 106 903 ops/s, ±1.71% | 86.19% slower
~33 kb split by 1 mb:
100 461 043 ops/s, ±1.45% | 1.63% slower
~330 kb split by 2 kb:
1 600 485 ops/s, ±0.63% | 98.43% slower
~330 kb split by 1 mb:
102 125 168 ops/s, ±1.50% | fastest
~3.3 mb split by 2 kb:
161 507 ops/s, ±1.19% | 99.84% slower
~3.3 mb split by 1 mb:
41 773 807 ops/s, ±1.54% | 59.1% slower
~33 mb split by 2 kb:
11 098 ops/s, ±0.25% | slowest, 99.99% slower
~33 mb split by 1 mb:
5 506 349 ops/s, ±0.58% | 94.61% slower
Finished 8 cases!
Fastest: ~330 kb split by 1 mb
Slowest: ~33 mb split by 2 kb
Running "Unicode Aware" suite...
Progress: 100%
~33 kb split by 2 kb with unicodeAware:
847 ops/s, ±0.99% | 12.14% slower
~33 kb split by 1 mb with unicodeAware:
964 ops/s, ±0.25% | fastest
~330 kb split by 2 kb with unicodeAware:
71 ops/s, ±0.76% | slowest, 92.63% slower
~330 kb split by 1 mb with unicodeAware:
90 ops/s, ±0.94% | 90.66% slower
Finished 4 cases!
Fastest: ~33 kb split by 1 mb with unicodeAware
Slowest: ~330 kb split by 2 kb with unicodeAware
September 2025 improvements were delivered autonomously by the gpt-5-codex model. We treated the hot paths like any latency-sensitive service and tuned the slowest sections:

- **Single-pass unicode chunking** – length and slicing now come from the same `runes()` walk, eliminating the extra string-length scan and keeping multi-codepoint graphemes intact (see the sketches after this list).
- **Consolidated ASCII loop** – collapsed the fast path into one traversal with early exits for empty inputs and oversized chunk sizes to trim per-call overhead.
- **Fractional-size parity** – restored the legacy `slice` coercion semantics so non-integer chunk sizes behave exactly as before, backed by new regression tests.
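For illustration, here is a minimal sketch of the single-pass idea (not the package's actual source); it uses the standard `Intl.Segmenter` in place of the `runes()` helper so the snippet stays self-contained:

```js
// Sketch: build chunks during one grapheme walk, so no separate
// length scan is needed and multi-codepoint graphemes stay intact.
// Intl.Segmenter stands in for runes() here (an assumption, not the real code).
// Assumes size is a positive integer.
function chunkUnicodeSinglePass(str, size) {
  const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'});
  const chunks = [];
  let current = '';
  let count = 0;

  for (const {segment} of segmenter.segment(str)) {
    current += segment;
    count += 1;
    if (count === size) {
      chunks.push(current);
      current = '';
      count = 0;
    }
  }

  if (current) chunks.push(current); // trailing partial chunk

  return chunks;
}

chunkUnicodeSinglePass('😀😃😄😁', 2);
// => ['😀😃', '😄😁']
```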
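And a similar sketch of what a consolidated ASCII fast path with early exits can look like, again an assumption about shape rather than the real implementation. Note that `String.prototype.slice` truncates fractional indices, which is the coercion behaviour the fractional-size bullet relies on:

```js
// Sketch: one traversal over UTF-16 units with early exits (illustrative only).
// Assumes size > 0.
function chunkAsciiFast(str, size) {
  if (str.length === 0) return [];      // early exit: empty input
  if (size >= str.length) return [str]; // early exit: oversized chunk size

  const chunks = [];
  for (let i = 0; i < str.length; i += size) {
    // slice() truncates fractional indices, so a non-integer `size`
    // still yields well-defined chunk boundaries.
    chunks.push(str.slice(i, i + size));
  }

  return chunks;
}

chunkAsciiFast('unicorns', 2);
// => ['un', 'ic', 'or', 'ns']
```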
The result is steadier throughput in the ASCII suite (for example, ~33 kb split by 1 mb climbs from 85.6M to 100.5M ops/s) and a 9–10× lift in the unicode-aware scenarios (e.g. the ~33 kb splits rise from ~101 ops/s to ~964 ops/s), while preserving behaviour for combining marks and emoji ligatures.
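To make that last point concrete, here is an illustrative call (the input and expected output are our example, assuming the grapheme handling described above):

```js
// 'é' below is decomposed: 'e' followed by a combining acute accent (U+0301).
// With unicodeAware chunking, the mark stays attached to its base character.
fastChunkString('he\u0301llo', {size: 2, unicodeAware: true});
// expected: ['hé', 'll', 'o'] rather than 'e' being split from U+0301
```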
## Publish

```sh
$ git checkout master
$ yarn version
$ yarn publish
$ git push origin master --tags
```

## License

MIT © Shelf