Skip to content

Grapheme-cluster aware cursor, hit-testing and soft wrap on top of #5922#5988

Open
xyos wants to merge 18 commits into
ajaxorg:masterfrom
xyos:varchar3-graphemes
Open

Grapheme-cluster aware cursor, hit-testing and soft wrap on top of #5922#5988
xyos wants to merge 18 commits into
ajaxorg:masterfrom
xyos:varchar3-graphemes

Conversation

@xyos

@xyos xyos commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Builds on #5922 (initial range.getClientRects()-based text measurement). This adds the two commits on top of that branch making the cursor/column model and soft wrap grapheme-cluster aware, and fixing related rendering issues found while testing against a real browser.

Changes on top of #5922

Grapheme-cluster cursor model & pixel-accurate coordinates (50aff8d)

  • New lang.getGraphemeCluster() / lang.getGraphemeBoundaries() helpers using Intl.Segmenter (grapheme granularity), with a surrogate-pair fallback when unavailable.
  • Selection.moveCursorTo snaps to grapheme cluster boundaries instead of only surrogate pairs, so arrow keys traverse ZWJ emoji (👨‍👩‍👧‍👦) and combining marks (ã̤) in one step.
  • VirtualRenderer.textToScreenCoordinates uses fontMetrics.textWidth() instead of column × characterWidth, so forward and inverse coordinate mappings agree.
  • FontMetrics.$pixelToColumn hit-testing iterates real grapheme clusters, treats span seams inclusively (fixes clicks with showInvisibles falling between spans), and clamps clicks left of line start to column 0.
  • U+200D removed from the invalid-invisible regex so ZWJ emoji render intact (fixes Emojis consisting of surrogate pairs are displayed incorrectly #5813).
  • Line-start RLE (U+202B) renders as plain text instead of a red invalid dot when rtl/rtlText is enabled (fixes ace.js: placing a red dot when clicking "enter" #5423, same approach as Fix: make RTL new line a valid character #5434); Trojan-Source highlighting still applies outside RTL mode.

Grapheme-aware soft wrap (5d2bc1d)

  • $getDisplayTokens marks continuation code units of grapheme clusters with a new CHAR_CONT display token (only for lines containing chars ≥ U+0300).
  • $computeWrapSplits never splits inside a cluster: a split landing mid-cluster moves back to the cluster start, or past the cluster when it is wider than the wrap limit. Before this, a wrap limit crossing 👨‍👩‍👧‍👦 split its 11 code units across screen rows.

Issues this addresses

#460, #4142, #5431, #5813, #4602, #5423, #5436, #3753, #3866, #3617

Testing

  • experiments/widthchar.html — a browser harness with monospace/proportional/RTL/wrapped editors over pathological lines (BMP & ZWJ emoji, combining marks, CJK, tabs, Hebrew, U+FE0F, APL fallback glyphs). It verifies cursor DOM position against Range ground truth, textToScreen→screenToText round-trips at every grapheme boundary, arrow-walk stops, selection-marker coverage, wrap-split boundaries, and the RTL red-dot/backspace scenarios: 2,400+ assertions passing in Chrome, 0 failures (baseline before these commits: ~280 failures).
  • New unit test wrapLine split never breaks grapheme clusters in edit_session_test.js; full suite unchanged otherwise (1,480 passing).

Open kitchen-sink @ 5d2bc1d6012061d91e50b3bf433c8ea27376b99a

Testing Page

Open kitchen-sink @ 423a67e091c51d742baaa72bd1d0a2aca284f72f

mkslanc and others added 17 commits May 5, 2026 18:07
… calculations

# Conflicts:
#	src/virtual_renderer.js
… rendering

- add getGraphemeCluster/getGraphemeBoundaries helpers (Intl.Segmenter) to lang
- snap cursor to grapheme cluster boundaries in moveCursorTo (ZWJ, combining marks)
- use fontMetrics.textWidth in textToScreenCoordinates instead of column * charWidth
- iterate grapheme clusters in $pixelToColumn hit-testing; treat span seams inclusively;
  clamp clicks left of line start to column 0
- stop rendering U+200D as invalid invisible so ZWJ emoji stay intact (ajaxorg#5813)
- render line-start RLE as plain text when rtl/rtlText enabled instead of red dot (ajaxorg#5423)
- add experiments/widthchar.html browser test harness for variable-width rendering
  covering ajaxorg#460 ajaxorg#4142 ajaxorg#5431 ajaxorg#5813 ajaxorg#4602 ajaxorg#5436 ajaxorg#5423 ajaxorg#3753 ajaxorg#3866 ajaxorg#3617
- $getDisplayTokens marks continuation code units of grapheme clusters
  (surrogate pairs, combining marks, ZWJ sequences) with CHAR_CONT so
  wrap-split candidates can see cluster boundaries
- $computeWrapSplits/addSplit moves any split falling inside a cluster
  back to the cluster start, or past it when the cluster is wider than
  the wrap limit, so soft wrap never tears a grapheme apart
- unit tests for ZWJ/combining-mark/surrogate wrap splits
- widthchar harness: wrapped editor with emoji/CJK/hebrew lines checking
  splits land on grapheme boundaries, caret round-trips, and arrow walks
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

One of the public type files has been updated, plase make sure there are no backwards incompatible changes done in the PR.

@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.50153% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.93%. Comparing base (c89c0d5) to head (423a67e).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
src/layer/font_metrics.js 93.88% 32 Missing ⚠️
src/test/mockdom.js 93.54% 18 Missing ⚠️
src/layer/text.js 61.29% 12 Missing ⚠️
src/edit_session.js 83.33% 10 Missing ⚠️
src/lib/lang.js 88.57% 8 Missing ⚠️
src/selection.js 55.55% 4 Missing ⚠️
src/layer/marker.js 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5988      +/-   ##
==========================================
+ Coverage   93.73%   93.93%   +0.19%     
==========================================
  Files         640      641       +1     
  Lines      137743   138724     +981     
  Branches    14487    14757     +270     
==========================================
+ Hits       129113   130306    +1193     
+ Misses       8630     8418     -212     
Flag Coverage Δ
unittests 93.93% <93.50%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Screen columns now count grapheme clusters instead of UTF-16 code units,
so a ZWJ emoji occupies one logical column instead of eleven:

- lang.mayContainGraphemeClusters: cheap regexp gate (marks, ZWJ,
  variation selectors, astral code points, conjoining jamo) so plain
  ASCII/CJK/hebrew lines skip segmentation entirely
- $getStringScreenWidth walks grapheme boundaries; doc<->screen mapping
  (documentToScreenPosition/screenToDocumentPosition) follows since both
  funnel through it
- $getDisplayTokens emits one display token per cluster and carries a
  parallel docLengths array; $computeWrapSplits consumes it so wrap
  budget is measured in visual columns and doc split offsets stay in
  code units; CHAR_CONT hack removed
- text layer $renderToken returns grapheme column counts and computes
  tab stops from grapheme columns
- font_metrics $findColumnPosition/$pixelToColumn map screen columns
  to text-node offsets via grapheme boundaries
- widthchar harness: graphemeColumns check; updated wrap/surrogate test
  expectations to the new column semantics
@xyos

xyos commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Pushed a third commit (423a67e) completing the column model rework: 1 grapheme cluster = 1 screen column.

Screen columns now count grapheme clusters instead of UTF-16 code units, addressing the "column must be column" model discussed in #5431:

  • $getStringScreenWidth walks grapheme boundaries (both documentToScreenPosition and screenToDocumentPosition funnel through it), gated by a cheap regexp so plain text skips segmentation.
  • $getDisplayTokens emits one display token per cluster with a parallel docLengths array; $computeWrapSplits consumes it, so wrap budget is now measured in visual columns — a line of ZWJ emoji no longer wraps 11× too early — while wrap split offsets remain code-unit doc columns.
  • Text layer and font-metrics DOM walkers translate between grapheme screen columns and code-unit text-node offsets.

Verified in Chrome via the experiments/widthchar.html harness (2,800+ assertions incl. new graphemeColumns and vertical-navigation desired-column checks, 0 failures) and the node suite (1,480 passing; the 2 failures are pre-existing on master-adjacent branches).

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

One of the public type files has been updated, plase make sure there are no backwards incompatible changes done in the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emojis consisting of surrogate pairs are displayed incorrectly ace.js: placing a red dot when clicking "enter"

3 participants