Any thoughts on 2-in-1 characters (ꜳæꜵꜷꜹꜻꜽʤʣʥᴔꭁꭂʩǁʪɮʫʨꝷʦʧꜩɱᵯ) being confusable? eg. aa vs ꜳ. aa shouldn’t be confusable because it’s double ASCII.
Combining Marks (CM) modify how a character is presented:
- å = 61 30A (where 30A is a CM)
- å = E5
NFC is responsible for collapsing these together, eg. they both normalize to E5. For some characters, there is no combined glyph, eg. e̊ = 65 30A has no corresponding single character form.
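To make the NFC behaviour above concrete, here is a small sketch using Python's standard `unicodedata` module. The helper `to_hex` is only an illustrative name chosen to match the hex notation used in this post; it is not part of any normalization spec.

```python
import unicodedata

def to_hex(text: str) -> str:
    """Space-separated hex code points, matching the notation used above."""
    return " ".join(f"{ord(ch):X}" for ch in text)

decomposed = "\u0061\u030A"   # a + COMBINING RING ABOVE (61 30A)
precomposed = "\u00E5"        # å as a single code point (E5)

# Both spellings collapse to the same NFC form.
nfc_a = unicodedata.normalize("NFC", decomposed)
nfc_b = unicodedata.normalize("NFC", precomposed)
print(to_hex(nfc_a), "|", to_hex(nfc_b))

# e + ring above has no precomposed form, so NFC keeps two code points.
e_ring = unicodedata.normalize("NFC", "\u0065\u030A")
print(to_hex(e_ring))
```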
Multiple CM can be attached to the same character, eg. ã̰ = 61 303 330. NFC is responsible for putting the CM in a canonical order.
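As a rough illustration of the canonical ordering, the sketch below builds the same two marks in both orders and checks that normalization makes them identical. It assumes the marks are U+0303 (tilde above) and U+0330 (tilde below), which is what the code points 303 and 330 suggest; the constant names are made up for this example.

```python
import unicodedata

TILDE_ABOVE = "\u0303"  # canonical combining class 230
TILDE_BELOW = "\u0330"  # canonical combining class 220

order_1 = "a" + TILDE_ABOVE + TILDE_BELOW   # 61 303 330
order_2 = "a" + TILDE_BELOW + TILDE_ABOVE   # 61 330 303

# The raw strings differ, but normalization sorts the marks by combining
# class (lower class first), so both inputs end up as the same string.
nfc_1 = unicodedata.normalize("NFC", order_1)
nfc_2 = unicodedata.normalize("NFC", order_2)
same_before = order_1 == order_2
same_after = nfc_1 == nfc_2
print(same_before, same_after)
```

The ordering only applies across marks with different combining classes; marks of the same class keep the order they were typed in.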
You can stack CM on characters, eg. ã̃̃̃̃̃̃̃̃̃ and a̰̰̰̰̰̰̰̰̰̰.
Some CM stack without any visual indication, eg. a̸ (1x) vs a̸̸̸̸̸̸̸̸̸̸ (10x).
a̸̸.eth ≠ a̸̸̸.eth ≠ a̸̸̸̸.eth = ...
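A quick way to see why these names are all different even though they look alike: NFC does not merge repeated copies of a mark. The sketch below assumes the overlay is U+0338 (COMBINING LONG SOLIDUS OVERLAY); the mark actually used in the examples above may be a different code point, and `count_marks` is a hypothetical helper.

```python
import unicodedata

OVERLAY = "\u0338"  # assumption: COMBINING LONG SOLIDUS OVERLAY

def count_marks(label: str) -> int:
    """Count code points whose general category is a Mark (Mn, Mc, Me)."""
    normalized = unicodedata.normalize("NFC", label)
    return sum(1 for ch in normalized if unicodedata.category(ch).startswith("M"))

# Same base letter with 2, 3 and 4 copies of the overlay.
names = ["a" + OVERLAY * n + ".eth" for n in (2, 3, 4)]

# NFC keeps every copy, so the normalized names remain distinct strings.
distinct = len({unicodedata.normalize("NFC", name) for name in names})
counts = [count_marks(name) for name in names]
print(distinct, counts)
```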
There are currently 500 registered names with CM. Could we disallow some of the malicious ones (underscore-like, very small, etc.)? Could we disallow stacking?
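One possible shape for a "no stacking" rule, purely as a sketch: decompose the label, group the marks that follow each base character, and reject a label if any group is too long or repeats a mark. The function names, the `max_marks=2` limit, and the choice to check the NFD form are all assumptions for illustration, not a proposal for the actual rule. NFD is used here because NFC can fold the first mark into a precomposed letter and hide it from the count.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """True for code points in a Mark category (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def mark_clusters(label: str) -> list[list[str]]:
    """Group the combining marks that follow each base character."""
    clusters: list[list[str]] = [[]]  # first group catches leading marks
    for ch in unicodedata.normalize("NFD", label):
        if is_mark(ch):
            clusters[-1].append(ch)
        else:
            clusters.append([])
    return clusters

def allows_label(label: str, max_marks: int = 2) -> bool:
    """Reject labels with too many marks, or a repeated mark, on one base."""
    return all(
        len(marks) <= max_marks and len(set(marks)) == len(marks)
        for marks in mark_clusters(label)
    )

print(allows_label("\u00E5.eth"))               # single ring above
print(allows_label("a\u0303\u0330.eth"))        # two different marks
print(allows_label("a" + "\u0338" * 10 + ".eth"))  # ten stacked overlays
```

A rule like this would not cover the underscore-like or very small marks; those would presumably need an explicit deny list.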