ENS Name Normalization

raffy · May 11, 2022, 10:07am

Any thoughts on whether 2-in-1 characters (ꜳæꜵꜷꜹꜻꜽʤʣʥᴔꭁꭂʩǁʪɮʫʨꝷʦʧꜩɱᵯ) should be treated as confusable, e.g. aa vs ꜳ? The plain aa shouldn’t be the side that gets flagged, because it’s just two ASCII characters.

Combining Marks (CM) modify how a character is presented:

å = 61 30A (where 30A is a CM)
å = E5

NFC is responsible for collapsing these together, i.e. they both normalize to E5. Some combinations have no precomposed character, e.g. e̊ = 65 30A has no corresponding single-character form.
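A quick way to check this, sketched with Python's standard `unicodedata` module (the `code_points` helper is just an illustrative name, not part of any ENS tooling):

```python
import unicodedata

def code_points(s):
    """Return the code points of s as uppercase hex strings."""
    return [f"{ord(c):X}" for c in s]

decomposed = "a\u030A"   # 61 30A: 'a' followed by COMBINING RING ABOVE
precomposed = "\u00E5"   # E5: LATIN SMALL LETTER A WITH RING ABOVE

# Both spellings of å end up as the single code point E5 under NFC.
a_ring_nfc = code_points(unicodedata.normalize("NFC", decomposed))
same_as_precomposed = a_ring_nfc == code_points(unicodedata.normalize("NFC", precomposed))

# 'e' + ring has no precomposed form, so NFC leaves both code points in place.
e_ring_nfc = code_points(unicodedata.normalize("NFC", "e\u030A"))

print(a_ring_nfc, same_as_precomposed)  # ['E5'] True
print(e_ring_nfc)                       # ['65', '30A']
```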

Multiple CM can be attached to the same character, eg. ã̰ = 61 303 330. NFC is responsible for putting the CM in a canonical order.
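To illustrate the ordering, here is a small sketch using Python's `unicodedata` (the helper name `nfc_hex` is made up for this example). The two tildes have different canonical combining classes, so either input order should come out as the same NFC sequence:

```python
import unicodedata

TILDE_ABOVE = "\u0303"  # COMBINING TILDE
TILDE_BELOW = "\u0330"  # COMBINING TILDE BELOW

def nfc_hex(s):
    """NFC-normalize s and return its code points as hex strings."""
    return [f"{ord(c):X}" for c in unicodedata.normalize("NFC", s)]

# Canonical combining classes decide the order of the marks.
classes = [unicodedata.combining(TILDE_ABOVE), unicodedata.combining(TILDE_BELOW)]

order_a = nfc_hex("a" + TILDE_ABOVE + TILDE_BELOW)  # input: 61 303 330
order_b = nfc_hex("a" + TILDE_BELOW + TILDE_ABOVE)  # input: 61 330 303

print(classes)             # [230, 220]
print(order_a == order_b)  # True: both orders normalize to the same sequence
```

Note that after NFC the a and the tilde above also compose into a single character, so the normalized form is not literally 61 followed by two marks.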

You can stack CM on characters, eg. ã̃̃̃̃̃̃̃̃̃ and a̰̰̰̰̰̰̰̰̰̰.

Some CM stack without any visual indication, eg. a̸ (1x) vs a̸̸̸̸̸̸̸̸̸̸ (10x).
a̸̸.eth ≠ a̸̸̸.eth ≠ a̸̸̸̸.eth ≠ ...
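A rough sketch of why these are different names, assuming the mark in question is U+0338 (COMBINING LONG SOLIDUS OVERLAY) and using a made-up helper `stacked_label`: NFC does not collapse repeated marks, so each count yields a distinct string.

```python
import unicodedata

SOLIDUS = "\u0338"  # assumed mark: COMBINING LONG SOLIDUS OVERLAY

def stacked_label(n):
    """Build 'a' followed by n copies of the overlay, then NFC-normalize."""
    return unicodedata.normalize("NFC", "a" + SOLIDUS * n)

labels = [stacked_label(n) for n in range(1, 5)]

# Every extra mark survives normalization, so the strings differ in length.
lengths = [len(label) for label in labels]
all_distinct = len(set(labels)) == len(labels)

print(lengths)       # [2, 3, 4, 5]
print(all_distinct)  # True
```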

There are currently 500 registered names with CM. Could we disallow some of the malicious ones (underscore-like, very small, etc.)? Could we disallow stacking?
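As a starting point for discussion, a "no stacking" rule could look something like the sketch below. The function names and the `max_run=1` threshold are hypothetical, not an existing ENS rule, and it treats every code point in a Unicode mark category (Mn, Mc, Me) as a CM:

```python
import unicodedata

def max_cm_run(label):
    """Length of the longest run of consecutive combining marks after NFC."""
    longest = current = 0
    for ch in unicodedata.normalize("NFC", label):
        if unicodedata.category(ch).startswith("M"):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def allows(label, max_run=1):
    """Hypothetical policy: reject labels with more than max_run marks in a row."""
    return max_cm_run(label) <= max_run

print(allows("a\u0338"))        # True: a single mark
print(allows("a\u0338\u0338"))  # False: two stacked marks
```

A blanket limit like this would also hit scripts that legitimately place more than one mark on a base character, so it would probably need per-script exceptions.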