ENS Name Normalization

raffy · January 19, 2022, 1:24am

UTS-39 and Chromium documentation discuss some good stuff, but they’re both even more restrictive than IDNA 2008.

UTS-39 references a tool which displays confusables. For example, x vs х is crazy dangerous. It shows up in a 14 registered names so far: some surrounded by the appropriate script and a few that are certainly malicious. Applying a confusable-like mapping might be a good idea but it will brick some names. Some of these are already handled by IDNA 2008.

To reduce implementation complexity, rather then relying on many public Unicode files, ENS could supply a singular table that combines UTS-51 (+SEQ/ZWJ), UTS-46, and UTS-39 which makes implementation relatively straight forward.

Edit: Here is a first attempt at a visualization of the confusables relative to IDNA 2008.

The way to read this is: “o” is a confusable category. There are 75 characters that are confusable with it. According to IDNA 2008, they correspond to 42 separate entities after normalization is applied. The largest group has 14 characters that map to o. The next largest group has 7 characters that map to ه, etc. Groups of one are just shown as a single element (without the count and black arrow) to save space. The color codes match the rest of the reports: green = valid, purple = mapped, red = disallowed.

The scary thing would be any groups that map to similar yet distinct characters. For the image above, 14 map to o (6F) and 5 map to ο (3BF). There’s very little difference between “Latin Letter O” and “Greek Omicron”.