ENS Name Normalization

I investigated the NormalizationTest issue a bit more. I compared my library to Python (unicodedata), Mathematica (CharacterNormalize), and various JS engines (Node, Chrome, Safari, Brave, Firefox) and realized it’s a total shitshow. My library and the latest version of Firefox are the only ones that pass the test. Pinning NFC to a specific Unicode spec appears to be the right choice.

I made a simple report that compares the allowed characters (valid or mapped, minus emoji) between ENS0 (current) and IDNA 2008. I overlaid it with my current-but-incomplete whitelist (green = whitelist as-is, purple = whitelist mapped). These are the characters that need review, and any input would be very helpful: IDNA: ENS0 vs 2008. Edit: I added the number of times each character shows up in a registered name in brackets, e.g. § [5] means 5 registered names use this character.

There is also a much larger list (the full disallowed list minus this list) that potentially contains characters that should have been enabled in the first place, e.g. underscore (_). However, this list is too large for an HTML report.

I added an additional 25K registered labels to the comparison reports. Also, this service displays the last 1024 registered labels with normalization applied: Recent ENS Names

2 Likes

UTS-39 and the Chromium documentation discuss some good stuff, but they’re both even more restrictive than IDNA 2008.

UTS-39 references a tool which displays confusables. For example, x vs х is crazy dangerous. It shows up in 14 registered names so far: some surrounded by the appropriate script and a few that are certainly malicious. Applying a confusable-like mapping might be a good idea, but it will brick some names. Some of these are already handled by IDNA 2008.

To reduce implementation complexity, rather than relying on many public Unicode files, ENS could supply a single table that combines UTS-51 (+SEQ/ZWJ), UTS-46, and UTS-39, which would make implementation relatively straightforward.
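For illustration only, such a combined table might look something like this. The shape and field names here are hypothetical, not a proposed format:

```js
// Hypothetical shape for a single combined table; all names are illustrative only.
const table = {
  version: '14.0.0', // pinned Unicode version
  chars: {
    // one entry per codepoint, derived from UTS-46 + IDNA 2008
    0x41: {status: 'mapped', to: [0x61]}, // 'A' → 'a'
    0x5F: {status: 'disallowed'},         // '_' (disallowed in IDNA 2008)
  },
  // whitelisted UTS-51 emoji ZWJ sequences
  emoji: [
    [0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F466], // 👨‍👩‍👦
  ],
  // UTS-39 confusable targets
  confusables: {
    0x3BF: 0x6F, // Greek omicron → Latin 'o'
  },
};
```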

Edit: Here is a first attempt at a visualization of the confusables relative to IDNA 2008.

The way to read this is: “o” is a confusable category. There are 75 characters that are confusable with it. According to IDNA 2008, they correspond to 42 separate entities after normalization is applied. The largest group has 14 characters that map to o. The next largest group has 7 characters that map to ه, etc. Groups of one are just shown as a single element (without the count and black arrow) to save space. The color codes match the rest of the reports: green = valid, purple = mapped, red = disallowed.

The scary thing would be any groups that map to similar yet distinct characters. For the image above, 14 map to o (6F) and 5 map to ο (3BF). There’s very little visual difference between “Latin Small Letter O” and “Greek Small Letter Omicron”.
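To make the chart’s grouping concrete, here’s a minimal sketch of the bucketing step, assuming a stand-in `uts46Map` table (codepoint → {status, to}):

```js
// Sketch: bucket one confusable category's members by where UTS-46/IDNA 2008
// normalization sends them. `uts46Map` is a stand-in for the real table.
function groupByTarget(confusables, uts46Map) {
  const groups = new Map();
  for (const cp of confusables) {
    const entry = uts46Map.get(cp) ?? {status: 'disallowed'};
    const key = entry.status === 'mapped' ? String.fromCodePoint(...entry.to)
              : entry.status === 'valid'  ? String.fromCodePoint(cp)
              : `disallowed:${cp.toString(16)}`; // disallowed chars stay singletons
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(cp);
  }
  // largest group first, matching the report layout
  return [...groups.entries()].sort((a, b) => b[1].length - a[1].length);
}
```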

2 Likes

I made some improvements to the Confusables report. I also added a visual breakdown of the Scripts, with emoji removed, and included the character name when you hover over a character.


I am currently thinking about a script-based approach to address homograph attacks, as my attempts at whitelisting haven’t been very successful with so many characters. First, each script can be independently reduced to a non-confusable set. Then, each script can specify which scripts it can mix with. By reducing the problem from all characters to just the scripts that should be combined together, the script-to-script confusable surface is many orders of magnitude smaller.

A few of the script groupings are too sloppy, so I suggest creating a few artificial categories, like Emoji, Digits (which span all scripts), and Symbols (split from Common), and merging a few that are frequently used together (Han/Katakana/Hiragana). The Common/Latin/Greek/Cyrillic scripts should be collapsed using an extremely aggressive version of the confusables, so there are absolutely zero confusables with ASCII-like characters.

Then, based on the labels registered so far, determine what kind of script-based rule permits the most names, e.g. “Latin|Emoji|Digits|Symbols” is a valid recipe.
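A minimal sketch of checking a label against such a recipe, with `scriptOf` as a stand-in for a codepoint → script/category lookup (folding in the artificial categories above):

```js
// Sketch: does every character of the label fall within the recipe's script set?
function matchesRecipe(label, recipe, scriptOf) {
  const allowed = new Set(recipe.split('|')); // e.g. {Latin, Emoji, Digits, Symbols}
  for (const ch of label) {
    if (!allowed.has(scriptOf(ch.codePointAt(0)))) return false;
  }
  return true;
}
```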

I’ve computed some tallies that show which scripts show up in labels, using the following process: map each character of a label to a script, aaαa → {Latin,Latin,Greek,Latin}, then collect the distinct scripts per label:
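A sketch of that tally (again with a stand-in `scriptOf` lookup):

```js
// Sketch: tally registered labels by their (sorted) set of scripts.
function tallyScriptSets(labels, scriptOf) {
  const tally = new Map();
  for (const label of labels) {
    const scripts = new Set([...label].map(ch => scriptOf(ch.codePointAt(0))));
    const key = [...scripts].sort().join('|'); // sorted: {Greek,Latin} == {Latin,Greek}
    tally.set(key, (tally.get(key) ?? 0) + 1);
  }
  return tally;
}
// e.g. tallyScriptSets(['abc', 'aaαa'], scriptOf) → Map {'Latin' => 1, 'Greek|Latin' => 1}
```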

For the sorted case, you can see most labels aren’t that diverse:

Since 371K labels use only the Latin script, I think a good starting point is Emoji, Digits (0-9), and Latin (A-Z), then building up, using the confusable mapping to allow more characters, until all of those names are accepted. And then grow from there.

1 Like

I have nowhere near the expertise and knowledge that you do with this, but I have a question: would it be useful to disallow any names beginning or ending with a ZWJ, plus disallow consecutive ZWJs within a name? It seems like this would get rid of a lot of scam names, or at least cut them way back.

So the rule would be that a name can’t begin or end with a ZWJ or contain more than one ZWJ in a row (I’m assuming consecutive ZWJs aren’t used for any words in other languages or in so-called ASCII art). Is this thinking in the right direction or helpful?
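In code form, the checks I mean are just this (minimal sketch):

```js
// Sketch of the proposed rules: no leading/trailing ZWJ (U+200D),
// and no two ZWJs in a row.
const ZWJ = '\u200D';
function violatesZwjRules(name) {
  return name.startsWith(ZWJ)
      || name.endsWith(ZWJ)
      || name.includes(ZWJ + ZWJ);
}
```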

1 Like

Thinking about it further, I think this might solve all the issues for emoji domains with the addition of one rule: disallow character-ZWJ-character-ZWJ-character…, which can’t be a pattern for legitimate emoji domains. In other words, every other character can’t be a ZWJ.

So, the rules for emoji are: 1) the name can’t begin or end with a ZWJ, 2) no consecutive ZWJs, and 3) no alternating ZWJs with one character between them. I think those rules restrict emoji domains to only legitimate ones. Unsure if this is useful…

EDIT: Rule 3 is inaccurate; there are many single-glyph emoji that are 3 or 4 emoji combined and do use every other character as a ZWJ. So strike rule 3.

1 Like

In ens-normalize.js, ZWJ is only allowed inside whitelisted UTS-51 emoji sequences and in the ContextJ rules, which permit it between or following a few specific characters.

Technically, UTS-51 allows arbitrary-length ZWJ sequences, but many of those do not correspond to a single glyph and are indistinguishable from the unjoined characters, which is why a whitelist is required.

For example, 💩‍💩.eth = 1F4A9 200D 1F4A9 fits your rules and is valid UTS-51 but invalid in ens-normalize.js

Another example would be 😵‍💫😵‍💫😵‍💫.eth vs 😵💫😵💫😵💫.eth which can only be distinguished by ZWJ placement.
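For a concrete check of the first example, assuming the published ens-normalize package API (which throws on names it considers invalid):

```js
import {ens_normalize} from '@adraffy/ens-normalize'; // assumed package/export name

for (const name of [
  '\u{1F4A9}\u{200D}\u{1F4A9}.eth', // 💩‍💩 = 1F4A9 200D 1F4A9 (ZWJ-joined)
  '\u{1F4A9}\u{1F4A9}.eth',         // 💩💩 (no ZWJ)
]) {
  try {
    console.log(name, '→', ens_normalize(name));
  } catch (err) {
    console.log(name, '→ invalid:', err.message); // non-whitelisted ZWJ sequence
  }
}
```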

I would consider the ZWJ-inside-emoji problem solved, although I’m open to ZWJ sequence recommendations (Unsupported Set from Emojipedia); e.g. Microsoft supports a much wider set of family permutations.

The 3 ContextJ allowances are here, but I never received any input on whether they were worth the extra complexity.

I believe most libraries currently in use allow both joiners (ZWJ and ZWNJ) anywhere, as they are permitted in IDNA 2008 (typically under the assumption that ContextJ is active). They’re actually stripped in IDNA 2003 (they’re deviation characters) but were permitted to enable complex emoji.

The two lists I’m using are:

1 Like

It looks like ENS will have to whitelist new emojis when they come out. I’m wondering if ENS will end up auctioning them off (outside of the normal way), since emoji seem like the only whitelisted class where new characters keep coming out that are widely used and universally recognized (emojis = pictographs).

Thanks for sharing those links

Is there a key to the meaning of the colors here? There are a few green ones I wouldn’t expect to be allowed (such as quotation marks) and many red ones that seem harmless.

This is the approach Chrome takes for IDNA domains, described here: Chromium Docs - Internationalized Domain Names (IDN) in Google Chrome

This seems reasonable, as long as we can ensure we don’t deploy a version that will cause legitimate existing names to stop resolving.

As it stands, we can’t prohibit registration of domains, only make them unresolvable - so we couldn’t auction them off. This would change if raffy succeeds in making an EVM implementation of his algorithm.

1 Like

I wanted to avoid this situation, but there’s no algorithmic solution that prevents malicious use. Personally, I was really disappointed in how Unicode handled emoji. Even the choice of reusing ZWJ, rather than a separate visible character that turns invisible when supported, was an enormous UX mistake.

I decorated it using the IDNA 2008 rules. I’ll add a key; it’s the same coloring I’m using everywhere: green = valid, purple = mapped, red = invalid, gray = ignored, black = special.

1 Like

Are you saying that prohibiting the registration of scam names would become possible, or that ENS could auction off new emoji domains when they come out?

BTW @raffy, I remember seeing emojis used in smart contracts and in token names when looking on Etherscan. Would that make the whitelist easier to build, or simpler to write?

EDIT: fwiw contract development - Can I use an emoji as part of a string in Solidity? - Ethereum Stack Exchange

From the start, ens-normalize was developed with the idea that I would make an on-chain version, as that was the original motivation. I wrote a simple version a while ago, but decided that any development time is wasted until the spec is figured out. The original compression method would have worked in Solidity, but the latest version of the library uses an arithmetic compressor, which would not be gas efficient.

The client-side callable version is relatively straightforward, as the full payload is only 12KB, but the main value of an on-chain algorithm would be being able to call it efficiently during a transaction. For this, you’d need a clean implementation with aggressive short circuiting, so the common cases are efficient.
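As an example of the kind of short circuiting I mean (sketched in JS for brevity; the regex fast path is my assumption, not the library’s actual structure):

```js
// Sketch: most names are plain a-z0-9 (hyphen) labels that normalize to
// themselves, so the expensive Unicode path can usually be skipped.
function normalize(name, fullNormalize) { // fullNormalize: the real routine
  if (/^[a-z0-9-]+(\.[a-z0-9-]+)*$/.test(name)) return name; // common case
  return fullNormalize(name); // rare case: full Unicode processing
}
```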

The dream would be an upgradeable contract, but the Unicode release cycle is every 2 years(?), so I think the smarter goal is just making a good reference implementation that is simple enough to be ported everywhere without ambiguity (unlike most Unicode implementations), and then writing the EVM version.

1 Like

Shouldn’t the quote mark and other ASCII symbols be red, then?

Like ʺ = 02BA and others? I agree it’s nonsense but they’re valid in IDNA 2003 and 2008. I added an ASCII table:

Sorry for the lack of updates.

I haven’t been able to figure out a solution to the confusable problem. I think the script idea above is valid, but breaking Common/Latin/Greek/Cyrillic into smarter groups has proven very difficult. Possibly I was just aiming for something that requires too much investment.

I’ve changed the default export of my library to the compat version. I believe this is faithful to EIP-137 and solves the emoji/ZWJ issue. I’ve discussed this version earlier in the thread.


To recap the problems still unsolved:

  • There are many Unicode symbols that look visually identical, even after UTS-46 is applied. Some of these exist between separate scripts. Some are within the same script.

  • There is a Unicode spec (UTS-39) for grouping these symbols into confusable groups. Everything purple or green in these charts is a valid ENS character. I’ve discussed this earlier in the thread.

  • AFAIK, the best algorithm for dealing with this problem is outlined in the Google Chromium documentation. Unfortunately, many of the techniques involve using the punycode form to indicate that the input name is potentially confusing.

  • If ENS used the punycode solution, it would mean there are 3 possible states for a name post-normalization: (1) an invalid name, (2) a confusable name (shown as punycode), or (3) a valid name. Both (2) and (3) would be reachable. I’m not against this idea, I simply hadn’t considered it, but it’s certainly possible.

  • UTS-39 and the Chrome solution also implement script-based restrictions. DNS has the advantage that traditional TLDs are country-based, which means there’s an implicit script set associated with many TLDs that can help resolve script ambiguities. .eth is international (similar to newer TLDs) and has no implicit script.

  • Even if ENS uses the strictest script-based technique (one script per label; sketched after this list), there exist labels that look visually similar. As a native English user, most examples I’m aware of exist between the Common/Latin/Greek/Cyrillic scripts.

  • My idea was to split Common/Latin/Greek/Cyrillic into better groups, such that script restrictions would work AND prevent the obvious cross-script confusables. I’ve discussed this earlier in the thread. I haven’t been able to satisfactorily construct these groups.
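For reference, a minimal sketch of the strictest rule mentioned above (one script per label), with `scriptOf` again a stand-in codepoint → script lookup:

```js
// Sketch: single-script restriction. Common/Inherited characters (digits,
// combining marks, etc.) may mix with any one script.
function singleScript(label, scriptOf) {
  let script = null;
  for (const ch of label) {
    const s = scriptOf(ch.codePointAt(0));
    if (s === 'Common' || s === 'Inherited') continue;
    if (script === null) script = s;
    else if (script !== s) return false; // mixed-script label
  }
  return true;
}
```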

1 Like

What would be the impact if we simply considered all confusable names invalid?

That’s probably worth exploring. I’ve looked at the statistics for replacements and scripts in the registered labels, but haven’t measured how confusable they are. Computing which character in each confusable group is used most frequently (other than the preferred one) is probably also useful.
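A sketch of that measurement, assuming a `groups` map from each preferred character to its confusable members:

```js
// Sketch: count how often each non-preferred confusable member appears
// in the registered labels.
function tallyConfusableUse(labels, groups) {
  const members = new Set();
  for (const [preferred, chars] of groups) {
    for (const ch of chars) if (ch !== preferred) members.add(ch);
  }
  const counts = new Map(); // member char → occurrences
  for (const label of labels) {
    for (const ch of label) {
      if (members.has(ch)) counts.set(ch, (counts.get(ch) ?? 0) + 1);
    }
  }
  return counts;
}
```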

2 Likes

Is this guy with OpenSea or ENS? I ran across the chat today jw…

Hi Nick,

I’m new to the forum but I’ve been looking at the ENS project for a while and wanted to make a suggestion regarding this issue in particular.

There is a known issue with people being led astray by ENS names with zero-width characters and other shenanigans. It isn’t particularly hidden either, and clients (OpenSea etc.) are signalling it loudly to their users with ugly warning triangles on names that contain non-ASCII characters.

There’s been extensive discussion here about the subject, and I see you’re leaning strongly towards in-client normalisation, a library, etc. I think that approach is going to have problems because you can’t force clients to use your library, and people can interact with the contract directly and find loopholes.

I think you have a complicated problem on your hands and, for now, it might be a good solution to launch a new extension, perhaps .ens, with character restrictions enforced on-chain, permitting only a-z0-9 ASCII characters. That eliminates a whole ton of issues, provides some separation from the perhaps-slightly-tainted .eth name, and makes it abundantly clear that the ZWJ issues won’t occur with the new extension. Your exponentialpriceoracle, when applied to a new extension, should produce strong capital inflows.

Then you can solve the .eth normalisation issue in time.

1 Like

Also, I was thinking about your normalisation issue a lot, and I’m not confident it’s a problem you can solve in code.

You’ll have issues where somebody registers an emoji string of three black-haired women and someone else tries to spoof it by registering two black-haired women with a brown-haired woman sandwiched between them.

There are about three different cat emojis now, and they all render differently on different platforms. Some updated platforms render the three different ones identically, using the ‘primary’ cat emoji. The gun emoji famously renders as a water pistol on iOS.

If you really do want to allow emojis, thinking about it as a developer, and bearing in mind that the purpose of a name is to be uniquely distinguishable, I’d want to go with a manually selected whitelist, and you’re going to get grief about your selections. For example, I’d permit only the generic ‘yellow’ character emoji, because if you allow one ‘skin tone’ emoji you’re going to face pressure to allow all skin tone emojis, which would be great in theory, but if someone tells me ‘my ether name is a brown girl emoji’ I can’t reliably pick that one out of a list without at least something to compare with. And then you have colourblind users…

For that reason I’d go with a strict whitelist, and a short one at that.

That also generates premiums. If you limit the system to, for example, text-only domains or emoji domains which must contain three emojis from a short whitelist… then you’ll see some premium prices. You still have a huge set to sell from, and you eliminate the case of one guy buying three hearts and the next guy buying four hearts… you eliminate so much potential user confusion, and that’s vital.

It should be remembered that the use case for most users revolves around fund transfers. Most people want their fund transfer ecosystem to be strict and regimented and with little room for error. SWIFT allows a very limited set of characters in their MT103s for good reason.

Just my thoughts.

1 Like