ENS Name Normalization

@royalfork in a previous thread talked about the issues with emoji. He points to UTS-51 as the spec for emoji presentation.

Relying on UTS-46 alone puts us in an awkward position, because the correct application of UTS-46 together with IDNA2008 rejects almost all emoji. Emoji only partially got through because IDNA2003 was used. Only some ccTLDs allow emoji via punycode, and in doing so they violate UTS-46.

I think we should respect UTS-51 so we can take full advantage of current and future emoji. This would enable many more emoji, like the England, Scotland, and Wales flags (RGI_Emoji_Tag_Sequence), a bunch of missing complex sequences (RGI_Emoji_ZWJ_Sequence), and many standard emoji that IDNA disallows when they lack emoji styling (FE0F).

I believe it’s also possible to do this without breaking the existing non-standard names by grandfathering those emoji sequences. All you need is a finite list of the sequences that, as of today, have had FE0F removed and are disallowed without it. Everything else going forward could be handled correctly. (The ideal solution would be migrating them, but that’s complex and $$$, so ignore that.)
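To make the grandfathering idea concrete, here is a rough Python sketch. The names `RGI_SAMPLE`, `grandfathered_forms`, and `tokenize_emoji` are hypothetical, and the three-entry sample stands in for the real RGI list from the Unicode emoji data files. As a simplification it grandfathers every FE0F-stripped form, whereas a real list would keep only the stripped forms that are actually disallowed on their own.

```python
FE0F = "\uFE0F"  # VARIATION SELECTOR-16 (emoji presentation)

# Hypothetical sample; a real list would be loaded from the Unicode emoji data.
RGI_SAMPLE = {
    "\u2764\uFE0F",  # red heart (fully-qualified form includes FE0F)
    "\u263A\uFE0F",  # smiling face (fully-qualified form includes FE0F)
    "\U0001F600",    # grinning face (no FE0F involved)
}


def grandfathered_forms(rgi_as_of_today):
    """Freeze the FE0F-stripped forms of today's sequences as a finite list."""
    return {seq.replace(FE0F, "") for seq in rgi_as_of_today if FE0F in seq}


def tokenize_emoji(label, rgi, legacy):
    """Split a label into allowed emoji using greedy longest match.

    Returns the list of matched sequences, or None if any part of the
    label is neither an RGI sequence nor a grandfathered legacy form.
    """
    allowed = sorted(rgi | legacy, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(label):
        match = next((s for s in allowed if label.startswith(s, i)), None)
        if match is None:
            return None
        tokens.append(match)
        i += len(match)
    return tokens


LEGACY = grandfathered_forms(RGI_SAMPLE)
```

With this, a legacy name whose heart lost its FE0F still tokenizes, while the same name is rejected if the legacy set is empty, which is how new sequences would be treated going forward.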

For example, this needs to be fixed: 👩‍🦱.eth vs 👩‍‍🦱.eth. The second name contains a doubled ZWJ (U+200D), and you can’t even tell it’s in there. This is different from ZWJ as exterior padding.
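One way to catch this case, sketched in Python under the assumption that we have the list of valid ZWJ sequences at hand: flag any ZWJ that is not part of a known sequence. `RGI_ZWJ_SAMPLE` and `stray_zwj_positions` are hypothetical names, and the one-entry sample stands in for the full RGI_Emoji_ZWJ_Sequence list.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Hypothetical sample; the real set would be the full RGI_Emoji_ZWJ_Sequence list.
RGI_ZWJ_SAMPLE = {
    "\U0001F469\u200D\U0001F9B1",  # woman: curly hair
}


def stray_zwj_positions(label, zwj_sequences):
    """Return the indexes of ZWJ characters not covered by a known sequence."""
    covered = set()
    for seq in zwj_sequences:
        start = label.find(seq)
        while start != -1:
            covered.update(range(start, start + len(seq)))
            start = label.find(seq, start + 1)
    return [i for i, ch in enumerate(label) if ch == ZWJ and i not in covered]


good = "\U0001F469\u200D\U0001F9B1"        # single joiner
bad = "\U0001F469\u200D\u200D\U0001F9B1"   # doubled joiner
```

Here `stray_zwj_positions(good, RGI_ZWJ_SAMPLE)` is empty, while the doubled-joiner label reports both of its ZWJ characters, so a normalizer could reject it even though the two names may render the same.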

Edit: this renders differently on Mac vs. PC.
[Image: EmojiDoubleJoiner]
