ENS Name Normalization

raffy · December 21, 2021, 10:52pm

Edit: I guess the goal was to have a framework that does it the correct way, and then relax it so it fits as many of the registered names as possible.

I thought the keycaps could be fixed using my method, but since some are already registered, it has to use the unqualified form. Both the * and # keycaps can use the FE0F.

Edit 2: Let me explain that a bit more and summarize:

If we use UTS-51 + IDNA2008, my suggestion would be that anytime ENS wants to enable new emoji (i.e. Unicode updates that add more emoji), the new characters should all normalize with FE0F attached when applicable, both to preserve the intention (this is an emoji) and to avoid unqualified representations. (According to the spec, all emoji keyboards produce fully-qualified emoji.)

If there’s no FE0F, it goes through IDNA2008 and is either mapped or destroyed.

Since names have already been registered under IDNA2003 rules, emoji that were not mapped by IDNA2003 had their FE0F removed because it was ignored. Keycaps were lucky because they’re 3 characters and 20E3 was not removed, so they can still be “detected”.
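To illustrate why keycaps stay detectable, here is a minimal Python sketch. The `KEYCAP` pattern and `is_keycap` helper are hypothetical names for illustration, not part of any ENS library. It assumes a keycap is a base character (digit, `#` or `*`), an optional FE0F, and then 20E3.

```python
import re

# Assumption: keycap = base character + optional FE0F + 20E3 (COMBINING ENCLOSING KEYCAP).
# The FE0F is optional so that names registered after IDNA2003 stripped it still match.
KEYCAP = re.compile("[0-9#*]\uFE0F?\u20E3")

def is_keycap(token: str) -> bool:
    """Return True if the whole token is a keycap, with or without FE0F."""
    return KEYCAP.fullmatch(token) is not None

print(is_keycap("1\uFE0F\u20E3"))  # fully-qualified keycap
print(is_keycap("1\u20E3"))        # FE0F stripped, 20E3 still present
print(is_keycap("1\uFE0F"))        # digit + FE0F only, not a keycap
```

Because 20E3 survives, the sequence can still be told apart from a plain digit even when the FE0F is gone.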

This means FE0F is optional for some inputs, and results in names that can freely mix between emoji and text. This is mostly an historical accident, and maybe it’s good w/r/t homographic attacks, but most of the characters didn’t get this treatment.

tmtmtm === ™™™ === ™️™️™️ === tm™️™
111 === 1️1️1️ but =!= 1️⃣1️⃣1️⃣
mmm === ⓂⓂⓂ === Ⓜ️Ⓜ️Ⓜ️ === mⓂⓂ️ but =!= 🅜🅜🅜 =!= 🅼🅼🅼
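The equivalences above can be roughly reproduced in Python. This is only an approximation of the IDNA2003 behavior, not the actual nameprep algorithm: it assumes that dropping FE0F, applying NFKC, and lowercasing is close enough for these examples. The function name `legacy_approx` is made up for this sketch.

```python
import unicodedata

FE0F = "\uFE0F"  # VARIATION SELECTOR-16

def legacy_approx(name: str) -> str:
    """Rough stand-in for IDNA2003 mapping: ignore FE0F, NFKC-map, lowercase."""
    stripped = name.replace(FE0F, "")
    return unicodedata.normalize("NFKC", stripped).lower()

tm = "\u2122"        # TRADE MARK SIGN, maps to "tm"
circled_m = "\u24C2" # CIRCLED LATIN CAPITAL LETTER M, maps to "m"
keycap_1 = "1" + FE0F + "\u20E3"

print(legacy_approx(tm * 3) == "tmtmtm")
print(legacy_approx("tm" + tm + FE0F + tm) == "tmtmtm")
print(legacy_approx(("1" + FE0F) * 3) == "111")
print(legacy_approx(keycap_1 * 3) == "111")  # False: 20E3 is kept
print(legacy_approx("m" + circled_m + circled_m + FE0F) == "mmm")
```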

Some emoji like ⁉️ are disallowed in both versions of IDNA but are valid emoji. These can safely be enabled like new emoji, where FE0F is used.

My grandfather suggestion was that any emoji that’s not in a registered name should also use FE0F going forward, treating it effectively like a new emoji. Implementation-wise, it’s simple: there are 2 lists, the single-character emoji set where FE0F is optional, and everything else.
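A minimal sketch of the two-list idea, in Python. The set contents and the names `FE0F_OPTIONAL`, `FE0F_REQUIRED` and `classify_emoji` are placeholders for illustration; the real lists would have to be derived from the names that are actually registered. It also assumes that legacy emoji are stored without FE0F, since that is how they ended up in registered names.

```python
from typing import Optional

FE0F = "\uFE0F"  # VARIATION SELECTOR-16

# Placeholder entries only, not the real lists.
FE0F_OPTIONAL = {"\u2764", "\u263A"}  # legacy single-character emoji (assumed)
FE0F_REQUIRED = {"\u2049"}            # treated like a new emoji (the "!?" example above)

def classify_emoji(token: str) -> Optional[str]:
    """Return the canonical emoji form, or None if the token is not treated as emoji.

    None means the caller should fall through to the normal text (IDNA) path.
    """
    had_fe0f = token.endswith(FE0F)
    base = token[:-1] if had_fe0f else token
    if base in FE0F_OPTIONAL:
        return base                # FE0F optional; canonical form drops it
    if base in FE0F_REQUIRED and had_fe0f:
        return base + FE0F         # only an emoji when FE0F is present
    return None

print(classify_emoji("\u2764\uFE0F") == classify_emoji("\u2764"))  # True
print(classify_emoji("\u2049"))                                    # None
```

The second list never matches without FE0F, so those characters can't mix freely between emoji and text the way the legacy ones do.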

Edit 3: ❶ =!= ➊ (serif vs sans-serif), ֍ =!= ֎ (orientation)