It might be helpful to divide this problem into 2 separate components:
- Normalization: Technical details for converting a domain into a labelhash.
- User experience: What a UI shows the user during ENS interactions.
I consider this distinction similar to the one between UTS #46 (protocol-level normalization) and Google Chrome's Internationalized Domain Name (IDN) display policy (user-experience guidelines). The normalization piece is largely fixed and immutable, while the user experience can be tweaked depending on the application and/or context.
What do you think about breaking this out into 2 separate goals?
- Definition of a strict, precisely specified protocol that converts a Unicode string into a labelhash. The protocol should be as simple as possible, and any changes should be backwards compatible.
- A recommendation of standard usability guidelines across platforms (these guidelines can exist on a spectrum depending on the application, and would be amenable to change in the future).
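To make the first goal concrete, here's a deliberately oversimplified sketch of what a strict, deterministic normalization function looks like. It uses only case folding and NFC as stand-ins; real ENS normalization follows the full UTS46 mapping and validity tables (plus ENS-specific rules), which this does not implement.

```python
import unicodedata

def normalize_label(label: str) -> str:
    """Toy stand-in for UTS46 label normalization: casefold, then NFC.

    Real normalization applies the full UTS46 mapping/validity tables;
    this only illustrates the shape of a strict protocol function.
    """
    if not label:
        raise ValueError("empty label")
    folded = label.casefold()                        # 'VITALIK' -> 'vitalik'
    normalized = unicodedata.normalize("NFC", folded)
    if "." in normalized:
        raise ValueError("labels may not contain '.'")
    return normalized

def normalize_name(name: str) -> str:
    """Normalize each dot-separated label independently."""
    return ".".join(normalize_label(l) for l in name.split("."))
```

The key property is that the function is total and unambiguous over its input domain: any two clients running it on the same string get the same labels or the same error.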
Additionally, I think the “holy grail” for normalization would be an on-chain normalization implementation. Even if it’s only economical via eth_call, and even if it doesn’t completely implement UTS46/IDNA2008, it would allow for an unambiguous version of ENS normalization. If we’re taking the time to firm up normalization requirements, I think we should consider whether this is actually feasible.
From EIP137:
<domain> ::= <label> | <domain> "." <label>
<label> ::= any valid string label per [UTS46](https://unicode.org/reports/tr46/)
I interpret this to mean that `.` shouldn’t be part of any string that is UTS46-normalized, so UTS46 stop rules would not apply.
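This per-label reading matches the recursive namehash algorithm in EIP-137, where the name is split on `.` first and each label is hashed independently. A sketch of that recursion (note: Ethereum specifies Keccak-256, which isn't in Python's standard library, so `hashlib.sha3_256` is used here purely as a stand-in to show the structure; a real implementation would swap in Keccak-256):

```python
import hashlib

def _hash(data: bytes) -> bytes:
    # Stand-in only: EIP-137 specifies Keccak-256 (pre-standard SHA-3),
    # not NIST SHA3-256. Swap in a real Keccak-256 for actual use.
    return hashlib.sha3_256(data).digest()

def labelhash(label: str) -> bytes:
    # Per EIP-137 the label is hashed after normalization; the
    # normalization step itself (UTS46) is out of scope here.
    return _hash(label.encode("utf-8"))

def namehash(name: str) -> bytes:
    """Recursive namehash per EIP-137:
    namehash('') = 0x00 * 32
    namehash(name) = hash(namehash(parent) + labelhash(label))
    """
    if name == "":
        return bytes(32)
    label, _, parent = name.partition(".")
    return _hash(namehash(parent) + labelhash(label))
```

Because the split on `.` happens before any label is hashed, a dot can never survive inside a label, which is why per-string stop-rule processing has no role to play.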
Careful that we don’t fall into a false comparison here. The alternative isn’t to break existing names; it’s also possible to just ignore NV8 (the status quo). In my estimation, the proposed “branched normalization” procedure carries a steep cost (almost impractically steep, IMO):
- doubles the complexity of the ENS normalization process (emoji rules + everything else rules)
- doubles the complexity of the emoji normalization/validation process (old + new registrations)
- requires that every client maintain a list of old registrations
With stated benefits of:
- Normalized emoji are fully-qualified
- ContextJ doesn’t break emoji
- Compliance with NV8 (does this convey any practical benefit?)
- A small handful of previously disallowed emoji become allowed.
I think these same benefits can be achieved more surgically with a few choice additions/deletions of individual IDNA2008 rules and some additional “user display” logic. I’m not completely against your “branched normalization” approach (I am a self-described ENS/emoji enthusiast, after all), but I’d be very cautious. The devil is in the details, which I don’t think are fully fleshed out yet.
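To show what “a few choice additions/deletions of individual IDNA2008 rules” might look like mechanically (the specific code points and the `base_valid` function below are hypothetical placeholders, not a proposal): start from the base standard's validity decision and layer a small, explicit override table on top, so the delta from IDNA2008/UTS46 stays tiny and auditable.

```python
# Hypothetical sketch: base validity lookup plus an explicit override
# layer. base_valid and the code points below are placeholders, not
# real IDNA2008 data.

# ENS-specific deltas from the base standard, kept small and reviewable.
# Keys are code points; True = force-allow, False = force-deny.
OVERRIDES: dict[int, bool] = {
    0x2764: True,   # e.g. force-allow an emoji code point (placeholder)
    0x2044: False,  # e.g. force-deny a confusable (placeholder)
}

def base_valid(cp: int) -> bool:
    # Placeholder for the real IDNA2008/UTS46 validity tables:
    # here, just ASCII alphanumerics and hyphen.
    return cp < 0x80 and (chr(cp).isalnum() or cp == 0x2D)

def ens_valid(cp: int) -> bool:
    """Base standard decision, with a small explicit override table."""
    if cp in OVERRIDES:
        return OVERRIDES[cp]
    return base_valid(cp)
```

The appeal of this shape is that clients ship one validity function plus a short, versioned override list, rather than two parallel normalization pipelines and a registry of legacy registrations.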