It’s unfortunate that Metamask, Opensea, etc. have each built their own ENS “validation” implementations (and I agree with @serenae and @aox.eth that the Metamask “confusables” UI has gone too far). This points to a clear need for ENS to develop and maintain better client packages/libraries, and to actively encourage their widespread use.
As a “front-end developer” working on ENS integration, I would want the following API:
```
process(input) -> {
  "normalized": [ // for each label
    {"label": <uts46 normalized label>, "hash": <label hash>}
  ],
  "nameHash": 0x123....,
  "display": <concatenated normalized field, but emoji are shown in fully-qualified form>,
  "warnings": [
    "mixed char sets", // Maybe ascii+emoji is ok?
    "extraneous invisible chars",
    "right-to-left chars",
    ...
  ],
  "info": [ // for each unicode char:
    {"unicode": <U+XXXX>, "charset": <unicode charset>, "confusable": <bool>},
    ...
  ]
}
```
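A minimal TypeScript sketch of the response shape above, showing how the per-label `hash` and the `nameHash` could be derived from the normalized labels with the EIP-137 namehash recursion. That recursion starts from 32 zero bytes and, for each label from right to left, computes `keccak256(node ++ keccak256(label))`. The type and function names (`ProcessResult`, `computeNameHash`, `hashFields`, `toHex`) are hypothetical. Keccak-256 is not part of the JavaScript standard library, so the sketch assumes the caller passes in a byte-oriented Keccak-256 function (for example, one from an Ethereum library, wrapped to return a `Uint8Array`). It also assumes the labels are already UTS-46 normalized.

```typescript
// Hypothetical shape of the proposed process() result; field names follow the
// pseudocode above, type names are illustrative only.
interface LabelInfo {
  label: string; // UTS-46 normalized label
  hash: string;  // labelhash = keccak256(UTF-8 bytes of label), 0x-prefixed hex
}

interface CharInfo {
  unicode: string;     // e.g. "U+0061"
  charset: string;     // Unicode script name
  confusable: boolean;
}

interface ProcessResult {
  normalized: LabelInfo[];
  nameHash: string; // 0x-prefixed hex
  display: string;
  warnings: string[];
  info: CharInfo[];
}

// Assumption: Keccak-256 is supplied by the caller (not in the JS standard library).
type Keccak256 = (data: Uint8Array) => Uint8Array;

function toHex(bytes: Uint8Array): string {
  let hex = "0x";
  for (let i = 0; i < bytes.length; i++) {
    hex += ("0" + bytes[i].toString(16)).slice(-2);
  }
  return hex;
}

// EIP-137 namehash over already-normalized labels, e.g. ["vitalik", "eth"].
function computeNameHash(labels: string[], keccak256: Keccak256): Uint8Array {
  const enc = new TextEncoder();
  let node: Uint8Array = new Uint8Array(32); // namehash("") is 32 zero bytes
  for (let i = labels.length - 1; i >= 0; i--) {
    const labelHash = keccak256(enc.encode(labels[i]));
    const buf = new Uint8Array(64); // node (32 bytes) ++ labelHash (32 bytes)
    buf.set(node, 0);
    buf.set(labelHash, 32);
    node = keccak256(buf);
  }
  return node;
}

// Fills only the "normalized" and "nameHash" fields; "display", "warnings" and
// "info" would come from separate formatting and analysis steps.
function hashFields(
  labels: string[],
  keccak256: Keccak256,
): Pick<ProcessResult, "normalized" | "nameHash"> {
  const enc = new TextEncoder();
  return {
    normalized: labels.map((label) => ({ label, hash: toHex(keccak256(enc.encode(label))) })),
    nameHash: toHex(computeNameHash(labels, keccak256)),
  };
}
```

Injecting the hash function keeps the sketch dependency-free and easy to test. Note that Ethereum’s Keccak-256 differs from the standardized SHA3-256 (different padding), so a SHA3-256 implementation cannot be substituted.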
- The normalized field should only be used for internal ENS lookups and should never be shown to end users. I think it’s important that protocol-level normalization stays as simple as possible; it helps that it can be implemented in only a few dozen lines of code.
- The display field is meant to be shown to end users, and could incorporate @raffy’s emoji processing logic (among other potentially useful things).
- Warnings signal likely scam attempts and should always be shown prominently to the end user. We can debate what should and shouldn’t go in here, but I think the goal should be that “good faith” registrations never trigger any warnings.
- Implementers can decide whether exposing “info” makes sense for their use case. Personally, I think “confusables” are more of a typography issue, but if they’re a hard Metamask requirement, that’s probably reason enough to include them.
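To make the “warnings” field more concrete, here is a rough TypeScript sketch of how the three example categories from the pseudocode could be detected with Unicode property escapes in regular expressions (this assumes an ES2018+ runtime). The function names (`collectWarnings`, `isEmojiJoiner`), the short list of checked scripts, and the rule that tolerates a ZWJ or U+FE0F only inside an emoji sequence are all illustrative assumptions, not settled ENS policy. In particular, flagging every Arabic or Hebrew character as “right-to-left” may conflict with the goal that good-faith registrations show no warnings, so that rule is exactly the kind of thing that would need debate.

```typescript
// Hypothetical detector for the proposed "warnings" field. Rules are simplified
// illustrations of the categories in the pseudocode, not an agreed ENS policy.

// Scripts checked for mixing. Characters in the Common/Inherited scripts
// (digits, hyphen, most emoji) match none of these, so they never count.
const CHECKED_SCRIPTS = ["Latin", "Greek", "Cyrillic", "Arabic", "Hebrew"];
const scriptTests = CHECKED_SCRIPTS.map((name) => ({
  name,
  re: new RegExp(`^\\p{Script=${name}}$`, "u"),
}));

// Zero-width and other default-ignorable code points (ZWJ, U+200B, U+FE0F, ...).
const invisibleRe = new RegExp("^\\p{Default_Ignorable_Code_Point}$", "u");
// Simplified RTL check: Arabic/Hebrew letters or explicit bidi controls (e.g. U+202E).
const rtlRe = new RegExp("^[\\p{Script=Arabic}\\p{Script=Hebrew}\\p{Bidi_Control}]$", "u");
const pictographicRe = new RegExp("^\\p{Extended_Pictographic}$", "u");
const emojiPartRe = new RegExp("^[\\p{Extended_Pictographic}\\p{Emoji_Modifier}\\uFE0F]$", "u");

// True if the invisible character at index i belongs to an emoji sequence:
// U+FE0F right after a pictograph, or a ZWJ between two emoji parts.
function isEmojiJoiner(chars: string[], i: number): boolean {
  const prev = i > 0 ? chars[i - 1] : "";
  const next = i + 1 < chars.length ? chars[i + 1] : "";
  if (chars[i] === "\uFE0F") return pictographicRe.test(prev);
  if (chars[i] === "\u200D") return emojiPartRe.test(prev) && pictographicRe.test(next);
  return false;
}

function collectWarnings(label: string): string[] {
  const chars = Array.from(label); // split by code point, not UTF-16 code unit
  const scripts = new Set<string>();
  let invisible = false;
  let rtl = false;

  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i];
    for (const { name, re } of scriptTests) {
      if (re.test(ch)) scripts.add(name);
    }
    if (rtlRe.test(ch)) rtl = true;
    if (invisibleRe.test(ch) && !isEmojiJoiner(chars, i)) invisible = true;
  }

  const warnings: string[] = [];
  if (scripts.size > 1) warnings.push("mixed char sets");
  if (invisible) warnings.push("extraneous invisible chars");
  if (rtl) warnings.push("right-to-left chars");
  return warnings;
}
```

Because emoji and ASCII digits belong to the Unicode `Common` script, this sketch treats ASCII plus emoji as not mixed, which is one possible answer to the open question in the pseudocode. A single ZWJ between two emoji (as in a family emoji) passes, while a ZWJ between two letters is reported as an invisible character.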
I’m probably missing some things, but I think this general approach would solve most of the problems discussed in this thread (including the ZWJ issue and the emoji issue) without requiring drastic changes to low-level normalization procedures or on-chain storage. It also leaves room for the details to change without breaking the API.