I wrote this on the GitHub issue but it’s worth repeating my thoughts here too.
At the end of the day, if the official ENS manager UI allows a name to be registered, then it should be considered valid in my opinion.
As far as normalization goes, I think the actual rules are less important than having every place in ENS that uses normalization be consistent; otherwise we get these issues where someone registers a name and then finds out later that they can't use it. This includes:
- The ENS manager app UI:
  - Searching for a name should yield the normalized/rectified version of that name
  - Going directly to the URL of a non-normalized name should redirect to the normalized version (e.g. ⌐◨‐◨.eth should redirect to ⌐◨-◨.eth)
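The redirect idea above can be sketched roughly as follows. Note this is a toy stand-in: a real implementation would apply the full UTS-46/ENS mapping tables (e.g. ens-normalize.js); the tiny hyphen-folding map here is just an illustrative assumption to show the "normalize, then redirect if the input differs" flow.

```python
import unicodedata

# Toy stand-in for real ENS/UTS-46 normalization -- illustrative only.
# We lowercase, apply NFC, and fold a couple of hyphen look-alikes to
# ASCII "-"; the real mapping tables are far larger.
HYPHEN_LOOKALIKES = {"\u2010": "-", "\u2011": "-"}  # HYPHEN, NON-BREAKING HYPHEN

def toy_normalize(name: str) -> str:
    name = unicodedata.normalize("NFC", name.lower())
    return "".join(HYPHEN_LOOKALIKES.get(ch, ch) for ch in name)

def resolve_url(name: str) -> str:
    """Return the canonical URL path, redirecting if the input wasn't normalized."""
    normalized = toy_normalize(name)
    if normalized != name:
        return f"redirect -> /name/{normalized}"
    return f"/name/{normalized}"

# The U+2010 hyphen in the input gets folded, so this redirects to the
# ASCII-hyphen form.
print(resolve_url("\u2310\u25e8\u2010\u25e8.eth"))
```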
Since it sounds like the ENS metadata service is changing which names are valid/normalized, and some people have already registered names through the manager UI that they thought were valid, those people will simply be stuck with unusable/unsellable names. @mdt, do you think it would be possible to get a report on how many currently registered names would be rendered invalid after your changes?
The thing I learned the most from this adventure is that I’m never interacting with an ENS name unless I can type it myself. The moment I see anything weird, basically any emoji or non-ASCII, I’m entering it into my resolver demo site so I can see exactly how it decomposes.
I definitely agree with this.
This report is relative to eth-ens-namehash 2.0.5. I could run this again for mdt’s fork?
In my opinion “m”, “0”, “1” should not be considered confusables, as flagging these would greatly dilute the signal to users. If marketplaces/integrations are using the metadata service as a source of truth for name validity, that is a powerful lever that should be used to promote uniformity across the ecosystem. Maybe the metadata service could include some of the extended visual parsing that Raffy’s tool provides, accompanied by specific UX guidelines from ENS on how best to handle confusables.
On a related note, I would like to suggest that the default avatar typeface be changed to one that better distinguishes between confusable characters, particularly zero vs. the letter O, and lowercase "l" vs. uppercase "I":
Not suggesting a monospace font, just one with less ambiguity. Many confusables could potentially be addressed by a custom typeface. This makes the avatar overlay more useful. Imagine wallet UIs where the ENS avatar is displayed for quick visual confirmation, in the same way that avatars are used on Venmo, etc.
It’s unfortunate that Metamask/Opensea/etc have built their own ENS “validation” implementations (and I agree with @serenae and @aox.eth that the Metamask “confusables” UI has gone too far). This should speak to a clear need for ENS to develop and maintain improved client packages/libraries, and actively encourage their widespread use.
As a “front-end developer” working on ENS integration, I would want the following API:
```
process(input) -> {
  "normalized": [ // for each label
    {"label": <uts46 normalized label>, "hash": <label hash>}
  ],
  "nameHash": 0x123....,
  "display": <concatenated "normalized" field, but emoji are shown in fully-qualified form>,
  "warnings": [
    "mixed char sets", // maybe ASCII+emoji is ok?
    "extraneous invisible chars",
    "right-to-left chars",
    ...
  ],
  "info": [ // for each Unicode char:
    {"unicode": <U+XXXX>, "charset": <Unicode charset>, "confusable": <bool>},
    ...
  ]
}
```
The normalized field should only be used for internal ENS lookups, and should never be shown to end users. I think it's important that the protocol-level normalization be as simple as possible; it helps that it can be implemented in only a few dozen lines of code.
The display field is designed for display to end users, and could include @raffy's emoji processing logic (among other potentially useful things).
The warnings field signals likely scam attempts, and should always be shown prominently to the end user. We can debate what should and shouldn't go in here, but I think the goal is that "good faith" registrations should never trigger any warnings.
Implementors can decide whether exposing the info field makes sense for their use case. I personally think confusables are more of a typography issue, but if they're a hard Metamask requirement, that's probably reason enough to include them.
I’m probably missing some things, but I think this general approach would solve most of the problems discussed in this thread (including the ZWJ issue, and the emoji issue), without the need for drastic changes to low-level normalization procedures or on-chain storage. It’s also fairly amenable to change without breaking the API.
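As a rough illustration of the shape of this API, here is a minimal Python sketch of just the warning-detection part. Label hashing and real UTS-46 normalization are omitted; the invisible-character set and the simple lowercasing are illustrative assumptions, not the proposed rules.

```python
import unicodedata

# Minimal sketch of the proposed process() API -- not a real implementation.
# A real one would exempt ZWJ inside valid emoji sequences; here we flag
# any invisible character to keep the example short.
INVISIBLES = {"\u200b", "\u200c", "\u200d", "\ufeff"}  # ZWSP, ZWNJ, ZWJ, BOM

def process(name: str) -> dict:
    labels = name.split(".")
    warnings = []
    if any(ch in INVISIBLES for lbl in labels for ch in lbl):
        warnings.append("extraneous invisible chars")
    if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in name):
        warnings.append("right-to-left chars")
    return {
        "normalized": [{"label": lbl.lower()} for lbl in labels],
        "display": name.lower(),
        "warnings": warnings,
    }

# A hidden zero-width space in "vita​lik" trips the invisible-chars warning.
print(process("vita\u200blik.eth")["warnings"])
```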
I attempted this, but it’s a little more complex than I thought. I’ll need to revise the previous results once I figure out a better solution.
If we only look at post-normalization labels that are single-script (ignoring Common and Inherited), then a label is confusable-free iff every subsequence of that string is canonical (i.e. it has no confusables, or it is the preferred form for that script).
Every example in the confusable database maps from a single character, but the confusable itself might be multiple characters, e.g. "O-" for θ. The Unicode spec (and the official confusable utility) appears to only work with single-character confusables.
Unicode also doesn't make confusables symmetric, so "m" is confusable with "rn" but not vice versa (I guess because only single-character matches are considered?)
If we only do single characters, the above statement can be relaxed to just checking if every character is canonical.
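The relaxed per-character check can be sketched like this, given a hand-curated map from each confusable character to its canonical choice. The two Latin entries below come from the examples later in this post (ɑ → a, ꬵ → f) and are illustrative, not the full table.

```python
# Hand-curated (partial!) map: confusable character -> canonical character.
# A label passes iff every character maps to itself, i.e. every confusable
# in it is already the canonical form.
CANONICAL = {"\u0251": "a", "\uab35": "f", "\ua799": "f"}  # ɑ->a, ꬵ->f, ꞙ->f

def is_canonical(label: str) -> bool:
    return all(CANONICAL.get(ch, ch) == ch for ch in label)

print(is_canonical("fab"))       # plain ASCII: every character is canonical
print(is_canonical("f\u0251b"))  # contains ɑ, whose canonical form is "a"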
However, I’m not sure how to derive the canonical choice (per script) from the confusable database when there are ambiguities (or if it exists). There are 661 examples where a single script has 2+ matches:
Confusable a for Latin: a vs ɑ → Trivial: a should be canonical
Confusable f for Latin: f vs ꬵ vs ꞙ vs ẝ → Trivial: f should be canonical
Confusable l for Hebrew: ו vs ן → ???
Confusable o for Greek: ο vs σ → ???
There are also examples like rn and m that are just dumb (both ASCII), but I’m not sure how to resolve these in general either.
I will manually decide the Latin cases and take the rest as being invalid in any form and then recompute the results.
We want to know how many names are valid if names can only span a single script (but can be mixed with common and inherited scripts.)
I claim we only care if there are confusables in the normalized output, not the input.
- An emoji is never confusable.
- A confusable can span multiple characters.
- A confusable requires an exact match.
- A confusable that gets transformed by normalization isn't a possible match.
When the confusables are grouped by script, many confusables disappear, but there still exist groups of 2+ sequences that are confusable.
For Latin, I’ve resolved all of these conflicts by hand: either the confusable has a canonical result: ["a","ɑ"] -> "a", the confusable should be ignored: ["rn","m"] (both ASCII), or all variations are confusing: ["ɔ","ↄ","ᴐ"].
For all other scripts, I’m assuming any match is confusing.
Given this information, I can take every registered name that’s normalized and spans a single-script and check it for confusables of that script.
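The single-script test (ignoring Common and Inherited) can be sketched as follows. The codepoint ranges below are a toy classifier covering only basic Latin/Greek/Cyrillic; real code should use the Unicode Scripts.txt data (e.g. via the third-party `regex` module's `\p{Script=...}`).

```python
# Toy script classifier -- illustrative ranges only, not real Unicode
# script data.
def script_of(ch: str) -> str:
    cp = ord(ch)
    if 0x41 <= cp <= 0x7A and (cp <= 0x5A or cp >= 0x61):
        return "Latin"
    if 0x370 <= cp <= 0x3FF:
        return "Greek"
    if 0x400 <= cp <= 0x4FF:
        return "Cyrillic"
    return "Common"  # digits, hyphen, etc. (and everything else, in this toy)

def is_single_script(label: str) -> bool:
    # Common characters can mix with any single script.
    scripts = {script_of(ch) for ch in label} - {"Common"}
    return len(scripts) <= 1

print(is_single_script("abc-123"))  # Latin + Common only
print(is_single_script("a\u0441"))  # Latin "a" mixed with Cyrillic "с"
```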
There are 581953 registered names.
1653 are invalid (not normalized) according to ens-normalize.js
Note: as I’ve mentioned earlier in the thread, single-script + confusables doesn’t fix the cross-script confusables like Latin/Cyrillic. For example, someone has registered “apple” in Cyrillic. It only fails because I didn’t canonicalize the Cyrillic confusables.
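To make the cross-script point concrete, here is a sketch of a skeleton-style fold from Cyrillic look-alikes to Latin. The mapping is a tiny illustrative subset of the confusables data, and the Cyrillic "apple" spelling below (using U+04CF palochka for "l") is a hypothetical example, not necessarily the registered name.

```python
# Partial, illustrative Cyrillic -> Latin confusable fold.
CYR_TO_LAT = {"\u0430": "a", "\u0440": "p", "\u0435": "e",
              "\u04cf": "l", "\u043e": "o", "\u0441": "c"}

def skeleton(label: str) -> str:
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in label)

# Hypothetical all-Cyrillic spelling of "apple": а р р ӏ е
cyrillic_apple = "\u0430\u0440\u0440\u04cf\u0435"
print(skeleton(cyrillic_apple) == "apple")  # cross-script confusable with Latin "apple"
```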
Just so I’m not misunderstanding… is that report an exhaustive list of all name validity changes that would occur if we, say, dropped your library into eth-ens-namehash today?
I actually don't know which version is being used across different tools/wallets/apps. It's even inconsistent within ENS itself: the UI appears to use @ensdomains/eth-ens-namehash 2.0.15, and the ENS metadata service uses the up-to-date repo/version too. But ensjs uses eth-ens-namehash 2.0.8?
So as far as “ENS validity” goes, your report is correct with respect to “what people see in Metamask after they register”, but not “what people can search on and register in the ENS manager”. For example, in the “ens-error” section I was puzzled because most of them appear to just be newer emoji.
Like .eth is marked as invalid in the ens-error section, which matches up with Metamask:
Maybe I’m just misunderstanding something, not sure. But would it make sense to change that report so it uses @ensdomains/eth-ens-namehash 2.0.15 instead?
I think the best solution is to make people confirm each Unicode character using a checkbox. The character's description should be displayed along with its number.
Yeah, the report shows 1653 names lost to normalization errors (1608 specific to my library + 45 shared with eth-ens-namehash). If single-script confusables are enforced, an additional 634 names are lost for having 2+ scripts and 728 for confusables (317 Latin + 411 non-Latin).
It sounds like the library itself will need to be updated whenever new versions of Unicode are released, right? I see that the current eth-ens-namehash doesn't support Unicode 14, but your library does.
Or would it make sense to future-proof it somewhat by preemptively allowing characters in designated emoji blocks, even if they haven't been assigned yet? (See the Symbols and Pictographs Extended-A block.)
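The "allow unassigned codepoints in designated emoji blocks" idea amounts to a simple range check. The block below uses the Symbols and Pictographs Extended-A range (U+1FA70..U+1FAFF); treating the whole range as permitted, regardless of assignment status, is the proposal being sketched, not current library behavior.

```python
# Sketch: permit any codepoint in Symbols and Pictographs Extended-A,
# even if it is unassigned in the Unicode data files the library shipped with.
def in_future_emoji_block(ch: str) -> bool:
    return 0x1FA70 <= ord(ch) <= 0x1FAFF

print(in_future_emoji_block("\U0001FAE0"))  # melting face, added in Unicode 14
print(in_future_emoji_block("a"))           # ordinary ASCII is outside the block
```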
Should people be using ethers.js instead? Because ethers.js doesn't appear to use our normalization code at all; they've rolled their own IDNA-compliant nameprep method, which would probably yield a whole different set of valid/invalid names in @raffy's report.
And web3.js appears to use @ensdomains/ens, which in turn uses… dun dun dun… the old eth-ens-namehash 2.0.8.
Got it. Most of the names seem to fall into this category of using unusual letters like “small capital” letters. Personally I’m okay with making these invalid.