ENS Name Normalization

So ens-namehash-error means the name is not a valid user-facing domain, correct?

That report is ens_normalize_1.5.0 (my version) vs eth-ens-namehash_2.0.15 (live)

  • eth-ens-namehash-error means it currently fails but it’s valid in my version
  • ens_normalize-error means it’s currently valid but fails in my version
  • diff-norm means they both are valid, but the two algorithms disagree
  • both-error means they both fail
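For anyone who wants to reproduce this kind of report, the four categories above can be sketched roughly as follows. This is only an illustration: `classify`, `toy_old`, and `toy_new` are hypothetical names, and the two toy functions are stand-ins I made up, not the real eth-ens-namehash or ens_normalize behavior. The only assumption about a normalizer is that it returns a string or raises on failure.

```python
from typing import Callable, Optional, Tuple


def attempt(fn: Callable[[str], str], name: str) -> Tuple[Optional[str], bool]:
    """Run a normalizer; return (output, failed)."""
    try:
        return fn(name), False
    except Exception:
        return None, True


def classify(name: str,
             old_norm: Callable[[str], str],
             new_norm: Callable[[str], str]) -> str:
    """Return the report category for one name."""
    old_out, old_failed = attempt(old_norm, name)
    new_out, new_failed = attempt(new_norm, name)
    if old_failed and new_failed:
        return "both-error"
    if old_failed:
        return "eth-ens-namehash-error"   # fails today, valid in the new version
    if new_failed:
        return "ens_normalize-error"      # valid today, fails in the new version
    if old_out != new_out:
        return "diff-norm"                # both valid, outputs disagree
    return "same"                         # both valid, outputs agree


# Hypothetical stand-ins, only to exercise the classifier.
def toy_old(name: str) -> str:
    if "_" in name:
        raise ValueError("underscore not allowed")
    return name.lower()


def toy_new(name: str) -> str:
    if " " in name:
        raise ValueError("space not allowed")
    return name.lower().replace("\ufe0f", "")
```

Names that land in `same` would not show up in the report at all; the other four labels match the bullets above.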

Hey @raffy how long do clients usually take to integrate the updates after release? Say for example OpenSea (assuming it’s using the provided normalisation and not an in-house implementation) - how long does it usually take for them to implement the update? I’m assuming MetaMask will take the lead and implement it asap? Thank you.


Allowing multiple leading underscores seems okay.

:+1:


Do we have an ETA for the normalization update?
When would we be able to register new ENS names with underscores directly from the ENS app?


Unsure. The JS code is incredibly easy to audit or re-implement yourself. The ref-impl and ens-normalize share the same normalization loop that directly follows the ENSIP processing section. I know there’s a Go implementation. I’m not sure how many other implementations exist. The smart contract route might also be best for user-facing applications.


After thinking about this, it might be better to enforce this through validation.


I computed some additional stats for the 1.4M names:

  • There are 1320 collisions JSON
    • 461 trivial (just casing permutation)
    • 669 pure arabic numerals
    • 160 non-trivial (everything else) which looks like hyphens and illegal emoji
  • Only 4.7% (66819 of 1410818 unique-valid names) are non-basic (/^[a-z0-9.-]+$/ + Valid Emoji) JSON
    • 66808 if basic includes leading hyphen
    • 54927 if basic includes anywhere hyphen
  • Only 4.2% (60584) if I include single-character text-presentation emoji (non-colored) too
    • Even less if you include the pictographs (non-colored non-emoji)
  • The current smart contract incorrectly fails on 197 (0.014%), of which 196 are supposed to be valid. Once NFC is fully implemented, that will be 0 (100% match with reference implementation.)
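To make the collision buckets above concrete, here is a rough sketch of how such a breakdown could be computed. It is not the script behind the numbers: `group_collisions`, `collision_kind`, and `toy_normalize` are hypothetical names, the toy normalizer only lowercases and maps decimal digits to ASCII, and reading "pure arabic numerals" as "every character is a Unicode decimal digit" is my assumption.

```python
import unicodedata
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def toy_normalize(name: str) -> str:
    """Hypothetical stand-in: lowercase, and map any decimal digit to ASCII."""
    out = []
    for ch in name:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch.lower())
    return "".join(out)


def group_collisions(names: Iterable[str],
                     normalize: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each normalized name to the distinct raw names that produce it,
    keeping only groups with more than one raw name (a collision)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for raw in names:
        try:
            groups[normalize(raw)].add(raw)
        except ValueError:
            continue  # invalid names are not part of the collision stats
    return {norm: raws for norm, raws in groups.items() if len(raws) > 1}


def is_all_digits(name: str) -> bool:
    return name != "" and all(unicodedata.category(ch) == "Nd" for ch in name)


def collision_kind(variants: Set[str]) -> str:
    """Bucket one collision group."""
    if len({v.lower() for v in variants}) == 1:
        return "trivial"        # just casing permutations
    if all(is_all_digits(v) for v in variants):
        return "digits"         # different digit scripts, same numeric string
    return "non-trivial"        # everything else
```

Usage would be along the lines of `{norm: collision_kind(raws) for norm, raws in group_collisions(names, toy_normalize).items()}`, with the real normalizer swapped in for the toy one.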

They are both valid for normalization.

  • ฿ [E3F] is a Thai Common symbol.
  • ₿ [20BF] is a Common Symbol.

edit: oops, they’re both in Common. I think they both should normalize but it’s unclear if they’re both safe. The community will have to decide if these or others are confusable and result in an unsafe label. My general feeling is that currency symbols aren’t confusing.
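If you want to check the two code points yourself, a quick look-up with Python's standard `unicodedata` module is enough. Note that the standard library exposes the character name and general category but not the Script property, so this only confirms that both are currency symbols; `describe` is just a helper name I picked.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


baht = describe("\u0e3f")      # Thai baht sign
bitcoin = describe("\u20bf")   # Bitcoin sign

for row in (baht, bitcoin):
    print(*row)
```

Both come back with general category `Sc` (Symbol, currency), which is consistent with treating them the same way during normalization and leaving the confusability question to validation.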

For validation, a validator will determine if this is a confusable based on all the characters in the label, ignoring colored emoji.

The trivial Thai validator is just Thai only.
The smarter Thai validator is probably Thai + Latin + Common - Confusables.
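A very rough sketch of the two Thai validators, just to show the shape of the idea. Everything here is an assumption on my part: the function names are hypothetical, the Thai script is approximated by the Thai block (U+0E00 to U+0E7F), "Latin" is cut down to a-z, "Common" to digits and hyphen, and the confusable set is left empty because that is exactly the part that needs a decision.

```python
from typing import AbstractSet

# Approximation only: block is not the same as script. For example, U+0E3F
# sits in the Thai block but its Script property is Common.
THAI_BLOCK = range(0x0E00, 0x0E80)
LATIN = frozenset("abcdefghijklmnopqrstuvwxyz")
COMMON = frozenset("0123456789-")


def is_thai(ch: str) -> bool:
    return ord(ch) in THAI_BLOCK


def trivial_thai(label: str) -> bool:
    """Thai only."""
    return bool(label) and all(is_thai(ch) for ch in label)


def smarter_thai(label: str,
                 confusables: AbstractSet[str] = frozenset()) -> bool:
    """Thai + Latin + Common, minus whatever is declared confusable."""
    if not label:
        return False
    for ch in label:
        if ch in confusables:
            return False
        if not (is_thai(ch) or ch in LATIN or ch in COMMON):
            return False
    return True
```

The `confusables` argument is where the outcome of the per-character decisions would plug in.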

The official confusables restricted to those charsets for Thai look like this, where the non-green cells require a decision.


:joy: :joy: :joy:

Confusables aren’t addressed by my normalization proposal. I encountered too many edge cases while trying to develop a complete confusable-free solution.

My suggestion is that we have hard errors for names (normalization) that use illegal constructions (disallowed characters, illegal emoji, invisible characters, etc.) and soft errors for names that are unsafe/confusable (validation).

This allows the normalization spec to standardize and “unsafe” names still work. We can expand the universe of safe names until nearly all reasonable names are covered. We can start with Alphanumeric ASCII + colored emoji which I claim are safe. If we follow the distribution of registered names, it should be easy to hit 99%+ coverage.
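The hard-error / soft-error split could look something like this in code. Treat it as a sketch under stated assumptions, not the proposal's implementation: `normalize`, `validate`, and `NormalizationError` are hypothetical names, the invisible-character set is a tiny sample, and the "safe" set is only ASCII alphanumerics (colored emoji are left out to keep the example short).

```python
import re
from typing import List


class NormalizationError(ValueError):
    """Hard error: the label uses an illegal construction."""


# Tiny illustrative sample, not the full disallowed set.
INVISIBLE = {"\u200b", "\u2060", "\ufeff"}

# Starting point for "safe": alphanumeric ASCII only (emoji omitted here).
SAFE_LABEL = re.compile(r"[a-z0-9]+")


def normalize(label: str) -> str:
    """Return the normalized label, or raise on an illegal construction."""
    for ch in label:
        if ch in INVISIBLE:
            raise NormalizationError(f"invisible character U+{ord(ch):04X}")
    return label.lower()


def validate(normalized: str) -> List[str]:
    """Return soft warnings; an empty list means the label is known-safe."""
    warnings = []
    if not SAFE_LABEL.fullmatch(normalized):
        warnings.append("label is outside the known-safe set")
    return warnings
```

The point of the split is that a label like `a_b` still normalizes and resolves, it just comes back from `validate` with a warning that a client can choose how to display.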


There is a large set of single characters (~2K) that consists of default text-presentation emoji (e.g. those that appear uncolored) and non-emoji pictographs (☏️). There’s probably a set of these that are safe to use in any name like colored emoji. However, some of these aren’t unique and require a decision: ❤ [2764] vs ♥ [2665]. On Mac, it appears that some of these already buck the Unicode convention and appear colored, whereas ↖ [2196] vs ↖️ [2196 FE0F] does not. Determining which of these are safe covers another 1% of names.
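To see the difference between the two kinds of pairs mentioned here, it helps to print the code points. A small sketch (the helper names are mine): the arrow pair differs only by the emoji variation selector U+FE0F, while the two hearts are entirely different code points, which is why they need an explicit decision.

```python
from typing import List

VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def codepoints(s: str) -> List[str]:
    """Hex code points of a string, in the bracket style used in this thread."""
    return [f"{ord(ch):04X}" for ch in s]


def strip_vs16(s: str) -> str:
    """Remove emoji variation selectors."""
    return s.replace(VS16, "")


def same_ignoring_vs16(a: str, b: str) -> bool:
    return strip_vs16(a) == strip_vs16(b)


arrow_text = "\u2196"
arrow_emoji = "\u2196\ufe0f"
heavy_heart = "\u2764"
heart_suit = "\u2665"
```

With these, `same_ignoring_vs16(arrow_text, arrow_emoji)` holds but `same_ignoring_vs16(heavy_heart, heart_suit)` does not, so no amount of FE0F handling will merge the hearts.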


Is there any chance I’ll be getting a refund for these two ENS names that I haven’t been able to sell, renew, or do anything with? :trinidad_tobago:🇹.eth and 🇮🇮🇮.eth are both done in all emojis. I think it has to do with the first two T’s turning into the flag.
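For context on the "first two T’s turning into the flag" guess: regional indicator symbols are grouped in pairs from left to right, so three of them in a row read as one pair plus a leftover single. The sketch below only shows that pairing; `pair_regional_indicators` is a hypothetical helper, and whether this is what actually blocks these two names is an assumption, not something confirmed in the thread.

```python
from typing import List

RI_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z


def is_ri(ch: str) -> bool:
    return RI_FIRST <= ord(ch) <= RI_LAST


def ri_letter(ch: str) -> str:
    """Map a regional indicator to its plain ASCII letter."""
    return chr(ord(ch) - RI_FIRST + ord("A"))


def pair_regional_indicators(s: str) -> List[str]:
    """Greedily pair adjacent regional indicators, left to right."""
    out = []
    i = 0
    while i < len(s):
        if is_ri(s[i]) and i + 1 < len(s) and is_ri(s[i + 1]):
            out.append(ri_letter(s[i]) + ri_letter(s[i + 1]))  # a flag pair
            i += 2
        elif is_ri(s[i]):
            out.append(ri_letter(s[i]))                        # leftover single
            i += 1
        else:
            out.append(s[i])
            i += 1
    return out


three_t = "\U0001F1F9" * 3
three_i = "\U0001F1EE" * 3
```

Three T indicators come out as the pair `TT` (the Trinidad and Tobago flag) plus a lone `T`, and three I indicators as `II` plus a lone `I`, where `II` is not an assigned region code.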

I still think you are making a mistake with the hyphen and underscore. I think they are confusables, and that is the reason the underscore was stopped in web2. But we have minted out the ‘-#-‘ & ‘#’ names etc., so we have either just wasted some money or now have some very rare single-digit names for the price of a 3-character name.

I’m not going to keep going on about it, as I know my opinion doesn’t count, but I feel it will come back to bite ENS in the ass in the future.

Edit:

Underscores are still not working, even in quotation marks.


I agree with you that symbols that are normally little used will not be an added value in the ENS ecosystem, symbols like: [] >/*+" etc. But hyphens and underscores are among the most used characters in general, for example in usernames, and they clearly add value to ENS domains and the ecosystem. Why are people saying that they are confusable? I mean, everyone can see that this _ is an underscore and this - is a hyphen. This will 100 percent help with the adoption of ENS. I think adoption will be one of the main challenges now and in the future, and by allowing hyphens and underscores we lower the threshold for newcomers.


People keep falling for approval scams in fake NFT airdrops still. Catering to the lowest common denominator isn’t wise.

This is complicated, and it’s going to remain complicated. Web 2.0 just used ASCII. That’s a nice simple solution for 1984. They did great, and Unicode didn’t even get invented until some years later. What would Web 2.0 be now if Unicode came before domain names?

It took nearly 40 years to stop discriminating against non-English speaking people on the internet with DNS. The ENS is governed in English. I think we should be cognizant of the fact we might actually wield great power over the future of internet usage. Anglo-centrism has made the world a smaller place in some ways, but it is silly to continue on with many conventions that limit us as humans. Confusion is just going to be part of life now. We will just have to deal with getting more knowledgeable about how society works.

People falling victim to elementary scams is not a nice thing to see, but we don’t restrict who you can dial in the telephone system just because elderly people get targeted by scams of confusion, manipulation, and misrepresentation.


Web2 didn’t just use ASCII

There are emojis in web2, even emoji .com domains

Hi peeps, just joined the convo here.

Just looked into your domain names, and it seems that you just want to cut off the competition to save your own bag (domain names) since you own lots of names/numbers with hyphens. I own zero hyphen and zero underscore domains, but my usernames on different platforms include underscores and hyphens and I see no point in confusing them. Maybe in the future I will create some, but I’m good for now with my 999club domains.

I’ve hedged either way, so whatever way the cookie crumbles it’ll be fine for me.

Please guys, none of this. It’s ad hominem because you’re not arguing anything on technical merit, instead you’re seeking to discredit someone purely based on the domains they hold.


Cheers, I wasn’t going to say anything, I am used to it from them, also a cheap way to pump their club at the end

I actually laugh at things like this now; it shows they are worried about their bags.

Since 2003, but it doesn’t mean registrars allowed it or that anything was consistent. For 10 years or so there have allegedly been IDNs, but nobody uses them. I would be hesitant to say it was bungled, because when something is standardized it is better to keep it that way, restricting them to specific country-based TLDs.

Please guys, none of this. It’s ad hominem because you’re not arguing anything on technical merit, instead you’re seeking to discredit someone purely based on the domains they hold.

This is mostly a political forum, and that’s not an ad hominem as I see it. It’s perfectly fair to argue politically. In fact the main arguments for and against are not technical. There’s no technical limitation really, it’s all philosophical which comes down to politics. The only technical things involved are implementation, and compatibility issues.

We all want to stop scams, and while I disagreed with Theth.eth’s - vs _ confusion argument, I definitely see it as valid. The effort you go to in restricting domains to allow or disallow certain behaviors because of underlying empirical observations is pure politics. Motives being transparent or questioned is a key tenet of democracy. There’s also nothing wrong with having a motive of self-interest.

I’m arguing for a broader picture, as many have made arguments which, in my opinion, show a bit too narrow of a worldview. Not everyone is a gamer or gets 90s culture in IRC/AOL/ICQ handles. Not everyone codes, and even people who do don’t know every language’s conventions. There are a few sticky places where inclusivity touches security concerns.

Not touched on in the discussion was the role of +, for example. Only last night, while researching ways to index Web3 sites, did I discover some things about - versus _ and + in some libraries. It would be nice if at least subdomains could have + as a complement to -. Sure, you can access the text records of a domain, but as ENS domains will likely be subdomain heavy, it would be nice to have conventions that allow faster indexing with keyworded subdomains.

That is the same level of low-caliber ad hominem I warned him about. Don’t do that either please, come on.

Saying “you just want to save your own bags” is nothing but a personal attack. You are pretending to know the heart of the other person, and attacking them based on your own assumptions. Feel free to argue politically or technically, but keep personal attacks out of it.

The rest is fine, let’s just stay on topic and civil please.