ENS Name Normalization

I know what you meant, it was just a mistake

And from someone who I’m guessing knows how to program

But it shows how easy it is to do…

Point proven…

In my view, ENS should be made for the masses, not just people who sit in front of a computer working all day, every day.

You had already edited it 2 times before my screenshot but hadn’t picked up on that mistake

1 Like

Do you believe your ENSIP is ready to be advanced for standardisation?

Do you have an updated report of names that are invalidated or change their normalisation as a result of the new standard? I can send you a new list of registered names if need be.

1 Like

Transposition is terrifyingly easy, I absolutely agree 100%!

That’s why I think popularizing ENS = Ethereum address as an absolute is dangerous. I’ve sent things to the wrong address from unchecksummed OCR before (meaning it was all lowercase in the address). It’s not good enough just to have an address, you need the underlying checksummed Ethereum address always printed as a crosscheck. Both to validate that the dapp developer didn’t fat finger something themselves, and to check you didn’t do something wrong.
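To illustrate the point about unchecksummed OCR output: a minimal, hypothetical helper sketching why an all-lowercase address gives you nothing to cross-check. It only classifies the address; actually verifying the checksum requires keccak-256 per EIP-55, which is out of scope here.

```python
import re

def checksum_status(addr: str) -> str:
    """Classify a 0x-address by whether it even carries EIP-55 checksum
    information. (Hypothetical helper; it does NOT verify the checksum
    itself -- that requires keccak-256 per EIP-55.)"""
    if not re.fullmatch(r"0x[0-9a-fA-F]{40}", addr):
        return "not an address"
    hexpart = addr[2:]
    if hexpart == hexpart.lower() or hexpart == hexpart.upper():
        # e.g. lowercased OCR output: nothing left to cross-check
        return "no checksum information"
    return "mixed case: checkable against EIP-55"

assert checksum_status("0x" + "ab" * 20) == "no checksum information"
assert checksum_status("0x" + "Ab" * 20) == "mixed case: checkable against EIP-55"
```

An all-lowercase (or all-uppercase) address is still syntactically valid, which is exactly why it silently accepts transposition errors.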

Whatever the result of normalization now or in the future, we need better filters and alerts for counterfeit names. With Arabic numerals it’s a real problem in marketplaces, because the default alert is meaningless. For most digits, the Urdu, Kurdish, and Farsi forms are written exactly the same as the Arabic ones. This is leading to scams. There should probably be an established set of language-group flags in the output.
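The digit problem is easy to demonstrate: the Arabic-Indic digits and the Extended Arabic-Indic digits (used for Farsi/Urdu) are distinct codepoints that often render identically, and NFC does not unify them, so two visually identical names are different names on-chain.

```python
import unicodedata

# Arabic-Indic digits vs Extended Arabic-Indic (Farsi/Urdu) digits:
# several render identically but are distinct codepoints.
arabic = "٠١٢٣"    # U+0660..U+0663
extended = "۰۱۲۳"  # U+06F0..U+06F3

for a, e in zip(arabic, extended):
    print(f"{a} {unicodedata.name(a)}  vs  {e} {unicodedata.name(e)}")

# They are not equal, and NFC does not unify them, so "٠١٢.eth" and
# "۰۱۲.eth" are different names even though they can look the same.
assert arabic != extended
assert (unicodedata.normalize("NFC", arabic)
        != unicodedata.normalize("NFC", extended))
```

This is why a script/language flag in the output would help: the string itself carries the distinction, but the rendered glyphs do not.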

Edit: And in programming, every single developer will be very suspicious if their code compiles the first time. It’s scary when that happens. Then you ask yourself what you did wrong to make it work first try. It’s much more reassuring when the compiler yells at you for a missing semicolon or extra bracket a couple times. :slight_smile:

2 Likes

The changes I’m proposing (which ens-normalize.js, norm-ref-impl.js, and the resolver follow) are outlined in my ENSIP draft. They’re actually pretty minimal compared to what I was originally proposing (which invalidated ~2% of names).

Report: ens_normalize_1.5.0_vs_eth-ens-namehash_2.0.15 | Directory of Tests

I have an updated list but I might be missing some.

Pretty much. Give me this evening to review the language in the ENSIP.


The latest ens_tokenize has the NFC Quick Check logic that only runs NFC on the minimal parts of the string. That’s the only piece the Solidity contract was missing to be a 100% match.
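For intuition, here is the Quick Check idea sketched with Python’s built-in `unicodedata.is_normalized` (Python 3.8+). This is illustrative only: the real implementations normalize just the minimal affected substrings, whereas this sketch normalizes the whole string whenever the quick check fails.

```python
import unicodedata

def nfc_with_quick_check(s: str) -> str:
    """Return NFC(s), skipping the (expensive) normalization entirely
    when a quick check shows the string is already in NFC."""
    # Python 3.8+: is_normalized implements the Unicode quick-check logic.
    if unicodedata.is_normalized("NFC", s):
        return s  # already normalized: no allocation, no transform
    return unicodedata.normalize("NFC", s)

# "e" + combining acute is not NFC; the quick check fails and we normalize.
assert nfc_with_quick_check("e\u0301") == "\u00e9"
# Plain ASCII passes the quick check untouched.
assert nfc_with_quick_check("name.eth") == "name.eth"
```

Since the vast majority of names are already NFC, the quick check makes the common path nearly free, which matters a lot on-chain.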

4 Likes

Only the $ sign is to be normalized? What about the other currency symbols: £¥€₿?

They were already valid in IDNA 2003 / EIP-137. They aren’t valid in IDNA 2008 (which is why there is confusion).

Validation will happen after normalization. The underlying idea is simple: each label gets checked to see if it’s safe for use. Potentially there is a full name check.

It’s hard to state exactly what names are safe but it’s easy to chip away at it. I think everyone agrees DNS labels are safe. So we can write a validator which returns true for names composed of DNS labels. From my calculations above, that’s 95% of names.
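A DNS-label validator of the kind described is only a few lines. This is a hedged sketch with my own names and regex (classic LDH labels: letter/digit/hyphen, no leading or trailing hyphen, 1–63 characters); it assumes the input is already normalized, and it omits finer rules like CheckHyphens.

```python
import re

# Classic DNS "LDH" label: lowercase letters, digits, hyphens; no leading
# or trailing hyphen; 1..63 chars. (Hypothetical helper names.)
LDH_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")

def is_dns_name(name: str) -> bool:
    """True if every label of an (already-normalized) name is a DNS label."""
    return all(LDH_LABEL.match(label) for label in name.split("."))

assert is_dns_name("name.eth")
assert is_dns_name("foo-bar.eth")
assert not is_dns_name("-foo.eth")  # leading hyphen
assert not is_dns_name("💩.eth")    # non-ASCII
```

A validator like this returning true for ~95% of names is what makes the “chip away” approach practical: the hard cases are a small minority.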

Each validator is simple to write because it doesn’t need to process emoji, it only deals with normalized characters, and most validators are single-script. There also is an efficient way to determine which validators could apply.

There is a question of what to do with this information.

  • For contracts or headless scenarios, you either just normalize (throw on invalid, allow unsafe) or normalize with the ENS recommendation (throw on invalid or unsafe)
  • For user input, like text fields, unsafe names should show a warning, but the name should still work if the user acknowledges. This requires a change to the ENS recommendation and various applications.
  • A power-user feature might be to limit the accepted charsets: maybe you want to be more strict than the ENS recommendation and warn on everything that isn’t composed of DNS labels. This could occur at the user-input level (“I want metamask to warn non-DNS names”) or at the application level (*.cb.id might just be strict DNS.)
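The headless/strict distinction above can be sketched as an API. Everything here is hypothetical (a toy `classify` using casefold plus an LDH check stands in for real ENSIP normalization and validation); it only shows the shape of the three consumption modes.

```python
import re
from enum import Enum

class Safety(Enum):
    SAFE = 1     # e.g. composed purely of DNS labels
    UNSAFE = 2   # normalizes fine, but fails the safety validators
    INVALID = 3  # does not normalize at all

LDH = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")

def classify(name: str) -> tuple:
    """Toy stand-in for a real normalizer + validator (hypothetical)."""
    norm = name.casefold()  # real impl: full ENSIP normalization
    if not norm or ".." in norm:
        return norm, Safety.INVALID
    if all(LDH.match(label) for label in norm.split(".")):
        return norm, Safety.SAFE
    return norm, Safety.UNSAFE

def resolve(name: str, *, strict: bool) -> str:
    """Headless consumption: strict mode also rejects UNSAFE names.
    (A UI would instead warn on UNSAFE and let the user acknowledge.)"""
    norm, safety = classify(name)
    if safety is Safety.INVALID or (strict and safety is Safety.UNSAFE):
        raise ValueError(f"rejected name: {name!r} ({safety.name.lower()})")
    return norm

assert resolve("NAME.eth", strict=True) == "name.eth"
assert resolve("٠١٢.eth", strict=False) == "٠١٢.eth"  # unsafe but allowed
```

The user-input case from the second bullet is the same flow, except UNSAFE triggers a warning dialog rather than a hard failure.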

I see that there are domains with capital letters registered in this list, one being the same as another: $sbux and $SBUX. Just to let you know.

That is not an error; it just means someone has manually registered the capitalized characters against the smart contracts. The name with uppercase characters will simply normalize to the correct lowercase name in all client wallets/sites. And that is true today as well, even before any of this updated normalization code goes into effect.

1 Like

ok. thank you. so does that mean two wallets will own the same domain?

No, it means that if you use “NAME.eth” in Metamask, it’s actually going to send to “name.eth”.

So you can register “NAME.eth” manually if you want, but it’ll be essentially useless to you.
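To see why the uppercase registration is useless, here is a hedged sketch of the EIP-137 namehash structure. Important caveat: the real algorithm uses keccak-256, which Python’s stdlib lacks, so `sha3_256` stands in purely to show the shape — the hashes below are NOT real ENS nodes.

```python
import hashlib

def namehash(name: str) -> bytes:
    """Sketch of EIP-137 namehash; sha3_256 stands in for keccak-256."""
    node = b"\x00" * 32
    if name:
        for label in reversed(name.split(".")):
            labelhash = hashlib.sha3_256(label.encode()).digest()
            node = hashlib.sha3_256(node + labelhash).digest()
    return node

# Hashed verbatim, "NAME.eth" and "name.eth" are different nodes...
assert namehash("NAME.eth") != namehash("name.eth")
# ...but clients normalize before hashing, so user input "NAME.eth"
# always maps to the node of "name.eth"; a manually registered
# uppercase node is never reached through a compliant client.
assert namehash("NAME.eth".lower()) == namehash("name.eth")
```

The uppercase node still exists on-chain, but no normalizing client will ever compute it from user input.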

1 Like

ok i see. thx for the info :slight_smile:

1 Like

Okay, I’m happy with it. I’m open to any suggestions or recommendations.


I made the additional updates to the DNS feature in the resolver. It should correctly tell you whether a name is:

  1. Verbatim
  2. Invalid (non-DNS ASCII, punycode literal with only ASCII, or fails CheckHyphens)
  3. Punycode Required (browser will mangle, must pre-encode)
  4. Transforms to Punycode (doesn’t get mangled)


Surprisingly, xn--💩.eth is actually valid if you pre-encode it: xn--xn---yv63c.eth
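The pre-encoding step can be reproduced with Python’s built-in `punycode` codec (RFC 3492): an IDNA A-label is just `xn--` plus the punycode encoding of the Unicode label. `to_alabel` here is my own illustrative helper, not part of any ENS library.

```python
# Python ships a "punycode" codec implementing RFC 3492; an IDNA
# "A-label" is "xn--" + the punycode encoding of the Unicode label.
def to_alabel(label: str) -> str:
    if label.isascii():
        return label  # already a valid (or literal) ASCII label
    return "xn--" + label.encode("punycode").decode("ascii")

assert to_alabel("💩") == "xn--ls8h"
# A label that already *starts* with "xn--" but still contains Unicode
# must be encoded again, which is how "xn--💩" becomes "xn--xn---yv63c".
assert to_alabel("xn--💩") == "xn--xn---yv63c"
```

That double-encoding is exactly the “pre-encode it” case above: the browser would otherwise mangle the mixed `xn--` + Unicode label.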

2 Likes

Can you explain the namehash error? Does it mean the illegal character has to be transformed into punycode in DNS?

1 Like

When/how was it last updated? If it’s no more than a couple of weeks old, that should give us a good idea.

Just to revive the underscore issue - what do you think of only permitting it as the first character? I know that would be a deviation from the rest of the function, which is position-independent, but it would allow service domains etc, without allowing it in arbitrary positions.

Edit: I see a lot of Arabic numeral names in the diff-norm list. Is this due to there being two versions of certain characters in different alphabets? Do you know how often both normalisations are registered?

I assume you’re talking about this report? The formatting isn’t the best: “eth-ens-namehash-error” means the error only occurs in “eth-ens-namehash”, which is the official implementation.


A few days ago; 1.4M names.

This seems reasonable to me. Just a single leading underscore, or can there be multiple?

I think this was the tail of the prior discussion. Those characters are exact duplicates and can’t be fixed during the validation phase with script-based logic. The ContextO solution (prevent mixing) can’t disambiguate the pure digit cases.

1834 valid-registered that would now be unreachable (0 invalid)
701 collisions (some more than twice)
JSON
:smile:

1 Like

Right. I was looking at the diff-norm part of the report, which has a lot of entries like ۰١۲.

So eth-ens-namehash-error means the name is not a valid user-facing domain, correct?

That report is ens_normalize_1.5.0 (my version) vs eth-ens-namehash_2.0.15 (live)

  • eth-ens-namehash-error means it currently fails but it’s valid in my version
  • ens_normalize-error means it’s currently valid but fails in my version
  • diff-norm means they both are valid, but the two algorithms disagree
  • both-error means they both fail
2 Likes

Hey @raffy how long do clients usually take to integrate the updates after release? Say for example OpenSea (assuming it’s using the provided normalisation and not an in-house implementation) - how long does it usually take for them to implement the update? I’m assuming MetaMask will take the lead and implement it asap? Thank you.

2 Likes

Allowing multiple leading underscores seems okay.
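The “leading underscores only” rule is a one-regex check. This is my own sketch of the rule as discussed (underscores permitted only as a run at the start of a label, allowing service labels like `_dnslink` without arbitrary-position underscores); it is not the final ENSIP wording.

```python
import re

# Underscores allowed only as a run at the start of a label.
LEADING_UNDERSCORES = re.compile(r"^_*[^_]*$")

def underscores_ok(label: str) -> bool:
    return bool(LEADING_UNDERSCORES.fullmatch(label))

assert underscores_ok("_dnslink")    # service-style label
assert underscores_ok("__service")   # multiple leading underscores
assert underscores_ok("plain")       # no underscores at all
assert not underscores_ok("foo_bar") # underscore mid-label: rejected
```

Note this is a deliberate deviation from the otherwise position-independent character rules, as raised earlier in the thread.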

:+1:

1 Like