ENS Name Normalization

Agree

Is this the outcome that was discussed and approved? I read through the most recent comments, but I'm not exactly sure what the outcome was. Will the Extended Arabic-Indic (Persian) digits route to the regular Arabic-Indic digits where the two overlap?

I believe they need to be mapped, if we want a good UX.

According to UTS-46 with ContextO rules, the recommendation for Arabic numerals is to never allow digit mixing. However, that still permits visually identical names built from corresponding digits.
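A minimal sketch of what a "no digit mixing" check could look like. The function names `digit_systems` and `mixes_digits` are hypothetical, and grouping digits by their Unicode decimal block is an assumption made for illustration, not the exact rule text from UTS-46 or any ENS implementation.

```python
import unicodedata

def digit_systems(label: str) -> set:
    """Return the decimal-digit blocks used in label, keyed by each block's zero."""
    systems = set()
    for ch in label:
        if unicodedata.category(ch) == "Nd":
            # decimal() gives the value 0-9; subtracting it finds the block's zero.
            systems.add(ord(ch) - unicodedata.decimal(ch))
    return systems

def mixes_digits(label: str) -> bool:
    """True when a label draws digits from more than one block."""
    return len(digit_systems(label)) > 1

print(mixes_digits("12\u0663"))            # ASCII 1, 2 + Arabic-Indic 3
print(mixes_digits("\u0660\u0661\u0662"))  # Arabic-Indic only
print(mixes_digits("\u06f1\u0662"))        # Extended Arabic-Indic 1 + Arabic-Indic 2
```

Note that a check like this passes a name written entirely in Extended Arabic-Indic digits and its Arabic-Indic twin, which is exactly the gap described above.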

According to UAX-15 and visual inspection, 0-3 and 7-9 are confusable, so either you disallow those characters or pick a preferred one. The recommended solution is to convert to punycode, so the user sees a gibberish name but now has the information necessary to differentiate the confusable characters (if they know the correct punycode form). For ENS, we don’t have an alternative input form.

  • Names should normalize or fail → How do I resolve xyz.eth?
  • Names should be valid/accepted/not confusing or fail/warn. → Is xyz.eth a spoof?
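To make the "pick a preferred one" option concrete, here is a rough sketch that maps only the look-alike Extended Arabic-Indic digits (0-3, 7-9) onto their Arabic-Indic counterparts. Choosing Arabic-Indic as the preferred form is an assumption taken from the question earlier in the thread, not a decided rule, and `map_confusable_digits` is a hypothetical helper.

```python
# Extended Arabic-Indic (Persian) digits: U+06F0-U+06F9.
# Arabic-Indic digits: U+0660-U+0669.
# Per the post above, only 0-3 and 7-9 look alike; 4-6 have distinct shapes.
CONFUSABLE_VALUES = (0, 1, 2, 3, 7, 8, 9)

# Assumption: Arabic-Indic is the preferred form.
DIGIT_MAP = {0x06F0 + v: 0x0660 + v for v in CONFUSABLE_VALUES}

def map_confusable_digits(label: str) -> str:
    """Replace look-alike Extended Arabic-Indic digits with Arabic-Indic ones."""
    return label.translate(DIGIT_MAP)

persian = "\u06f1\u06f2\u06f3"  # Extended Arabic-Indic 1 2 3
arabic = "\u0661\u0662\u0663"   # Arabic-Indic 1 2 3

# Different code points, so the punycode forms differ even though the glyphs match.
print(persian.encode("punycode"), arabic.encode("punycode"))

# After mapping, both spellings collapse to the same name.
print(map_confusable_digits(persian) == arabic)
```

Digits 4-6 are left untouched by this map, so a name containing them stays distinguishable in either script.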

Discussed? yes. Approved? no, that’s the purpose of this discussion!

Hyphen and underscore are distinct characters with distinct visual appearances. Both are already used in DNS, too, where they have different meanings. We definitely shouldn’t map them to the same character.

Thanks, everyone, for the great discussion and all the work.

I noticed that addresses containing an (emoji)+(word) are not being recognized on Etherscan.
When I searched for :cookie:cookie.eth, the response was "Name does not follow UTS-46 normalization."
Don’t get me wrong, maybe this is already being addressed, but I am afraid that if we cannot make it work across different platform interfaces like Etherscan and especially CEXs, this could trigger huge concerns and a lack of trust in ENS.

I don’t want to sound alarmist but I don’t think the majority of people that bought emoji domains know about this.

I am here to help if you need.

Underscores NOT allowed in DNS domain names OR sub-domains and has been that way for years

This one is a good read

Domains are not case sensitive, so why make a difference between a hyphen ‘-’ and an underscore ‘_’?

The existence of _dmarc and _domainkey subdomains would tend to indicate otherwise.

An underscore is not an uppercase hyphen.

I think @raffy might have been talking about the Arabic-Indic digits in this case, not hyphen/underscore.

This argument doesn’t make any sense, since lots of characters which are available in ENS are not viable in traditional domain names. The whole point of this thread is to find a “middle ground” by allowing many new Unicode characters and making them viable for our web 3.0 environment. We are not bound to follow the same standard as traditional domains. The goal is to allow for creativity, but not let malicious people spoof others by registering malignant/confusable domain names. The current implementation by raffy already accomplishes and fixes most of this.

We are seeing that people are currently most interested in visually appealing domain names like emojis/digits/numerals (Dune), which traditionally would be mapped down to punycode, which thankfully is not needed for ENS.

So the only argument to not allow underscore domains would be that they are confusable with hyphens, but in my opinion, at least, they are plenty visually different from each other, as Nick has already pointed out. If you read the thread, there are much more appalling characters that are confusable, but in this case it’s very clear.

I would like to add to the argument to not allow underscore on ENS domains: we want .eth domains to be accessible natively or through a gateway on browsers. Introducing underscore as an allowable character may violate browser standards(?); uninformed post, more a question than a suggestion.

If I’m interpreting this correctly, RFC 3986 lists underscore as an unreserved character in URIs (section 2.3), which should mean that browsers support it in the address field.
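As a quick illustration of the unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~"), the sketch below uses Python's standard `urllib.parse.quote`, which leaves unreserved characters alone and percent-encodes everything else when `safe=""`. The names `my_name.eth` and `my name.eth` are made-up examples. This only shows URI syntax; it says nothing about host name rules.

```python
from urllib.parse import quote

# RFC 3986 section 2.3: punctuation that is unreserved alongside letters and digits.
UNRESERVED_PUNCT = "-._~"

for ch in UNRESERVED_PUNCT:
    # quote() never percent-encodes unreserved characters.
    assert quote(ch, safe="") == ch

print(quote("my_name.eth", safe=""))  # my_name.eth
print(quote("my name.eth", safe=""))  # my%20name.eth
```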

The only thing I could find for actual domain names is that underscores aren’t allowed in hostnames but are allowed for arbitrary records (CNAME, TXT and so on), so they’re likely to be supported.
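The hostname-versus-record distinction can be sketched with two patterns. `HOSTNAME_LABEL` follows the usual letters-digits-hyphen convention for host names; `RECORD_LABEL` is a deliberately simplified, hypothetical pattern for record owner names such as `_dmarc` (DNS itself is more permissive than this about what a label may contain).

```python
import re

# Host name labels: letters, digits, hyphen; no leading or trailing hyphen;
# at most 63 characters; no underscore.
HOSTNAME_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)

# Assumption for illustration: same shape, but underscores are also accepted.
RECORD_LABEL = re.compile(r"[a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?", re.IGNORECASE)

def is_hostname_label(label: str) -> bool:
    return HOSTNAME_LABEL.fullmatch(label) is not None

def is_record_label(label: str) -> bool:
    return RECORD_LABEL.fullmatch(label) is not None

for label in ("coca-cola", "_dmarc", "my_name", "-bad"):
    print(label, is_hostname_label(label), is_record_label(label))
```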

My personal opinion is also that underscores should be allowed. They’re significantly different characters from hyphens.

Agreed, sorry, I meant to reply to the above post.


This is already false for names with complex emoji and other characters. While punycode can encode all non-ASCII, almost all URL-inputs preprocess with some version of UTS-46 + IDNA 2003, which mangles.

Point proven

SUBDOMAINS

This is worth consideration for the new normalisation function, too. How does the new normalisation function normalise names that have already been UTS-46 normalised? In what circumstances is ENSNORM(UTS46(name)) != ENSNORM(name)?

For valid names, this holds for everything that isn’t punycode (xn--...).
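A comparison like this can be run as a small harness. The sketch below takes the two normalizers as parameters, so any real implementation could be plugged in. `fake_uts46` and `fake_ensnorm` are stand-ins built from `unicodedata` purely so the example runs; they are NOT the real UTS-46 or ENS algorithms, and the disagreement they produce says nothing about how the real functions behave.

```python
import unicodedata

def find_disagreements(names, ensnorm, uts46):
    """Return the names for which ensnorm(uts46(name)) != ensnorm(name)."""
    diffs = []
    for name in names:
        if ensnorm(uts46(name)) != ensnorm(name):
            diffs.append(name)
    return diffs

# STAND-INS ONLY (assumptions for illustration, not the real algorithms).
def fake_uts46(name: str) -> str:
    return unicodedata.normalize("NFKC", name).lower()

def fake_ensnorm(name: str) -> str:
    return unicodedata.normalize("NFC", name).lower()

names = [
    "abc.eth",
    "\uff21\uff22\uff23.eth",  # fullwidth A B C
]

# The fullwidth name is folded to ASCII by the NFKC stand-in but not by the
# NFC stand-in, so it is reported as a disagreement.
print(find_disagreements(names, fake_ensnorm, fake_uts46))
```

A real harness would also need to decide how to treat names that one normalizer rejects outright, for example by catching the error and recording it as a separate outcome.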

I will peel off the testing part of ens-normalize.js into a separate repo to compare existing implementations against the reference and my compressed version, and include this as one of the reports.

Hey Raffy,
I came across someone online who developed a Chrome extension that can verify a Twitter user and check whether the .eth name they posted as their display name is actually owned by them. I think it goes to OpenSea, scrapes their Twitter handle, and also checks whether the wallet owns the .eth name. The extension then adds a badge to “verify” that the Twitter account is linked to OpenSea and also to their .eth name.
Do you think this is something ENS would fund as a grant, do on their own, or would be supportive of in any way? It can also help maybe with normalization in some way down the road?
I can paste this elsewhere, but kind of wanted thoughts first to see if it even makes sense to start.

1 Like

I’m not the person to ask, but your idea seems useful. You probably don’t want an OpenSea dependency and should instead resolve everything on-chain via browser wallet (Infura) or via direct fetch. (Personally, I avoid most browser extensions, as they usually demand far too many privileges and make auditing difficult, but I’m a huge fan of Tampermonkey scripts.)


I’ve split up my repos:

Once I’ve finished splitting up ens-normalize.js (possibly this weekend), I’ll run the reference and compressed implementation through the Test Suite and then I think my ENSIP is ready for consideration.

After that, finish the contract NFC implementation and confirm that it matches the reference implementation.

Edit:

Going back to the hyphen debate, I’m still unsure why you are looking to introduce the underscore ??

It was not used in web2 TLDs and has had only limited use in web2 subdomains, and that is it.

In my view it is visually very similar to the hyphen, so is it really needed?

Coca-Cola and all the other companies using a hyphenated name would need to fight to get their name again to protect themselves, all at a time when we are trying to onboard people to ENS. This would also flow on to all the people who have registered hyphen names, including the recent rush to register hyphen emoji names. All those people would have the rug pulled out from under them.

Surely if you are going to add something else ASCII these would have more use case:

@ - @username.eth

# - #123.eth

I do kinda get the $ addition, but I also don’t feel it’s needed when the emoji sign :heavy_dollar_sign: already works. Adding the $ will double up all the names, adding to confusion. If the $ is added, are you also going to add the Euro sign, the GBP sign, and the other currencies??

ENS is flooded with emoji names now, with plenty of different options, so why not just leave the ASCII characters as they are and were in web2?

If you keep changing the rules, nobody is going to trust ENS, how are you meant to onboard people if there is no trust ???

I don’t think it’s visually similar at all, they are two very distinct characters. And lots of characters are “similar”, that’s what we’ve all learned going on this normalization journey with @raffy. I don’t think that’s a good reason to say “really is it needed” though.

ENS already allows all kinds of characters that are not used (or maybe not even valid) in web2 domains, and that’s okay. When ENS started it was centered around “domains”, but I think now it is seen through the broader lens of “profiles”, not just website domains.

So sure, why not allow underscores. I agree with Nick that hyphen and underscore are clearly distinct, and one should not be mapped to another.

The question is why new characters need to be introduced. If you keep doing this, you will continue to lose the trust of the people buying them. Will the attitude be “What is coming next… is someone going to be able to copy my name??”

@raffy did agree that it should be mapped, along with any other dash/minus-looking character:

" I think mapping to hyphen is reasonable as well. "

Yes you can tell the difference, but as they are on the same physical key on a keyboard mistakes will be made by people, but again why is it needed, why are the rules changing so much, what is next?? !@#$%^&*()+=:"’;<>?/~`| one of these or all of them??

If underscores were used in web2, then why, when I go to GoDaddy and make up a random name with an underscore, does it show zero results, while the same name with a hyphen comes up?

Why not leave the ASCII characters as they were in web2: letters, numbers, and hyphens only?

Leave the fun for the other characters and emojis.

This will help onboard people as it’s the system they know and trust

The people here already are not the main market, the main market hasn’t even heard of ENS yet