ENS Name Normalization

The existence of _dmarc and _domainkey subdomains would tend to indicate otherwise.

An underscore is not an uppercase hyphen.

I think @raffy might have been talking about the Arabic-Indic digits in this case, not hyphen/underscore.


This argument doesn’t make any sense, since lots of characters that are available in ENS are not viable in traditional domain names. The whole point of this thread is to find a “middle ground” by allowing many new Unicode characters and making them viable for our web 3.0 environment. We are not bound to follow the same standard as traditional domains. The goal is to allow for creativity without letting malicious people spoof others by registering confusable domain names. The current implementation by raffy already accomplishes most of this.

We are seeing that people are currently most interested in visually appealing domain names such as emojis/digits/numerals (Dune), which traditionally would be mapped down to punycode; thankfully that is not needed for ENS.

So the only argument against allowing underscore domains would be that they are confusable with hyphens, but in my opinion, at least, they are plenty visually different from each other, as Nick has already pointed out. If you read the thread, there are far more appalling confusable characters, but in this case the difference is very clear.


I would like to add to the argument to not allow underscore on ENS domains: we want .eth domains to be accessible natively or through a gateway on browsers. Introducing underscore as an allowable character may violate browser standards(?); uninformed post, more a question than a suggestion.


If I’m interpreting this correctly, RFC 3986 lists the underscore as an unreserved character in URIs (section 2.3), which should mean that browsers support it in the address field.

The only thing I could find for actual domain names is that underscores aren’t allowed in hostnames but are allowed in names for arbitrary records (CNAME, TXT and so on), so they’re likely to be supported.
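To make the distinction concrete, here is a minimal Python sketch. The unreserved set is the one given in RFC 3986 section 2.3; the hostname check is a simplified letters-digits-hyphen rule, and the function names are made up for illustration. It is not a full DNS or URI validator.

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URI_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Simplified traditional hostname label: letters, digits and hyphens only,
# 1 to 63 characters, no leading or trailing hyphen.
HOSTNAME_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_uri_unreserved(ch: str) -> bool:
    """True if a single character is in the RFC 3986 unreserved set."""
    return ch in URI_UNRESERVED


def is_hostname_label(label: str) -> bool:
    """True if the label passes the simplified hostname rule above."""
    return HOSTNAME_LABEL.fullmatch(label) is not None


print(is_uri_unreserved("_"))          # underscore is fine in a URI
print(is_hostname_label("_dmarc"))     # but not in a strict hostname label
print(is_hostname_label("coca-cola"))  # hyphenated labels pass
```

So a name like `_dmarc` can appear in DNS as a record name even though it would fail a strict hostname check.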

My personal opinion is also that underscores should be allowed. They’re significantly different characters from hyphens.


Agreed, sorry, I meant to reply to the above post.


This is already false for names with complex emoji and other characters. While punycode can encode all non-ASCII, almost all URL inputs preprocess with some version of UTS-46 + IDNA 2003, which mangles such names.
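A small illustration of the difference, using only Python's standard library. Note the assumption: Python's built-in `idna` codec implements IDNA 2003 (RFC 3490 with nameprep), not UTS-46, so this only shows the general kind of mapping that happens before encoding, not what any particular browser does.

```python
# Punycode on its own is a lossless encoding of non-ASCII text.
encoded = "bücher".encode("punycode")
decoded = encoded.decode("punycode")
print(encoded, decoded)

# The IDNA 2003 codec maps characters *before* encoding, so the original
# spelling cannot be recovered afterwards: the sharp s becomes "ss".
mapped = "straße".encode("idna")
print(mapped, mapped.decode("idna"))
```

The first round trip returns the original string; the second does not, which is the sort of preprocessing loss being described.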


Point proven

SUBDOMAINS

This is worth consideration for the new normalisation function, too. How does the new normalisation function normalise names that have already been UTS-46 normalised? In what circumstances is ENSNORM(UTS46(name)) != ENSNORM(name)?

For valid names, this holds for everything that isn’t punycode (xn--...).
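One way to run that comparison is a small harness that takes both functions as parameters and reports the names where they disagree. In the sketch below, `toy_uts46` and `toy_ensnorm` are deliberately crude stand-ins (NFKC and NFC plus lowercasing) so the example runs on its own; they are not the real UTS-46 or ENS normalization, and `find_mismatches` is a hypothetical helper name.

```python
import unicodedata
from typing import Callable, Iterable, List

Normalizer = Callable[[str], str]


def find_mismatches(names: Iterable[str], pre: Normalizer, norm: Normalizer) -> List[str]:
    """Return the names for which norm(pre(name)) != norm(name)."""
    return [name for name in names if norm(pre(name)) != norm(name)]


def toy_uts46(name: str) -> str:
    # Stand-in only: compatibility normalization plus lowercasing.
    return unicodedata.normalize("NFKC", name).lower()


def toy_ensnorm(name: str) -> str:
    # Stand-in only: canonical normalization plus lowercasing.
    return unicodedata.normalize("NFC", name).lower()


# "\ufb01" is the single-character "fi" ligature, which NFKC expands to "fi".
sample = ["Nick", "\ufb01le"]
print(find_mismatches(sample, toy_uts46, toy_ensnorm))
```

Swapping in real implementations for the two stand-ins would turn this into the kind of report described above.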

I will peel off the testing part of ens-normalize.js into a separate repo to compare existing implementations against the reference and my compressed version, and include this as one of the reports.


Hey Raffy,
I came across someone online who developed a Chrome extension that can verify whether a Twitter user owns the .eth name they have posted as their display name. I think it goes to OpenSea, scrapes their Twitter handle, and also checks whether the wallet owns the .eth. The extension then adds a badge to “verify” that the Twitter account is linked to OpenSea and also to their .eth.
Do you think this is something ENS would fund as a grant, do on their own, or be supportive of in any way? Maybe it could also help with normalization in some way down the road?
I can post this elsewhere, but I wanted thoughts first to see if it even makes sense to start.


I’m not the person to ask, but your idea seems useful. You probably don’t want an OpenSea dependency and should instead resolve everything on-chain via a browser wallet (Infura) or via direct fetch. Personally, I avoid most browser extensions, as they usually demand far too many privileges and make auditing difficult, but I’m a huge fan of Tampermonkey scripts.


I’ve split up my repos:

Once I’ve finished splitting up ens-normalize.js (possibly this weekend), I’ll run the reference and compressed implementation through the Test Suite and then I think my ENSIP is ready for consideration.

After that, finish the contract NFC implementation and confirm that it matches the reference implementation.



Going back to the hyphen debate, I’m still unsure why you are looking to introduce the underscore.

It was not used in web2 TLDs and has had only limited use in web2 subdomains, and that is it.

In my view it is visually very similar to the hyphen, so is it really needed?

Coca-Cola and all the other companies using a hyphenated name would need to fight to get their name again to protect themselves, right at the time when we are trying to onboard people to ENS. This would also flow on to all the people who have registered hyphen names, including the recent rush to register hyphen emoji names; all those people would have the rug pulled from under them.

Surely if you are going to add something else from ASCII, these would have more use cases:

@ - @username.eth

# - #123.eth

I do kinda get the $ addition, but I don’t feel it’s needed when the emoji sign 💲 already works. Adding the $ will double up all the names, adding to the confusion. And if the $ is added, are you also going to add the Euro sign, the GBP sign and the other currencies?

ENS is flooded with emoji names now, with plenty of different options. Why not just leave the ASCII characters as they are and were in web2?

If you keep changing the rules, nobody is going to trust ENS. How are you meant to onboard people if there is no trust?

I don’t think it’s visually similar at all, they are two very distinct characters. And lots of characters are “similar”, that’s what we’ve all learned going on this normalization journey with @raffy. I don’t think that’s a good reason to say “really is it needed” though.

ENS already allows all kinds of characters that are not used (or maybe not even valid) in web2 domains, and that’s okay. When ENS started it was centered around “domains”, but I think now it is seen through the broader lens of “profiles”, not just website domains.

So sure, why not allow underscores. I agree with Nick that hyphen and underscore are clearly distinct, and one should not be mapped to another.

The question is why new characters need to be introduced. If you keep doing this you will continue to lose the trust of the people buying names. Will the attitude be “What is coming next… is someone going to be able to copy my name?”

@raffy did agree that it should be mapped, along with any other dash/minus-looking character:

"I think mapping to hyphen is reasonable as well."

Yes, you can tell the difference, but as they are on the same physical key on a keyboard, people will make mistakes. But again, why is it needed, why are the rules changing so much, and what is next? !@#$%^&*()+=:"’;<>?/~`| one of these, or all of them?

If underscores were used in web2, then why, when I go to GoDaddy and make up a random name with an underscore, does it show zero results, while the same name with a hyphen comes up?

Why not leave the ASCII characters as they were in web2: letters, numbers and hyphens only?

Leave the fun for the other characters and emojis.

This will help onboard people, as it’s the system they know and trust.

The people here already are not the main market; the main market hasn’t even heard of ENS yet.

I don’t think there’s been a single normalization change since EIP-137 so no trust has been lost from “rules changing”. IMO it’s the opposite: there are serious issues with nonstandard normalization, weird emoji, zero-width characters, and various confusables.

My ENSIP only enables (2) previously disallowed characters, "$" and "_". They’re both collateral damage of the STD3 flag and legacy DNS rules.

  • "$" is the only disabled currency symbol under IDNA 2003.
  • "_" is actively used in DNS records.

ASCII are the most valuable ENS characters. They’re universally supported, recognized, and easy to type. IMO, we should enable as many as possible, and $ and _ seem like good candidates, whereas "@" or "#" seem like a huge mistake since they have established delimiter-like uses.

Above ASCII, there are thousands of characters that never should have been enabled, but we’re stuck with them.

We can improve trust by (1) standardizing this process (re: my ENSIP), (2) deploying an on-chain normalization contract that follows the standard, and finally (3) deploying an on-chain validation contract that asserts whether a name is unambiguous.

My investigations have shown that (3) is hard, but it’s almost trivial to provide confusable-free validation for almost all names (DNS + Emoji is 95%+ coverage). By separating normalization from validation, the remaining exotic names will still be able to normalize and resolve (but ideally there would be some feedback that they fail validation during user-input.)
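A rough sketch of what separating the two steps could look like. Everything here is a placeholder: `toy_normalize` is just NFC plus lowercasing, and `toy_validate` only accepts DNS-style labels (it leaves out the emoji half of the "DNS + Emoji" coverage). The names and the `Result` type are made up for illustration, not taken from any implementation.

```python
import string
import unicodedata
from dataclasses import dataclass

# Characters accepted by the toy validator: DNS-style labels only.
DNS_CHARS = set(string.ascii_lowercase + string.digits + "-")


def toy_normalize(name: str) -> str:
    """Placeholder normalization: canonical composition plus lowercasing."""
    return unicodedata.normalize("NFC", name).lower()


def toy_validate(normalized: str) -> bool:
    """Placeholder validation: every label is non-empty and DNS-only."""
    return all(label and set(label) <= DNS_CHARS for label in normalized.split("."))


@dataclass
class Result:
    normalized: str          # always produced, so the name can still resolve
    passes_validation: bool  # advisory flag for user-input feedback


def process(name: str) -> Result:
    normalized = toy_normalize(name)
    return Result(normalized, toy_validate(normalized))


print(process("Coca-Cola.eth"))
print(process("caf\u00e9.eth"))  # still normalizes, but is flagged
```

The point of the split is visible in the second call: the exotic name still gets a normalized form, and the validation flag is separate information that an input box could surface.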


I totally get why you are sorting some stuff out. It’s been needed, as things were opened up far too much in the past without being fully thought through, hence all the problems with ZWJ etc.

Things need to be tidied up

I just chose # and @ as 2 random characters, it could quite easily be ! or & or * etc

I fully agree that you need to standardise the process and lock in what can and can’t be used, but this is what I am on about: when are the changes going to stop? Is this the only one?

The hyphen is gaining momentum every single day in its use

In the past few weeks we have seen many hyphen emoji names minted (I hold zero). People are realising that the old web2 username rules no longer apply with SIWE and that names can include hyphens, which is only going to amplify their use even more. But now the proposal means that the people and companies using a hyphen name will have to try to get the underscore name too, to guard against a copycat user.

If it mapped to the hyphen name then great, it would save these people a lot of hassle and cost, though for Coca-Cola or G-StarRAW or Y-3 it wouldn’t be much money in the grand scheme of things.

I’m guessing it will all be packaged into one DAO vote: this is what we have decided is best, do you agree, yes / no.

There are only 128 ASCII characters. I think this can be decided and frozen.

  • "?", "&" search string param separator
  • "/" path separator
  • "\", "%" escape characters
  • """, "'" quotes
  • "()[]{}" brackets (markdown syntax, etc.) → "(raffy.eth)"
  • "*" DNS wildcard
  • "|" is a "!"-confusable
  • ",;:" are "."-confusables (the most important character)
  • "!" maybe? likely "."-confusable
  • "+", "~" maybe? likely "-"-confusable
  • rest are control characters

If we dislike "_", it should stay disabled.
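As a quick sanity check on how small the space is, the list above can be tallied against the full ASCII range in Python. The grouping labels below are informal paraphrases of the bullets, and `DISCUSSED_ELSEWHERE` is my own shorthand for the characters argued about earlier in the thread; treat it as a sketch rather than a definitive classification.

```python
import string

# Characters grouped roughly as in the list above (labels are informal).
LISTED = {
    "query separators": "?&",
    "path separator": "/",
    "escape characters": "\\%",
    "quotes": "\"'",
    "brackets": "()[]{}",
    "DNS wildcard": "*",
    "confusable with !": "|",
    "confusable with .": ",;:",
    "maybe": "!+~",
}

# Hyphen, label separator, and the characters debated earlier in the thread.
DISCUSSED_ELSEWHERE = "-.$_@#"

covered = set("".join(LISTED.values())) | set(DISCUSSED_ELSEWHERE)

# ASCII punctuation that none of the groups above mention.
uncategorized = sorted(set(string.punctuation) - covered)

# Control characters: code points 0 to 31, plus 127 (DEL).
control_count = sum(1 for i in range(128) if i < 32 or i == 127)

print(len(string.punctuation), control_count, uncategorized)
```

Besides the 33 control characters, this leaves a handful of printable characters that would still need an explicit decision before the set is frozen.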


One question I have about the underscore is, has anyone actually come forward and asked for it to be included?

If not, then in my view that shows it is not needed.

I’m guessing someone has come forward and asked for $ to be added.

100%


This sounds nonsensical to me.

I don’t see any issue with introducing the “_”, just like I don’t see any issue with $ and other currency symbols. Once decided, it can be “locked”, and that fear of never-ending ASCII changes would stop.

I might be being cynical, but it sounds like you might own some “-” hyphen ENS names and want to protect yourself from the “_” version.

You are right about one thing: only the tip of the iceberg knows about ENS. And I can guarantee that, unlike Web 2, this won’t be as brand-centred; quite the opposite, in fact.
