ENS Name Normalization

open · August 14, 2022, 6:46pm

Just because you don’t think it’s confusing doesn’t mean other people don’t.

It is known within the realm of DNS domains that people have mistaken the two characters for each other in the past. This is not something made up. This is why the underscore is being specifically addressed.

9gag.eth · August 14, 2022, 7:07pm

I’ve seen people confuse an I with an L, but I’ve never seen someone confuse a hyphen with an underscore.

Never have I seen someone type out a phone number like 555_555_5555

Theth.eth · August 14, 2022, 7:13pm

In my line of work I have seen plenty of people use a hyphen or underscore wrong, even asking which one is the hyphen key on the keyboard: is it this one or that one?

100% it is a confusable.

People who are around a computer every day usually won’t make mistakes. But if you want ENS to take off, it will take more than those already here and around a computer day in, day out. The people who will make ENS work are the ones who will make the mistakes.

It’s such an obvious mistake; that is why I have been mentioning it so much.

Wildfire · August 14, 2022, 7:16pm

Pretty much “no” to everything you’ve said.

Nothing you said makes a compelling case for underscores being an issue other than you think they are an issue.

_ is to -
what i is to l
or a is to @
or u is to v

I can see you sound concerned, but I can’t see where exactly it becomes painfully obvious what dangers it poses to users, or what outrage it will cause.

Btw, if you say the same thing (and I do mean the same) over and over again, at what point do you start considering it spammy behaviour?

open · August 14, 2022, 7:29pm

How do you not see the hypocrisy of this comment?

Those characters look different.

A hyphen and underscore literally look the same.

Theth.eth · August 14, 2022, 7:30pm

They are also on the same keyboard key on a physical keyboard, plus on an iPhone it’s in the same place

Wildfire · August 14, 2022, 7:38pm

I’m sure everyone who has read the thread has duly noted your concerns. You can rest assured the highly technical and prepared people behind ENS understand where you stand.

I’m confident everyone would change their minds (or maybe already have) if there is a compelling enough argument about the underscores representing a security risk and needing to map to hyphens. I myself and several others don’t seem to buy into the idea. But in the grand scheme of things my opinion is quite irrelevant.

At the end of the day, the people behind the ENS-IP, who I’d bet an arm are several orders of magnitude more qualified to make the final assessment, are going to present the final draft with their recommendations.

The DAO is going to vote on it and that’s that.

This thread is getting caught up in a distracting situation that doesn’t feel like it’s going to advance in any direction. I myself will stop replying to anything underscore / hyphen related, as I believe doing so removes more value from this thread than it provides.

Notsmol · August 14, 2022, 7:39pm

Fr, you’re telling me - and _ are the same? This is getting out of hand.
The weirdest part is: what’s the motivation for talking about this for months? You could say the same about ‘’ and “ (apostrophes and quotation marks). It’s kinda obvious why you guys focus on the underscore/hyphen debate.

Theth.eth · August 14, 2022, 7:47pm

In your example you are using characters that are already in use, so yes they are confusables, but nothing can be done now

The @ sign is not going to be issued, so again that is a non-argument

What we are talking about here is a confusable that can be fixed now before it is allowed

There is a major difference

9gag.eth · August 14, 2022, 7:50pm

It’s best that we stop discussing this minor issue, as it distracts from the larger issues that those who are actually working on the code and the normalization update are trying to solve.

Wildfire · August 14, 2022, 7:52pm

This thread probably needs a little moderation.

To keep this clean and in the best interest of those who want to track the progress of normalization without unnecessarily getting caught up in the nitty-gritty (and likely less relevant subjects), I suggest, @Theth.eth, that if you need to pursue this further you open an entirely new thread about underscores and hyphens to debate and make your points there.

nick.eth · August 14, 2022, 9:46pm

The underscore-vs-hyphen ‘discussion’ is closed. Any further replies discussing it will be deleted so they don’t drown out discussion about the normalisation function. Please take any further underscore-discussion to DMs or another thread.

Ronald · August 15, 2022, 12:43am

Raffy, can you clarify what will be valid? There is confusion on twitter.

a.eth (two underscores), a.eth (trailing underscore), a.eth (double trailing underscore) => are these valid?

The same for --a-.eth (two hyphen), -a-.eth (trailing hyphen), a-.eth (double trailing hyphen)

raffy · August 15, 2022, 3:22am

[Repeating/summarizing some stuff from earlier posts. This is just my opinion.]

I think everything that normalizes according to my ENSIP is legal. This is maximum compatibility with the previous spec and IDNA 2003 (and punycode transform), fixes ZWJ and emoji, can be implemented in ~30 lines of code, and decouples the implementer from needing to parse the Unicode spec (just use 2 data files referenced in my ENSIP).

The only future change to the spec should be adopting new Unicode characters (and potentially adding mappings for things that can’t be fixed through validation, like Arabic Numerals or Hyphens.)

I am fully aware there are valid names that are insane/unsafe but I think this was the best compromise w/o getting into the weeds (imagine the hyphen debate above but for hundreds of characters.)

The next thing that I wanted to do—I’m not sure if others agree—is validation/safety/whatever (I’m not sure what to call it). I would like some kind of UX for unsafe names (warning, etc.) Validation is simply checking if a normalized name is safe. Since all colored emoji are safe? (a potential question for community), validation only needs to address the textual parts of a name. Most validators should be single script or minimal script combinations (X+Latin+Common). There are some characters (Common script) that can likely be mixed with any script, which include non-colored emoji and pictographs.

IMO, one should always be able to ignore any validation and use any name that normalizes according to the spec. Validation is primarily for user-input. I envision a contract function that only rejects invalid names, and another function that rejects invalid AND unsafe names.
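The two-function idea above can be sketched roughly as follows. This is only an illustration, not the ENSIP's contract interface: `classify`, `accept_valid`, `accept_valid_and_safe`, `toy_normalize` and `toy_ascii_validator` are hypothetical names, and the toy normalizer is a stand-in for the real normalization function.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Status(Enum):
    INVALID = "invalid"  # does not normalize at all
    UNSAFE = "unsafe"    # normalizes, but no validator vouches for it
    SAFE = "safe"        # normalizes and passes at least one validator


# Hypothetical signatures: a normalizer returns None when a name is invalid.
Normalizer = Callable[[str], Optional[str]]
Validator = Callable[[str], bool]


def classify(name: str, normalize: Normalizer, validators: Iterable[Validator]) -> Status:
    normalized = normalize(name)
    if normalized is None:
        return Status.INVALID
    if any(validator(normalized) for validator in validators):
        return Status.SAFE
    return Status.UNSAFE


def accept_valid(name: str, normalize: Normalizer, validators: Iterable[Validator]) -> bool:
    """Rejects only invalid names; unsafe names are still usable."""
    return classify(name, normalize, validators) is not Status.INVALID


def accept_valid_and_safe(name: str, normalize: Normalizer, validators: Iterable[Validator]) -> bool:
    """Rejects invalid AND unsafe names."""
    return classify(name, normalize, validators) is Status.SAFE


# Toy stand-ins (assumptions, NOT the real ENS rules).
def toy_normalize(name: str) -> Optional[str]:
    return None if any(ch.isspace() for ch in name) else name.lower()


def toy_ascii_validator(name: str) -> bool:
    return all(ch.isascii() and (ch.isalnum() or ch == ".") for ch in name)
```

Under these toy rules, a name with a non-ASCII letter would pass `accept_valid` but fail `accept_valid_and_safe`, which is the case where a UI could show a warning instead of blocking.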

The easiest validator is DNS/Basic, which covers most names. The next validator would be Latin (which requires some debate about confusables). Once you have that, you can add all the languages that are X+Latin (more debate about confusables). Additionally, there would be Basic-equivalents in other scripts (possibly using Unicode exemplars).

I don’t think all validators need to be same level of “safe” and this potentially could be a feature. An earlier idea was that each validator should have a ranking. For example, an advanced user, wallet, or application could specify the set of acceptable validators.

There’s nothing wrong with validator overlap either. For example, the Basic validator is (likely) a subset of the Latin validator. And the Latin validator is (almost) a subset of the Japn+Latin validator. However, as you increase validator coverage, you increase confusability.
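To make the ranking and overlap ideas a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Validator` class, the numeric `rank`, and the tiny character sets (real validators would be built from script data and confusable analysis, not from a hand-picked string).

```python
import string
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Validator:
    name: str
    rank: int           # hypothetical: lower rank = stricter / "safer"
    allowed: frozenset  # toy model: a validator is just an allowed character set

    def accepts(self, label: str) -> bool:
        return bool(label) and all(ch in self.allowed for ch in label)


BASIC = Validator("basic", 0, frozenset(string.ascii_lowercase + string.digits + "-_"))
# Toy "Latin" set: Basic plus a few accented letters (e-acute, n-tilde, u-umlaut).
LATIN = Validator("latin", 1, BASIC.allowed | frozenset("\u00e9\u00f1\u00fc"))
REGISTRY = (BASIC, LATIN)


def strictest_match(label: str) -> Optional[Validator]:
    """Return the lowest-ranked validator that accepts the label, if any."""
    for validator in sorted(REGISTRY, key=lambda v: v.rank):
        if validator.accepts(label):
            return validator
    return None


def is_acceptable(label: str, accepted: Set[str]) -> bool:
    """A user, wallet, or app opts in to a set of validator names."""
    return any(v.accepts(label) for v in REGISTRY if v.name in accepted)
```

In this toy model the overlap is literal set inclusion (`BASIC.allowed <= LATIN.allowed`), and a wallet that only opts in to `{"basic"}` would flag a label that needs the wider Latin set.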

Specifically for the Basic validator, we discussed having a restriction on underscores (leading only?) and hyphens (repeated?). Personally, I think the Basic validator should imply maximum safety (under the assumption that ASCII is unique and its confusables like "1lI" are grandfathered-in.) I think something like 0-2 leading underscores and 2-or-fewer adjacent hyphens would fit these criteria. However, I don’t see any reason why we can’t have another validator that allows more underscores or hyphens; I just wouldn’t consider it Basic.

nick.eth · August 15, 2022, 4:03am

This seems reasonable. My concern with leaving things like underscore and hyphen placement and repetition to the validation layer is that people will immediately register anything that’s valid, and then complain when it’s later regarded as unsafe. A more restrictive rule for valid names would help prevent this.

Ronald · August 15, 2022, 10:43am

In your validator it’s not clear whether names that validate and are ‘Pretty’ for ENS also have to pass the DNS validation. I assume that’s what you are saying, and if so I am also confused.

@Nick.eth, what do you mean by unsafe? Do you mean a domain that maps to another?

raffy · August 15, 2022, 2:46pm

In my demo:

Pretty/Beautify means the name contains FE0F in the correct locations. It has no impact on normalization or validation and is just a convenience for having fully-qualified emoji for display.
The following is true for all names:
ens_normalize(ens_beautify(x)) == ens_normalize(x)
DNS information is also included to indicate how a name interacts with DNS. This also should have no impact on normalization or validation. My suggestion is that similar information should be available during registration or via the metadata service.
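The beautify invariant can be illustrated with toy stand-ins. These are NOT the real `ens_normalize` / `ens_beautify`: `toy_normalize` only lowercases and strips FE0F, and `NEEDS_FE0F` is a hypothetical two-character sample rather than the emoji data from the ENSIP.

```python
FE0F = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation

# Hypothetical sample of characters that take FE0F when fully qualified:
# U+2764 HEAVY BLACK HEART and U+263A WHITE SMILING FACE.
NEEDS_FE0F = {"\u2764", "\u263a"}


def toy_normalize(name: str) -> str:
    """Toy stand-in: drop every FE0F and lowercase the rest."""
    return name.replace(FE0F, "").lower()


def toy_beautify(name: str) -> str:
    """Toy stand-in: normalize, then re-insert FE0F where the sample set wants it."""
    out = []
    for ch in toy_normalize(name):
        out.append(ch)
        if ch in NEEDS_FE0F:
            out.append(FE0F)
    return "".join(out)


def holds_invariant(name: str) -> bool:
    """normalize(beautify(x)) == normalize(x), checked for a single name."""
    return toy_normalize(toy_beautify(name)) == toy_normalize(name)
```

Because beautifying only adds characters that normalization removes again, the display form can differ from the normalized form without changing which name is resolved.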

Using the above example, the Basic validator would be a function that tests if each label (ignoring emoji) is a-z0-9, starts with up to 2 underscores, and has no sequence of more than 2 hyphens. If every label passes these criteria, I would consider the name safe. I envision a bunch of validators to account for all the different languages/scripts. If a name doesn’t pass any validator, I think we should show a warning that the name should get extra attention, but allow the user to acknowledge and continue resolving the name if desired.
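A minimal sketch of that Basic rule, under some assumptions: the input is already normalized, emoji have already been stripped from each label (emoji handling is left out here), and the function names are hypothetical. Whether a label made only of underscores should pass is not settled in this thread; the sketch happens to allow it.

```python
import re

# One label: 0 to 2 leading underscores, then only a-z, 0-9 and hyphens.
_LABEL_RE = re.compile(r"_{0,2}[a-z0-9-]*")
# Any run of three or more hyphens breaks the "at most 2" rule.
_HYPHEN_RUN_RE = re.compile(r"-{3,}")


def is_basic_label(label: str) -> bool:
    if not label:
        return False  # empty labels (e.g. "a..eth") never pass
    if _LABEL_RE.fullmatch(label) is None:
        return False  # bad character, or an underscore outside the leading 0-2
    return _HYPHEN_RUN_RE.search(label) is None


def is_basic_name(name: str) -> bool:
    """A name is Basic-safe only if every dot-separated label passes."""
    return all(is_basic_label(label) for label in name.split("."))
```

Note that this rule says nothing about where hyphens sit, so a label with a leading or trailing hyphen still passes; restricting hyphen placement would be an extra condition.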

wolfram · August 17, 2022, 9:03am

This is trivial, but I would just like to mention that _ names will bork the user’s account page in the app, which may prevent users from accessing/managing domains, especially bulk extends.

cthulu.eth · August 17, 2022, 10:04am

I believe that this is already solved in the new manager app that’s rolling out. You can test it out at https://alpha.ens.domains

In the meantime you can solve the error above in the old manager app by temporarily transferring the ENS names that are breaking it to a different wallet. I created a GitHub issue for this a while back, but the focus at the moment is on the new manager app.

rayw · August 18, 2022, 8:40pm

Hello, I have been following this thread for about a month now. I am aware that leading underscore domains are on the agenda of things to normalize and validate.

The questions I am trying to tackle here are:

Are hyphens going to be allowed as leading and last characters? Why should they be? Why should they not be?

I am answering these questions based on my subjective view of aesthetic and a few technical aspects. From what I understand, nick has expressed that these changes in what will be normalized and validated are being made with the goal of increasing adaptability and creating a better experience for the long-term of ENS (which is what I think we all want). For me, more options for people = better experience (as long as these options don’t create too much confusion or have many issues).

If underscores are only allowed as leading characters, I don’t see exactly why allowing hyphens as last characters or even leading characters is completely necessary/logical. (hyphens are currently allowed only as middle characters for ENS, similar to .com domains)

My reasoning:

Firstly, I would like to point out that I think there is more utility with both the hyphen and underscore being allowed at the front, since it is indeed way more aesthetic than when they are last characters (for example: -rayway.eth or _rayway.eth VS rayway-.eth or rayway_.eth). I think we can all agree that both the hyphen or underscore next to the period makes for a strange/confusing looking sequence of characters.

Secondly, assuming the point of making these changes was mostly to create more adaptability, I find it hard to understand how allowing underscores only at the front really makes much of a difference in terms of this goal: people won’t even be able to have the same exact name as their twitter or instagram, which both support underscores anywhere in the name. Keep in mind, I find that the underscores are still a good addition even if they are only allowed as leading characters. This point is mostly important for my next one concerning leading and final character hyphens.

Bouncing off this principle of attempting to allow what instagram and twitter support in their names (to increase ENS adaptability), I feel like it is important to consider that instagram and twitter don’t support hyphens in names at all. This point seems relevant to me because .com also does not support hyphens as leading or final characters. Additionally, google does not support searches of any string of characters with a leading hyphen: an error comes up that says “your search did not match any documents” (see image below).

I find that this creates a major issue for leading hyphen domains (if they are introduced) because you can’t search someone’s wallet through google (this affects the utility/adaptability of the leading hyphen domains in comparison to others). I still find leading hyphens to be appealing aesthetically as I mentioned above but all of these points, along with the fact that last character hyphens look strange, do not really align with the overall goal of increasing adaptability.

Honestly, I don’t really mind what is done at the end of the day. I’m just presenting you with my opinion on some things that I find are inconsistencies. I just want ENS to be the best it can be for the long-term. Thank you for reading what I had to say.

Thank you for your hard work. Cordially, rayway.eth