ENS Name Normalization 2nd

Just to confirm, do you consider the current state of the ENSIP ready for last call and then finalization?

Yes. It has some URLs that currently link to my repository. The only critical link would be to spec.json.

Hi guys, I was wondering if names like:

big𓂸.eth

were made invalid for a reason?

𓂸𓂸𓂸.eth is valid, and emoji mixes like big​:eggplant:.eth are also valid :wink:

Will mixes of Egyptian hieroglyphs and text remain permanently invalid?

Asking for a friend! Thanks!

Scripts in the Unicode Identifier “Limited Use” and “Excluded” lists are restricted and cannot mix with any other characters besides emoji. This is because of the endless confusable possibilities, but also because these scripts are obsolete, not in modern use, only used liturgically, or have unresolved architectural issues that make them unsuitable for identifiers.

So 𓀀𓀁𓀂.eth is valid, 𓀀𓀁𓀂🚀.eth is valid, but 𓀀𓀁𓀂a.eth or 𓀀𓀁𓀂あ.eth or any other mixture of that script with other scripts will be invalid.

Greek / Cyrillic also cannot mix with Latin and have whole-script confusable restrictions as well. This is all laid out earlier in this thread by Raffy, but I’ll let him clarify.
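
To make the rule above concrete, here is a minimal sketch in Python. It is a toy model only: the code point ranges, the `classify` helper, and the `restricted_mix_ok` function are hypothetical names and hand-picked data for illustration, not the ENSIP spec data, and the sketch ignores every other rule (confusables, Greek / Cyrillic vs Latin, and so on).

```python
# Toy model of "a restricted script may only mix with emoji".
# The ranges are a hand-picked illustration, NOT the real spec data.
RESTRICTED_RANGES = {
    "Egyptian_Hieroglyphs": (0x13000, 0x1342F),
    "Runic": (0x16A0, 0x16FF),
}
EMOJI_RANGE = (0x1F300, 0x1FAFF)  # rough approximation of pictographic emoji


def classify(ch: str) -> str:
    """Return "Emoji", a restricted script name, or "Other" for one character."""
    cp = ord(ch)
    if EMOJI_RANGE[0] <= cp <= EMOJI_RANGE[1]:
        return "Emoji"
    for script, (lo, hi) in RESTRICTED_RANGES.items():
        if lo <= cp <= hi:
            return script
    return "Other"


def restricted_mix_ok(label: str) -> bool:
    """False if a restricted script shares the label with any non-emoji script."""
    kinds = {classify(ch) for ch in label} - {"Emoji"}
    restricted = kinds & set(RESTRICTED_RANGES)
    # A restricted script must be the only non-emoji script in the label.
    return not restricted or len(kinds) == 1


print(restricted_mix_ok("\U00013000\U00013001\U00013002"))            # pure hieroglyphs
print(restricted_mix_ok("\U00013000\U00013001\U00013002\U0001F680"))  # hieroglyphs + rocket
print(restricted_mix_ok("\U00013000\U00013001\U00013002a"))           # hieroglyphs + Latin
```

A real implementation would take its script and emoji data from the spec files rather than from fixed ranges like these.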

Perfect answer, @serenae
that clarifies all my questions :heart::+1:

:dash:

bit quiet here…

@adraffy Can you please submit your ENSIP and all dependencies as a PR against the docs repo?

I will submit this weekend.


I added the following to the resolver demo: for each normalization, I show whether eth-ens-namehash produces a valid result, throws an error, or differs from my normalization.


:fire: :fire: :fire: :fire: :fire:

I updated the interface slightly to make the normalization differences clearer:

Surprisingly, ‑888 (2011 38 38 38) was the only registered example that is not already normalized under either algorithm yet is normalizable by both, with divergent results: 2011 -> 2D vs 2011 -> 2010.
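
For anyone who wants to see the divergence at the code point level, here is a small Python sketch. The two mapping tables are hypothetical one-entry stand-ins for the two outcomes mentioned above (2011 mapped to 2D, or to 2010); they are not either library's real mapping data.

```python
def codepoints(label: str) -> str:
    """Space-separated uppercase hex code points, e.g. "2011 38 38 38"."""
    return " ".join(f"{ord(ch):X}" for ch in label)


# Hypothetical single-entry tables standing in for the two behaviours.
TO_HYPHEN_MINUS = {0x2011: 0x2D}  # NON-BREAKING HYPHEN -> HYPHEN-MINUS
TO_HYPHEN = {0x2011: 0x2010}      # NON-BREAKING HYPHEN -> HYPHEN

label = "\u2011888"
print(codepoints(label))                             # 2011 38 38 38
print(codepoints(label.translate(TO_HYPHEN_MINUS)))  # 2D 38 38 38
print(codepoints(label.translate(TO_HYPHEN)))        # 2010 38 38 38
```

Since the two outputs are different strings, they hash to different names, which is why a divergence like this matters even for a single character.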


I created an initial PR for ensdomains/docs but it requires some input and likely some additional files before it’s ready.

I recently saw a tweet about a spoofed ethmoji purchase involving the US (🇺🇸) vs UM (🇺🇲) flags. I think this is something we’ll need to address in the Emoji 16.0 update sometime late next year, along with how to handle the directional emojis (if that gets approved).

Unfortunately, these emoji are RGI and available on emoji keyboards. While US vs UM might seem easy to decide in favor of the American flag, some of the other confusable flags aren’t so clear w/r/t picking a “winner”.

At the moment, the best we can do is client-side warnings or injecting some extra information into the metadata avatar. I think the ultimate solution would be to petition Unicode to fix these visually indistinguishable emoji using additional colors, subscripts, or bordering. For example, subnation flags could have an island-icon subscript and gendered emoji could have a gender-symbol subscript.
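
As one possible shape for a client-side warning, here is a minimal Python sketch that decodes a flag emoji into its two-letter region code, so a UI could print "US" or "UM" next to the glyph. The function name `flag_to_region` is hypothetical; the only fact it relies on is that flag emoji are pairs of REGIONAL INDICATOR SYMBOL LETTERs starting at U+1F1E6 for "A".

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_to_region(flag: str) -> str:
    """Decode a two-character regional-indicator flag into its ASCII region code."""
    cps = [ord(ch) for ch in flag]
    is_indicator = all(
        REGIONAL_INDICATOR_A <= cp <= REGIONAL_INDICATOR_A + 25 for cp in cps
    )
    if len(cps) != 2 or not is_indicator:
        raise ValueError("not a two-letter regional-indicator flag")
    return "".join(chr(ord("A") + cp - REGIONAL_INDICATOR_A) for cp in cps)


print(flag_to_region("\U0001F1FA\U0001F1F8"))  # US
print(flag_to_region("\U0001F1FA\U0001F1F2"))  # UM
```

This does not decide which flag should "win"; it only surfaces the difference that the rendered glyphs hide.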

From the ENSIP:
image


I had some inquiries about some additional Mathematical symbols, like 2295 (⊕) CIRCLED PLUS. I was able to keep some unique Mathematical characters, like , , , , , , but in general, I erred on the side of caution. It’s possible these could be reviewed in a future update, but I think the current spec is sufficient.

What about Norse / Runic languages and characters? The script is beautiful, and while it isn’t very “human readable”, the letters and words do look appealing and would look pretty nice on Etherscan!

For example:

ᚠᚢᚦᚨᚱᚲᚷ.eth ?

Runic is allowed; however, since it’s an excluded script, the label must be pure (it can only mix with emoji).

image


Wow, that link you provided on excluded scripts gets pretty deep pretty fast! Yikes :sweat_smile:

But essentially I can still go and mint Runic or other special characters and set the reverse records if desired?

Along with cool shapes like this:

⏣⏣⏣⏣⏣⏣.eth

I love finding and minting these unusual ones that are so different but look so appealing on or when SIWE on various apps!

Sometimes I think having domains that aren’t so much “Human Readable” but “Human Noticeable” is a hugely overlooked area.

I particularly love signing in to one of my Opensea accounts using an emoji domain that I set the reverse record on a while back from this wallet -

0x21b8defe3b23e6d701f407fa94dc64fe206041b3

:white_check_mark::sweat_smile::green_heart:

Keep up the good work man!

The following 2 ports now match the JS implementation 100% (across 2.7M labels).

I submitted a PR for a few minor updates to ENSIP-15.

Not a spec change, but I’ve made a small update to my Resolver:

  • Names that were Emoji + ASCII used to appear as Latin (because ASCII is specifically 7-bit). However, I now separate these cases and show an overall summary for the name:
    image

  • Names that are Latin but contain any of ąçęşìíîïǐł are now marked as “Potentially Confusing”. ENSIP-15 reduced the “modified ASCII space” from effectively infinite to a handful of characters, but there are still a few marks that may go unnoticed:
    image
    image
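
The “Potentially Confusing” check from the second bullet could look roughly like the Python sketch below. The function name `potentially_confusing` is hypothetical and this is not the Resolver's actual code; the only input taken from the post is the list of flagged characters. The sketch assumes the label has already been determined to be Latin.

```python
import unicodedata

# The characters listed in the post, normalized so the comparison is on NFC forms.
FLAGGED = set(unicodedata.normalize("NFC", "ąçęşìíîïǐł"))


def potentially_confusing(label: str) -> bool:
    """True if a (Latin) label contains any of the easy-to-miss marked letters."""
    return any(ch in FLAGGED for ch in unicodedata.normalize("NFC", label))


print(potentially_confusing("façade"))  # True
print(potentially_confusing("facade"))  # False
```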

This PR is still waiting for approval, although maybe it can wait for the pending Unicode update.


Unicode 15.1 will be released on September 12 (~2 weeks). I’ve built the library against the latest proposal files and run it on all existing registrations:

  • :white_check_mark: There are 0 names that were valid that became invalid due to Unicode changes.
  • :white_check_mark: There are 234 names with Unicode 15.1 emoji that are now valid.

Changewise:

  • There are 118 new emoji, of which 108 are the new directional sequences, which flip the emoji.
  • No new scripts or CM.
  • 600+ new CJK ideographs.
  • There is one IDNA change, affecting a character that was never registered. It was actually disallowed in ENSIP-15 since I thought it should have been mapped, and now it is:
    image

Can we get the ENS Metadata Service to use the beautified result in the SVG and JSON?
Example: 9️⃣9️⃣.eth Resolver | Metadata
The <svg> rendering is the most important but "description" (and "name"?) can be changed as well IMO.
Note: modern mobile devices already force beautify system-wide.
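
For context on what "beautified" means for a name like the example above, here is a toy Python sketch covering keycap emoji only. It assumes the normalized form stores a keycap as the base character followed by U+20E3, and that the beautified form re-inserts the emoji presentation selector U+FE0F between them. The function name `beautify_keycaps` is hypothetical and this is not the library's beautify routine.

```python
import re

VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP


def beautify_keycaps(name: str) -> str:
    """Insert U+FE0F between a keycap base (0-9, #, *) and U+20E3 where missing."""
    return re.sub(f"([0-9#*]){KEYCAP}", rf"\1{VS16}{KEYCAP}", name)


normalized = "9\u20E39\u20E3.eth"
pretty = beautify_keycaps(normalized)
print([f"{ord(ch):X}" for ch in pretty])
```

Applying it to an already beautified name leaves the name unchanged, since the pattern only matches a base character that is immediately followed by U+20E3.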


I’ve been told MetaMask is still having normalization issues. I was about to submit a PR to add ENSIP-15 support but encountered an issue with an older version of ethers. I talked to Ricmoo and the intention is to release a patch for ethers v5 with the latest ENSIP-15 support.


I recently upgraded my registration tracker which shows the most recent 1000 names under various perspectives:

Approved. In future if you add me as a reviewer I’ll be sure to see it sooner.

Thanks for approving the docs PR and deploying beautification in the metadata service.


I updated the ENSIP-15 spec files for Unicode 15.1.0. For implementers, there are no code changes required, just rebuild with the latest spec files.

As mentioned above, none of the Unicode changes impact any valid registered names.

New Emoji in 15.1


I’m currently looking for some clarity regarding URL parsing. A few projects have moved to Ada’s URL parser, which follows WHATWG, which suggested CheckJoiners true, which disallows ZWJ. That makes punycoded ZWJ emoji names invalid, and thus unreachable over DNS. Clearly there’s some confusion here, as the major browser vendors don’t even agree on some punycode rules. This is an implementation issue, not a Unicode issue, similar to how it’s perfectly valid according to UTS-46 to use IDNA with useSTD3 false to allow underscore.
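
To illustrate that the Punycode step itself has no problem with ZWJ, here is a small Python sketch using the standard library's raw `punycode` codec (not the `idna` codec, which applies additional processing). The helper names `to_ace` and `from_ace` are hypothetical, and the sketch deliberately skips all validity checks, which is where a CheckJoiners-style rule would reject the label.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def to_ace(label: str) -> str:
    """Encode one label as xn-- plus raw Punycode, with no validity checks."""
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    """Decode an xn-- label produced by to_ace back to Unicode."""
    return ace[len("xn--"):].encode("ascii").decode("punycode")


label = "\U0001F468" + ZWJ + "\U0001F4BB"  # man + ZWJ + laptop
ace = to_ace(label)
print(ace)
print(from_ace(ace) == label, ZWJ in from_ace(ace))
```

The round trip preserves the joiner, so whether such a name resolves depends on the validation rules the parser layers on top, not on the encoding.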

ens-normalize-python by NameHash has also been updated to Unicode 15.1.0 and passes the new tests.
