ENS Name Normalization 2nd

raffy · January 19, 2023, 11:16pm

I know ethers explicitly disallows null labels and also disallows trailing stop.

UTS-46 VerifyDnsLength is technically false since ENS permits arbitrary length names. When false, it also allows trailing stop (which is wrong since namehash root label is 0x0 not keccak("")).

I think it should either be:

strict: require (1+)-length labels (which would deny leading/trailing/adjacent)
polite: normalize away null labels: "...a....eth" == "a.eth"

Edit: I will change to strict and add example code for how to safely collapse null labels (which may have interleaved ignorables).
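
A minimal sketch of the two options above, not the ens-normalize implementation. The IGNORED set is a small illustrative stand-in (an assumption); the real set of ignorable characters comes from the normalization spec. The function names are hypothetical.

```python
# Illustrative stand-in for the spec's "ignored" characters (assumption):
# soft hyphen and variation selector-16.
IGNORED = {"\u00AD", "\uFE0F"}

def strip_ignored(label: str) -> str:
    """Remove ignorable characters so a label made only of them counts as null."""
    return "".join(ch for ch in label if ch not in IGNORED)

def split_strict(name: str) -> list[str]:
    """Strict: every label must have length >= 1 after ignorables are removed."""
    labels = [strip_ignored(part) for part in name.split(".")]
    for index, label in enumerate(labels):
        if not label:
            raise ValueError(f"null label at position {index}")
    return labels

def collapse_null_labels(name: str) -> str:
    """Polite: drop leading, trailing, and adjacent null labels."""
    labels = [strip_ignored(part) for part in name.split(".")]
    return ".".join(label for label in labels if label)

print(collapse_null_labels("...a....eth"))  # a.eth
```

Note that the empty name is not special-cased here: split_strict("") raises, because splitting yields a single null label.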

Ronald · January 21, 2023, 5:30am

Checksummed addresses solve the fatfinger/copy paste problem. I try to never copy paste unchecksummed addresses to a wallet. This includes ENS names. ENS names are great for simplifying addresses many times, but you need to be aware of what you are getting into.

Wallets currently have poor ENS support, as do most UXs. Relevant info like domain to address age, transaction count, and balance are important for the times when you do give up security for convenience. It’s ok to type in a name.eth first if you can remember the address, but you would be extra sure if you could see the domain age in the UX.

nick.eth · January 23, 2023, 4:02am

Null labels should be considered invalid.

Do you have an updated report on how many domains are affected by the new normalisation function?

djstrong · January 26, 2023, 5:14pm

Hi, I don’t want to mess with the algorithm, but I am surprised that all apostrophes map to ’ and not to '. The second one is ASCII and is easy to type (just one button on my keyboard).

raffy · January 27, 2023, 5:47am

Here are various breakdown reports; it is trivial to produce another format if needed.

Breakdown JSON
Latest vs eth-ens-namehash (2.0.15) (Old-style Compare Report)
Latest vs UTS46 (2003) Reference Implementation

For IDNA, 27 (') APOSTROPHE is disallowed and 2019 (’) RIGHT SINGLE QUOTATION MARK is valid.

raffy · February 13, 2023, 8:08am

I had my first bug report in ens-normalize for a misnamed variable in the combining mark counting code. Fortunately, it impacts no registered names, but it indicated a missing test case: a string with both decomposable characters and excess combining marks near the end of the string. It was found by Carbon225 while developing a Python normalization port.

Related: there are only a few names that fail due to excess combining marks (most true abuses fail earlier for different reasons, like illegal use or invalid mixtures). If anyone with experience with these remaining examples could comment about the validity of these names, it would be greatly appreciated.

The notation used below is that the name matches the group Bengali but a character was found that is followed by 3 CM, where the maximum allowable was 2 (e.g. 3/2). I’m contemplating changing the CM limit to 3 for all non-CM-whitelisted groups. I’m currently using a value of 1 or 2 (see: cm:#). Edit: the Unicode recommendation is max of 4 NSM.
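
To make the found/allowed notation concrete, here is a rough sketch of counting combining marks after decomposition. It assumes "CM" means any character in a Unicode general category starting with M (Mn, Mc, Me) and that counting happens on the NFD form; the function names are hypothetical and this is not the ens-normalize code.

```python
import unicodedata

def max_cm_run(label: str) -> int:
    """Longest run of consecutive combining marks in the NFD form of label."""
    longest = 0
    run = 0
    for ch in unicodedata.normalize("NFD", label):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a base character starts a new run
    return longest

def cm_report(label: str, limit: int) -> str:
    """Format the result as found/allowed, e.g. '3/2'."""
    return f"{max_cm_run(label)}/{limit}"

# A precomposed e-acute followed by two more marks near the end of the string:
# NFD yields e + 3 combining marks.
print(cm_report("ab\u00E9\u0302\u0303", 2))  # 3/2
```

The example input mixes a decomposable character with extra marks at the end of the string, which is the shape of test case the bug report above exposed.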

I had a request to enable a different check character, likely due to the popularity of the Checks NFT. Upon further inspection, it does land in an unfortunate grey area.

2714 (✔︎✔️) heavy check mark — 2713 (✓) check mark

My convention for deciding amongst characters of different weights (very thin, thin, regular, medium, heavy, very heavy, etc.) was to choose the heavy variant if available. For checks, the heavy variant is also the emoji character.

Because normalized emoji have their FE0F stripped and check is default text-presentation (Emoji_Presentation=false), a normalized check emoji looks like a bold textual check. I made a simple demo which shows a few heavy variants with emoji forms and their corresponding most similar textual character (“alt”). If you view this page on different browsers, operating systems, and devices (desktop/mobile), you’ll notice that it’s visually inconsistent.
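
A small illustration of the FE0F point, using only code points mentioned above. The helper names are hypothetical, and a plain string replace stands in for the real emoji handling.

```python
def strip_fe0f(s: str) -> str:
    """Remove U+FE0F (the emoji presentation selector)."""
    return s.replace("\uFE0F", "")

def codepoints(s: str) -> str:
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in s)

emoji_check = "\u2714\uFE0F"  # heavy check mark with emoji presentation
normalized = strip_fe0f(emoji_check)
text_check = "\u2713"         # the lighter textual check mark

print(codepoints(emoji_check))   # 2714 FE0F
print(codepoints(normalized))    # 2714
print(normalized == text_check)  # False
```

After stripping, the name carries a bare 2714, so whether it renders as an emoji or as a bold text check is left to the platform, while 2713 remains a distinct code point.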

I understand the desire for a textual check but the unpredictability of emoji appearance makes ✓✔︎ too similar to enable in good conscience. For example, if ✓ was ✕, we’d be having the same discussion about xX✕✖︎✖️.

If there’s anything else I can help calculate or do to facilitate DAO adoption, please let me know.

The things that would immediately benefit from the new normalization spec:

app.ens.domains: registration input, showing script of labels, beautified name, etc…
metadata: showing beautified names in the image/svg and properly assigning in marketplaces
etherscan: beautified primary names
metamask: their ENS input needs a lot of work

nick.eth · February 14, 2023, 5:59am

Does this mean you’re ready to lock the spec in and submit it as a standard?

raffy · February 21, 2023, 10:37am

For my final changes which address (2) prior concerns:

I’ve changed to NSM counting with a maximum of 4 unique characters for all non-CM-whitelisted script groups, like the Unicode security suggestion. This works much better than I expected. The Breakdown report has changed from cm → nsm.html and there are now only 4 exceptions.
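
One possible reading of "a maximum of 4 unique characters", sketched below: after NFD, each contiguous run of non-spacing marks may contain no duplicates and at most 4 marks. Treating NSM as general categories Mn and Me is an assumption, as are the names check_nsm and NSM_MAX; this is not the ens-normalize source.

```python
import unicodedata

NSM_MAX = 4  # the limit stated above

def is_nsm(ch: str) -> bool:
    # Assumption: non-spacing marks are categories Mn and Me.
    return unicodedata.category(ch) in ("Mn", "Me")

def check_nsm(label: str) -> None:
    """Raise ValueError if a run of NSM repeats a mark or exceeds NSM_MAX."""
    run: list[str] = []
    for ch in unicodedata.normalize("NFD", label):
        if not is_nsm(ch):
            run = []  # a non-NSM character ends the current run
            continue
        if ch in run:
            raise ValueError(f"duplicate non-spacing mark U+{ord(ch):04X}")
        run.append(ch)
        if len(run) > NSM_MAX:
            raise ValueError(f"too many non-spacing marks: {len(run)}/{NSM_MAX}")

check_nsm("a\u0301\u0302\u0303\u0304")  # 4 distinct marks: passes silently
```

Labels belonging to CM-whitelisted groups would skip this check entirely; that branch is omitted here.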

I didn’t get much input on these characters. According to the Breakdown reports, usage is very low. To err on the safe side, I’ve decided to disallow these characters instead so they may be revisited at a future date (whereas mapping would leave them permanently unusable).

I’m happy with version 1.9.0. I will update my ENSIP with these final changes ASAP.

raffy · February 26, 2023, 11:19pm

ENSIP
Breakdown Report for 1.9.0
ens-normalize.js (1.9.0) vs eth-ens-namehash (2.0.15) (Old-style Compare Report)
In-browser validation test
100% of known names have property ens_normalize(UTS46(x)) == ens_normalize(x)

nick.eth · February 27, 2023, 5:09am

Just to confirm, do you consider the current state of the ENSIP ready for last call and then finalization?

raffy · February 27, 2023, 7:46am

Yes. It has some URLs that currently link to my repository. The only critical link would be to spec.json.

mattgarcia.eth · March 7, 2023, 10:56am

Hi guys, I was wondering if names like:

big𓂸.eth

were made invalid for a reason?

𓂸𓂸𓂸.eth is valid, and emoji mixes like big:eggplant:.eth are also valid

Will mixes of Egyptian hieroglyphs and text remain permanently invalid?

Asking for a friend! Thanks!

serenae · March 7, 2023, 2:50pm

Scripts in the Unicode Identifier “Limited Use” and “Excluded” lists are restricted and cannot mix with any other characters besides emojis. This is because of the endless confusable possibilities, and also because these scripts are obsolete, not in modern use, only used liturgically, or have unresolved architectural issues that make them unsuitable for identifiers.

So 𓀀𓀁𓀂.eth is valid, 𓀀𓀁𓀂🚀.eth is valid, but 𓀀𓀁𓀂a.eth or 𓀀𓀁𓀂あ.eth or any other mixture of that script with other scripts will be invalid.
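
A heavily simplified sketch of that rule, for illustration only. The tables are hypothetical stand-ins: a single restricted script identified by the Egyptian Hieroglyphs block range, and an emoji set containing just the rocket. A real implementation would use the full script and emoji data from the spec.

```python
# Hypothetical, minimal tables (assumptions for illustration).
RESTRICTED_RANGES = {
    "Egyptian_Hieroglyphs": (0x13000, 0x1342F),
}
EMOJI = {"\U0001F680"}  # rocket

def restricted_script(ch: str):
    """Return the restricted script name for ch, or None if unrestricted."""
    cp = ord(ch)
    for name, (lo, hi) in RESTRICTED_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None

def mixes_restricted(label: str) -> bool:
    """True if a restricted script is combined with other non-emoji characters."""
    scripts = {restricted_script(ch) for ch in label if ch not in EMOJI}
    return len(scripts) > 1 and any(s is not None for s in scripts)

glyphs = "\U00013000\U00013001\U00013002"
print(mixes_restricted(glyphs))                 # False: single restricted script
print(mixes_restricted(glyphs + "\U0001F680"))  # False: emoji may mix
print(mixes_restricted(glyphs + "a"))           # True: mixed with Latin
```

This covers only the restricted-script case; the Greek/Cyrillic/Latin and whole-script confusable rules are separate checks.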

Greek / Cyrillic also cannot mix with Latin and have whole-script confusable restrictions as well. This is all laid out earlier in this thread by Raffy, but I’ll let him clarify.

mattgarcia.eth · March 7, 2023, 3:15pm

Perfect answer, @serenae
that clarifies all my questions

Theth.eth · March 27, 2023, 8:38am

bit quiet here…

nick.eth · March 27, 2023, 9:29am

@adraffy Can you please submit your ENSIP and all dependencies as a PR against the docs repo?

raffy · March 30, 2023, 10:31pm

I will submit this weekend.

I added the following to the resolver demo: for each normalization, I show if eth-ens-namehash is valid, errors, or is different from my norm.

Valid: (screenshot)
Different: (screenshot)
Norm but Error: (screenshot)
Error but Norm: (screenshot)

Theth.eth · April 3, 2023, 2:11am

raffy · April 3, 2023, 8:38am

I updated the interface slightly to make the normalization differences more clear:

Surprisingly, ‑888 (2011 38 38 38) was the only registered example that is unnormalized under both algorithms yet normalizable by both, although the results diverge: 2011 → 2D vs 2011 → 2010.
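
The two outcomes can be reproduced with the standard library. U+2011 (NON-BREAKING HYPHEN) has a compatibility decomposition to U+2010, so NFKC-style processing yields 2010. The 2D result is modeled here by a plain replacement, as an assumption about how a hyphen-folding rule would behave, not as the actual mapping code of either library.

```python
import unicodedata

def cps(s: str) -> str:
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):X}" for ch in s)

name = "\u2011888"  # NON-BREAKING HYPHEN followed by "888"

print(cps(name))                                 # 2011 38 38 38
print(cps(unicodedata.normalize("NFKC", name)))  # 2010 38 38 38
print(cps(name.replace("\u2011", "-")))          # 2D 38 38 38
```

The two normalized forms differ in their first code point, so they are distinct names even though they look alike.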

I created an initial PR for ensdomains/docs but it requires some input and likely some additional files before it’s ready.

raffy · May 15, 2023, 9:08am

I recently saw a tweet about a spoofed ethmoji purchase, involving the US (🇺🇸) vs UM (🇺🇲) flags. I think this is something we’ll need to address in the Emoji 16.0 update sometime late next year along with how to handle the directional emojis (if that gets approved.)
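
For context on why the two flags are distinct names despite looking identical: a flag emoji is a pair of regional indicator symbols, one per letter of the region code. The helper below is a hypothetical illustration of that encoding.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(region_code: str) -> str:
    """Build a flag emoji sequence from a two-letter region code."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in region_code.upper()
    )

def cps(s: str) -> str:
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):X}" for ch in s)

print(cps(flag("US")))  # 1F1FA 1F1F8
print(cps(flag("UM")))  # 1F1FA 1F1F2
```

The sequences differ only in the second code point, so normalization treats them as different emoji; any visual similarity comes from the font's artwork.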

Unfortunately, these emoji are RGI and available on Emoji keyboards. While US vs UM might seem easy to decide as American, some of the other confusable flags aren’t so clear w/r/t picking a “winner”.

At the moment, the best we can do is client-side warnings or injecting some extra information into the metadata avatar. I think the ultimate solution would be to petition Unicode to fix these visually indistinguishable emoji using additional colors, subscripts, or bordering. For example, subnation flags could have an island-icon subscript and gendered emoji could have a gender-symbol subscript.

From the ENSIP:

I had some inquiries about some additional Mathematical symbols, like 2295 (⊕) CIRCLED PLUS. I was able to keep some unique Mathematical characters, like ∂, ∏, √, ∑, ∮, ∫, but in general, I erred on the side of caution. It’s possible these could be reviewed in a future update but I think the current spec is sufficient.