ENS Name Normalization

I don’t think that’s the case. We are going to have to wait until the final report.

Just to bring up the most recent topic: currency symbols (which raffy mentions above and seems undecided about) are resolving fine in the live demo tool. Personally, I think they would be tremendous to have.

What I do wonder is what happens to people who registered names through ENS that later become “invalid”. Take the currency example, for instance. Are users getting refunded?

I am sure there are more cases with other symbols, but this is something I don’t see being addressed, and a refund sounds like the fairest thing to do.


I just had an idea for something you could build with the currency symbols, which would actually be quite a functional improvement over the current system (one wallet for everything): use subdomains.

domain.eth -> ₿.domain.eth, $.domain.eth, ¥.domain.eth, Ξ.domain.eth, €.domain.eth etc.

With the subdomains resolving to a contract or wallet that handles only the specific currency/ERC20.
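A rough sketch of the naming side of this idea, in Python. The symbol table and the `deposit_name` helper are made up for illustration; nothing here talks to ENS or a resolver, and it assumes the symbols end up valid after normalization.

```python
# Hypothetical table: which asset each symbol subdomain is meant to receive.
ASSET_BY_SYMBOL = {
    "₿": "wrapped BTC",
    "$": "USD stablecoins",
    "¥": "JPY stablecoins",
    "Ξ": "ETH",
    "€": "EUR stablecoins",
}


def deposit_name(domain: str, symbol: str) -> str:
    """Return the subdomain that should receive deposits for `symbol`."""
    if symbol not in ASSET_BY_SYMBOL:
        raise ValueError(f"no subdomain configured for {symbol!r}")
    return f"{symbol}.{domain}"


print(deposit_name("domain.eth", "€"))
```

A wallet or exchange UI could then show the name and the asset it expects side by side, instead of a bare hex address.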

You could have different wallets for different ERC20 stablecoins and other synthetic currencies, all tied to your main ETH wallet. This would make it much easier to handle these different ERC20 tokens, much like a checking account does in real life. It also takes the fear out of sending big funds to a random hex address (double/triple checking intensifies :stuck_out_tongue_winking_eye:) when a subdomain clearly defines where and what is supposed to be sent.

Also imagine if exchanges used subdomains like these instead of giving users random hex addresses for deposits. It would eliminate many issues with people sending the wrong ERC20 tokens to exchange addresses, because exchanges could give out clearly defined subdomains that communicate which assets are to be sent to which address.

The thought of this becoming a thing actually makes me want to build something like that :smile:. Subdomains seem to be something the community has not fully explored yet, but they have so much potential.


Wow. This sounds beautiful and very functional. The big €$¥£, along with ₿ and Ξ, would be tremendous to have and to be able to use in something like this.

It could actually be a very hygienic practice if that trend caught on. These are very accessible and well-known symbols. Hopefully raffy and everyone can find a way to fit those in; that was the original intention, it seems.


Ξ is already possible, it just isn’t normalized.

Additionally, Ricmoo made EIP-634 regarding display names and similar ideas were mentioned earlier in this thread.

aрe.eth with Cyrillic p is a perfect example of a malicious confusable. It is valid at the moment. It would be invalid with single-script output confusables logic.

💲₿💲💲₿💲.eth is valid if ₿ is allowed. abcd°.eth is not. I agree ° would be a nice character to have (along with many others). This probably needs to wait for a more thorough review of all characters by the community. (I could see it either being its own character, or mapped to o.)

ユクシー.eth and マーシャドー.eth are both valid.

You can check the potential fate of these names here. We should probably have a separate thread for these issues.


Okay, I’ll post the corresponding error report tonight for single-script output confusables and then we can make a decision.


Please don’t post individual support requests to this thread.


Any thoughts on 2-in-1 characters (ꜳæꜵꜷꜹꜻꜽʤʣʥᴔꭁꭂʩǁʪɮʫʨꝷʦʧꜩɱᵯ) being confusable? E.g. aa vs. ꜳ. The plain aa shouldn’t be confusable because it’s double ASCII.


Combining Marks (CM) modify how a character is presented:

  • å = 61 30A (where 30A is a CM)
  • å = E5

NFC is responsible for collapsing these together, eg. they both normalize to E5. For some characters, there is no combined glyph, eg. e̊ = 65 30A has no corresponding single character form.

Multiple CM can be attached to the same character, eg. ã̰ = 61 303 330. NFC is responsible for putting the CM in a canonical order.
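The NFC behaviour described above can be checked with Python's standard `unicodedata` module. This is only a small illustration; the `hex_points` helper is made up for this example, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def hex_points(s: str) -> str:
    """Render a string as space-separated hex code points, e.g. '61 30A'."""
    return " ".join(f"{ord(ch):X}" for ch in s)


decomposed = "a\u030A"   # a + COMBINING RING ABOVE
precomposed = "\u00E5"   # å as a single code point

# Both spellings collapse to the same single code point under NFC.
print(hex_points(unicodedata.normalize("NFC", decomposed)))   # E5
print(hex_points(unicodedata.normalize("NFC", precomposed)))  # E5

# e + ring has no precomposed form, so NFC leaves it as two code points.
print(hex_points(unicodedata.normalize("NFC", "e\u030A")))    # 65 30A

# The same two marks typed in either order end up as the same sequence.
tilde_then_below = unicodedata.normalize("NFC", "a\u0303\u0330")
below_then_tilde = unicodedata.normalize("NFC", "a\u0330\u0303")
print(tilde_then_below == below_then_tilde)                   # True
```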

You can stack CM on characters, eg. ã̃̃̃̃̃̃̃̃̃ and a̰̰̰̰̰̰̰̰̰̰.

Some CM stack without any visual indication, eg. a̸ (1x) vs a̸̸̸̸̸̸̸̸̸̸ (10x).
a̸̸.eth ≠ a̸̸̸.eth ≠ a̸̸̸̸.eth = ...

There are currently 500 registered names with CM. We could disallow some of the malicious ones (underscore-like or very small, etc.)? We could disallow stacking?
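To make "disallow stacking" concrete, here is one possible check, sketched in Python. The function names and the `limit` parameter are assumptions for illustration, not part of any proposed ENS rule. Note that a limit of 1 would also reject legitimate text in scripts that routinely use several marks per base character.

```python
import unicodedata


def max_mark_run(label: str) -> int:
    """Length of the longest run of consecutive combining marks in `label`.

    The label is NFC-normalized first, so marks that merge into a
    precomposed character (like the ring in å) are not counted.
    """
    longest = run = 0
    for ch in unicodedata.normalize("NFC", label):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def has_stacked_marks(label: str, limit: int = 1) -> bool:
    """True if any base character carries more than `limit` marks."""
    return max_mark_run(label) > limit


print(max_mark_run("a\u0338\u0338"))        # 2
print(max_mark_run("a\u0338\u0338\u0338"))  # 3
print(has_stacked_marks("a\u030A"))         # False: NFC turns this into å
```

A check like this would at least let a client tell a̸̸.eth and a̸̸̸.eth apart programmatically, or show a warning when a name contains stacked marks.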


They don’t seem confusable to me; they look visually distinct. What does the confusable mapping you’re using say?

Do we need to disallow either? I’d rather only disallow things that have a high probability of deceiving people.

They’re all confusable with their separate character equivalents. However, I agree they’re probably fine. They’re easy to distinguish when monospaced.

The av confusable is the only one with two versions: ꜹ and ꜻ. So those probably need to confuse, unless we want to pick one as the preferred, or map one to the other.

Nope. They’re all currently valid. I’m just unsure how someone can tell a̸̸.eth and a̸̸̸.eth apart. Almost need a warning for “this name contains combining marks”.

The recommendations here seem reasonable.


That took much longer than expected – I went down a rabbit hole with UTS-39 and UAX-24.

I implemented the Highly Restrictive version. It’s less strict than the explicit single script version I had implemented before, so things like 1a〆.eth (ASCII+Han) work but aрe.eth (Cyrillic p) does not.
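For readers unfamiliar with the UTS-39 restriction levels, here is a very rough Python approximation of the idea. It is not the implementation used in the demo: Python's standard library does not expose the Unicode Script property, so this sketch guesses the script from the first word of the character name, only knows a handful of scripts, and treats everything else (digits, punctuation, unlisted scripts) as Common.

```python
import unicodedata

# Stand-in for the Script property: first word of the character name.
# "CJK" stands in for Han. Anything not listed is treated as Common.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "CJK", "HIRAGANA",
                 "KATAKANA", "HANGUL", "BOPOMOFO"}

# Script mixes permitted by the UTS-39 "Highly Restrictive" level.
ALLOWED_MIXES = [
    {"LATIN", "CJK", "HIRAGANA", "KATAKANA"},
    {"LATIN", "CJK", "BOPOMOFO"},
    {"LATIN", "CJK", "HANGUL"},
]


def scripts_in(label: str) -> set:
    """Collect the (approximate) scripts used by the characters in `label`."""
    found = set()
    for ch in label:
        first_word = unicodedata.name(ch, "").split(" ")[0]
        if first_word in KNOWN_SCRIPTS:
            found.add(first_word)
    return found


def passes_highly_restrictive(label: str) -> bool:
    """Single script, or a subset of one of the permitted mixes."""
    scripts = scripts_in(label)
    return len(scripts) <= 1 or any(scripts <= mix for mix in ALLOWED_MIXES)


print(passes_highly_restrictive("ape"))        # True: Latin only
print(passes_highly_restrictive("a\u0440e"))   # False: Latin + Cyrillic
print(passes_highly_restrictive("1a\u4e00"))   # True: Latin + Han
```

A real implementation would use the Script and Script_Extensions data files rather than character names, which is where most of the edge cases live.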

Latest Error Report (737 single script errors, .json). I sorted the errors by subtype. The vast majority look malicious to me. It is very easy to add script combinations to permit additional exclusions. The demo is also running this version.

I’ll provide another error report once the confusable part is working correctly.

Interestingly, the single-script logic makes some of the ContextO rules unnecessary: Greek Keraia, Hebrew Geresh, and Hebrew Gershayim are now all impossible to violate, because a violation requires additional scripts. The Arabic-Indic rule will be removed once confusables are active. This leaves Middle Dot (l·l) and Katakana.


:fire:

Hello, will the Russian alphabet be supported in ENS names?

Also, @nick.eth @raffy

Will I get a refund for my ENS name ape.eth, since it’s invalid on OpenSea (OS) and got unlisted?
Ape with the Cyrillic “p” is now described as a malicious/scam name, but when I bought it from ENS nothing was mentioned about it being malicious or a scam.

So please, I would like to be refunded the $707 I paid for the name, since I have no use for it.
Or I would like this to be resolved so the name is listed on OS again.

Thank you

Can you provide a justification for registering the name other than trying to trick people into thinking it’s ‘ape.eth’?

I didn’t buy ape.eth with the Cyrillic “p” to trick people. I wanted to use it for myself: it’s my first short ENS name and it looked nice. I knew the name would carry a caution sign :warning:, since many people have similar names, sometimes with a caution sign, sometimes with emoji replacements or l/1 swaps.

I knew that purchasing the name would give it a caution sign :warning:.
But my ENS name being delisted on OS and not showing in my wallet, and then being surprised by developers saying that my name ape.eth is malicious/a scam, without anyone informing me or mentioning anything about scams before I paid $707 for a 3-letter name that I like… it’s not good. My intention was not to scam; I just wanted to have a nice 3-letter name, that’s all.

Next time, I wish you would inform people who are registering a name with Cyrillic characters that this might happen.

That’s why I’m shocked now, and I feel sad that I lost $707 for nothing.
Please, I would like my money to be refunded @nick.eth

Hello, any update, please?

Please wait until these normalization changes are finalized. Once we have the final code and reports, we’ll know exactly which previously-valid names will have invalid metadata, and the ENS community here will, I’m sure, have a discussion about how to handle these names and under what circumstances refunds may be given.

See these objectives laid out by Nick earlier in the thread: ENS Name Normalization - #20 by nick.eth


Please keep conversation on this topic to discussion of the changes to the normalisation function. Offtopic replies such as requests for refunds or queries about the status of individual names will be deleted.

There seem to be lots of normalization problems with Arabic numerals:

https://opensea.io/assets/ethereum/0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85/57878234801101706464907200301382902911825934948133920256067941549956179541975
https://opensea.io/assets/ethereum/0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85/85813848692050262483492322019064772227096889665505083244435923863898171704685

These are two visually identical domains, yet the normalization checker tool reports one as normalized and the other as not normalized.

https://opensea.io/assets/ethereum/0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85/99942910256347406037411455050680494180435446940151882100227360960046000673397
https://opensea.io/assets/ethereum/0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85/41056161334239367631533052882077582946209860478205435780213207492284282125669

Again, two visually identical domains, but this time the tool reports both as normalized.

Hey! :wave:

Recently people have taken to registering Arabic-Indic digits, like ٠٠١.eth.

However, as you are probably already aware, there are separate “extended” versions of those digits for whatever reason.

Both show as valid right now in the live demo, but I’m assuming that’s just because you haven’t finalized/enabled the confusables code, right?

(Incidentally, both sets of digits can be registered right now through the ENS manager app)

Is the plan for those characters to normalize the “extended” ones to the regular ones? Like ۰۰۱.eth would normalize to ٠٠١.eth?

Or will the extended digits just become invalid altogether?

Thanks!
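For context, here is a small Python sketch of the first option (mapping the extended digits onto the regular ones). The `fold_digits` helper is hypothetical and only illustrates what such a mapping step could look like; it is not how the ENS normalization code handles these characters.

```python
import unicodedata

ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))  # U+0660..U+0669
EXTENDED = "".join(chr(0x06F0 + i) for i in range(10))      # U+06F0..U+06F9

# Hypothetical mapping step: fold each extended digit onto its regular twin.
FOLD_EXTENDED = str.maketrans(EXTENDED, ARABIC_INDIC)


def fold_digits(label: str) -> str:
    """Replace Extended Arabic-Indic digits with Arabic-Indic digits."""
    return label.translate(FOLD_EXTENDED)


regular = "\u0660\u0660\u0661"    # ٠٠١
extended = "\u06F0\u06F0\u06F1"   # ۰۰۱

# Unicode normalization alone keeps the two digit sets distinct...
print(unicodedata.normalize("NFKC", extended) == regular)   # False
# ...even though both carry the same decimal digit values...
print([unicodedata.digit(c) for c in extended])             # [0, 0, 1]
print([unicodedata.digit(c) for c in regular])              # [0, 0, 1]
# ...so an explicit mapping is needed to make them one name.
print(fold_digits(extended) == regular)                     # True
```

The alternative, rejecting one of the two digit sets outright, avoids silently rewriting what the user typed, at the cost of invalidating names that are already registered.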


It seems that Indian numerals resolve to English numerals on OpenSea. Is this a problem with OpenSea, or something ENS can change?

The domain in question: https://opensea.io/assets/ethereum/0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85/9500737025697232205813338291686174990382446139474886045490700210042399989925

There are, of course, many more examples.