As I said above, it’s a per-application UX issue.
I will include a beautifier function in the next release. Additionally, since there has been progress on the on-chain implementation, I will also add an on-chain beautifier function.
Can’t wait for it, beautiful! Highly appreciated by the Ethmoji99 and Ethmoji999 community.
FYI in the current deployed resolver tool, the “braille blank pattern” character shows up as valid: ENS Resolver
Good catch. I’ll scan through the valid set for additional invisibles.
I greatly simplified my ENSIP proposal. I think this is much closer to what we should standardize. It includes:
I need to merge in the Arabic digit mapping we decided on above, update the tests to include the handwritten ones (from the prior IDNA 2008 approach), and then double-check that everything agrees.
I’ll then port this logic to my ens-normalize.js repo and release a compressed implementation (as those 2 data files are 1.2MB combined) and spin off the remaining validation logic into a separate project.
We can follow this up with a matching on-chain implementation and then figure out what to do about validation: applying the more complex rules like single-script confusables, whole-script confusables, stupidly placed combining marks, check bidi, etc.
Edit: For Unicode 15, it looks like 20 emoji and 1 ZWJ sequence:
1F6DC,1FA75,1FA76,1FA77,1FA87,1FA88,1FAAD,1FAAE,1FAAF,1FABB,1FABC,1FABD,1FABF,1FACE,1FACF,1FADA,1FADB,1FAE8,1FAF7,1FAF8
1F426 200D 2B1B
→ Black Bird
I just had my first real-life run-in with this issue. A user was confused about why their name had multiple owners. The issue was that the name contained (from what I understand) a Persian 9 along with Arabic digits.
Going to keep happening, I have talked about it in the past on Twitter and warned people
Just for a single NNN name in Arabic/Persian there are 8 combinations
For a single NNNN name in Arabic/Persian there are 16 combinations
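To make the counts above concrete, here is a minimal sketch that enumerates every spelling of a digit string when each position can independently use the Arabic-Indic form (U+0660..U+0669) or the Extended Arabic-Indic form (U+06F0..U+06F9). The function name `digit_variants` is made up for illustration, and the count assumes exactly two forms per position.

```python
from itertools import product

# Arabic-Indic digits (U+0660..U+0669) and Extended Arabic-Indic digits (U+06F0..U+06F9).
ARABIC = [chr(0x0660 + d) for d in range(10)]
EXTENDED = [chr(0x06F0 + d) for d in range(10)]


def digit_variants(digits: str) -> list:
    """Return every spelling of an ASCII digit string where each position
    independently uses the Arabic-Indic or the Extended Arabic-Indic form."""
    choices = [(ARABIC[int(d)], EXTENDED[int(d)]) for d in digits]
    return ["".join(combo) for combo in product(*choices)]


print(len(digit_variants("123")))   # 8  (2 ** 3)
print(len(digit_variants("1238")))  # 16 (2 ** 4)
```

Each of those strings is a distinct sequence of code points, so each hashes to a different name unless normalization maps the two digit sets together.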
Non-ASCII characters on ENS are a mess and have been rushed out and approved without fully thinking it out
I think the mapping we discussed above is a fair solution.
It would be valuable to know if there are any other confusables that fit this pattern: where the end-user is frequently unaware of the difference due to script overlap (making a mapping the better solution.)
Note: "p" vs "р" [Latin 70 vs Cyrillic 440] doesn’t fit this pattern because those p's can be differentiated when surrounded by characters of the same script (that aren’t equally confusing.)
Got another confusable here:
Extended dashes trying to be hyphens
—
Yeah, that’s a good one, both em and en should be mapped to "-". I’ll check if there are any more Common dash-like characters.
2013 2014 2212
Edit: minus
Does the mapping mean that if someone sent something to a name using an em/en dash by mistake, it would end up in the hyphen wallet, or would the transfer just not go through?
Mapping means all those hyphen-confusables get replaced with "-".
In your example, sending something to “a—b.eth” would go to “a-b.eth”. But the larger point would be, “a—b.eth” isn’t valid.
Disallowing those characters is fine too, but hyphens are similar to the Arabic numerals in that they’re the same script, frequently used, but hard to visually distinguish.
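A minimal sketch of that mapping, assuming only the three code points listed above (2013, 2014, 2212) are mapped. `DASH_MAP` and `map_dashes` are hypothetical names for illustration, not part of any released library.

```python
# Hypothetical mapping table: the three dash-like code points -> ASCII hyphen.
DASH_MAP = str.maketrans({
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
})


def map_dashes(name: str) -> str:
    """Replace each hyphen-confusable with an ASCII hyphen; leave everything else alone."""
    return name.translate(DASH_MAP)


print(map_dashes("a\u2014b.eth"))  # a-b.eth
```

With a mapping, the em-dash input resolves to the hyphen name. With disallowing, the same input would be rejected instead.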
Ultimately, I see two separate steps: normalization and validation.
Normalization makes it so everyone, given input, hashes the correct name. During this process, each character is either: an emoji sequence, valid, mapped to something else, ignored, or disallowed. Changes to this logic can impact previously registered names.
Validation checks if the name meets a bunch of criteria, like it’s an accepted script combination, doesn’t have whole script confusables, obeys bidirectional rules, doesn’t use characters in wrong contexts, etc. Validation can only reject or accept a name, it can’t change the hash.
If we can standardize Normalization (and have an on-chain implementation), then input names will always resolve to the expected result, which eliminates the problems with emoji, invisibles (ZWJ, ZWNJ, Braille Blank, etc.), and hard-confusables (Arabic numbers, hyphens, etc.).
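To illustrate the two steps, here is a rough sketch with toy tables. Everything in it (the table contents, `normalize`, `validate`, `DisallowedCharacter`) is an assumption for illustration; a real implementation would generate its tables from Unicode data and apply far more validation rules.

```python
import string

# Toy tables, assumed for illustration only.
VALID = set(string.ascii_lowercase + string.digits + "-.")
MAPPED = {"\u2013": "-", "\u2014": "-", "\u2212": "-"}   # dash-likes -> hyphen
MAPPED.update({c.upper(): c for c in string.ascii_lowercase})  # case folding
IGNORED = {"\u00ad"}                                     # SOFT HYPHEN
# Emoji sequences, longest first so the ZWJ sequence wins over its prefix.
EMOJI_SEQUENCES = sorted(
    ["\U0001F426\u200D\u2B1B", "\U0001F426"], key=len, reverse=True
)


class DisallowedCharacter(ValueError):
    """Raised when a character is in none of the tables."""


def normalize(name: str) -> str:
    """Classify each position as emoji sequence, valid, mapped, ignored, or disallowed."""
    out = []
    i = 0
    while i < len(name):
        emoji = next((e for e in EMOJI_SEQUENCES if name.startswith(e, i)), None)
        if emoji is not None:
            out.append(emoji)        # emoji sequences pass through whole
            i += len(emoji)
            continue
        ch = name[i]
        if ch in VALID:
            out.append(ch)
        elif ch in MAPPED:
            out.append(MAPPED[ch])
        elif ch not in IGNORED:      # ignored characters are silently dropped
            raise DisallowedCharacter(f"U+{ord(ch):04X} at index {i}")
        i += 1
    return "".join(out)


def validate(normalized: str) -> bool:
    """Accept or reject only; never alters the name. Stand-in rule: no empty labels."""
    return all(label for label in normalized.split("."))


print(normalize("A\u2014B.eth"))  # a-b.eth
```

In this sketch a ZWJ outside a known emoji sequence, or a Braille blank (U+2800), is in no table, so `normalize` raises rather than producing a look-alike name.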
Cheers for getting back to me
Would _ also be mapped to - ??
Underscore is disabled in IDNA 2003 but I was planning to enable it (along with $). I think mapping to hyphen is reasonable as well.
Are a-b and a_b too similar?
Definitely not.
Same physical key on a keyboard
I feel there will be many mistakes
I am biased but I feel it would be wrong to not map it to the hyphen name
I own several hyphen names and several hyphen pre-punk names
Think about companies like coca-cola, who own and use coca-cola.eth, then y-3.eth, g-starRAW.eth etc etc
All these companies will lose confidence in ENS if you then allow coca_cola.eth to be able to be registered
They would all need to fight again to try and secure their names at cost
In my view, do not annoy these companies as they may turn their back against ENS
It would also mean all those pre-punk names with hyphens would be able to be copied
Actually not just pre-punk names but all names with a hyphen in them, and there are plenty
I know Nick doesn’t like hyphen names, but don’t shoot ENS in the foot, it’s already got its problems, don’t make another one
ENS could be huge, it is already becoming a beast and I don’t think it’s even started yet, but again with mismanagement it could also fail
I have been openly critical of Nick and his ideas of how to issue 2 & 1 character names, I still think the tax method he was suggesting is stupid, but that is just my opinion, if you don’t listen to opinions then it just becomes an echo-chamber and mistakes are made
regarding the $ sign
Does ENS really need it??
I’m not really seeing why it is needed
Remember that for mass adoption of ENS it needs to appeal to the masses. The masses don’t spend all day at a computer keyboard, simplicity will attract and keep the masses, make it too technical or have too many issues and it will scare away the masses or make them hesitant in adopting it
thank you raffy your work is appreciated!
Agree
Is this the outcome discussed and approved? I read through the most recent comments, but I’m not exactly sure what the outcome was. Will the Extended Arabic (Persian) digits route to the regular Arabic-Indic digits for the ones where there is overlap?
I believe they need to be mapped, if we want a good UX.
According to UTS-46 w/ Context O the recommendation for Arabic Numerals is never allow digit mixing. However, that still permits visually identical names for corresponding digits.
According to UAX-15 and visual inspection, 0-3,7-9 are confusable, so either you disallow those characters or pick a preferred one. The recommended solution is to convert to punycode, so the user sees a gibberish name but now has the information necessary to differentiate the confusable characters (if they know the correct punycode form.) For ENS, we don’t have an alternative input form.
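A small sketch of both options, using Python's built-in `punycode` codec. The mapping direction (Extended Arabic-Indic to Arabic-Indic) follows the question above and is an assumption, not a decided outcome; `DIGIT_MAP` and `map_digits` are hypothetical names.

```python
# Digits whose two forms look alike, per the post above: 0-3 and 7-9.
CONFUSABLE_DIGITS = [0, 1, 2, 3, 7, 8, 9]

# Assumed direction: Extended Arabic-Indic (U+06Fx) -> Arabic-Indic (U+066x).
DIGIT_MAP = str.maketrans(
    {chr(0x06F0 + d): chr(0x0660 + d) for d in CONFUSABLE_DIGITS}
)


def map_digits(label: str) -> str:
    """Collapse the confusable extended digits onto the preferred form."""
    return label.translate(DIGIT_MAP)


arabic = "\u0661\u0662\u0663"    # 123 in Arabic-Indic digits
extended = "\u06F1\u06F2\u06F3"  # 123 in Extended Arabic-Indic digits

# Visually near-identical, but different code points and different punycode.
print(arabic == extended)                                      # False
print(arabic.encode("punycode"), extended.encode("punycode"))

# After mapping, both inputs become the same label.
print(map_digits(extended) == arabic)                          # True
```

Digits 4-6 are left untouched here because their two forms are visually distinct.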
Discussed? yes. Approved? no, that’s the purpose of this discussion!
Hyphen and underscore are distinct characters with distinct visual appearances. Both are already used in DNS, too, where they have different meanings. We definitely shouldn’t map them to the same character.