I rather like the idea of maintaining the whitelist of emoji characters, since it could also be used for decoding emojis in subdomains by indexing services like The Graph (the lack of such a list is why we see a jumble of characters in the subdomain list).
Can you remind me where you are suggesting to make it unresolvable?
"Making it unresolvable" is a form of censorship, and I think @nick.eth didn't include such a mechanism at the smart contract/protocol level by design.
Restricting new registrations or renewals at the .eth registrar level is probably more feasible, though.
Someone called me out, asking who the hell I am, but neither you nor I actually have the authority to say which names/emojis should be in the whitelist. This is probably where we need some sort of DAO to make a collective decision.
Emojis utilizing ZWJ probably have formatting rules to them, so as @tom said, a regex could also be a solution, at least at registration time.
Nobody can predict what new emojis will come out in the future, though, so I don't actually know how quickly the service could react to them, whether via a whitelist or a regexp…
Zero-width joiners are not only used for emoji; they're also used to compose characters in languages other than English. We cannot solve this with a whitelist without excluding a huge number of non-English speakers from using names that make sense to them.
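To make this concrete, here's a small sketch showing a zero-width character doing real linguistic work outside emoji. The Persian word below is an illustrative example of my own (not from this thread): it uses ZWNJ (U+200C) so the prefix keeps its final letterform, and stripping the character produces a genuinely different string.

```javascript
// ZWNJ (U+200C) is linguistically meaningful outside emoji.
// Illustrative Persian word "می‌روم" ("I go"): the ZWNJ keeps the
// prefix "می" visually separate from the stem "روم".
const withZwnj = "می\u200Cروم";
const stripped = withZwnj.replace(/\u200C/g, "");

console.log(withZwnj.includes("\u200C")); // true
console.log(withZwnj === stripped);       // false: stripping changes the name
console.log([...withZwnj].length, [...stripped].length); // 6 5 (code points)
```

A whitelist built only around emoji sequences would reject or mangle names like this one.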
I'm not sure which issue with The Graph you're talking about?
I believe he's talking about a change to the resolution/normalisation rules, as I suggested earlier.
One thing I forgot to bring up in my earlier post is that MyCrypto has a library that identifies domains with deceptive encoding using Chrome's algorithm: GitHub - ensdomains/ens-validation
Isn't this at the front-end library level, rather than the smart contract/protocol level?
From what he said here, I understood that relying on a third party is a temporary band-aid, regardless of whether they make it unresolvable or just show a warning.
Like displaying emojis in subdomains. If we have a list, we can decode them in our manager, and The Graph can also decode them, so that every dapp doesn't have to decode them on its own. https://app.ens.domains/name/ethmojis.eth/subdomains
I think this argument heavily outweighs every other point in this discussion, and seeing the increasing flood of "fake" ENS names has completely deflated the value I saw in ENS names at the start.
I own a ZWJ-emoji domain, but I'll happily give up ownership to see this fixed.
Almost everywhere I see an ENS domain in use, it's clickable, so what the domain actually is (its characters) in relation to what it looks like doesn't matter at all.
Edit: Also, I'm not sure about this, but doesn't Unicode sort characters into different categories and subsets that could help define which characters to support?
Yes. That's fine, though - the design of ENS has always been such that you can register invalid names; they just won't resolve due to the normalisation rules.
A whitelist wouldn't help here, since names can have multiple characters in them, not just a single emoji.
Someone has registered a domain name with a zero-width connector. It is no longer possible to register domain names with zero-width connectors on app.ens.domains. Did they register it through another ENS registration website?
Because of the visual confusability introduced by the joiner characters, IDNA2008 provides a special category for them called CONTEXTJ, and only permits CONTEXTJ characters in limited contexts: certain sequences of Arabic or Indic characters. However, applications that perform IDNA2008 lookup are not required to check for these contexts, so overall security is dependent on registries having correct implementations. Moreover, the IDNA2008 context restrictions do not catch most cases where distinct domain names have visually confusable appearances because of ZWJ and ZWNJ.
Some code points need to be allowed in exceptional circumstances but should be excluded in all other cases; these rules are also described in other documents. The most notable of these are the Join Control characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER. Both of them have the derived property value CONTEXTJ.

A character with the derived property value CONTEXTJ or CONTEXTO (CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate rule has been established and the context of the character is consistent with that rule. It is invalid to either register a string containing these characters or even to look one up unless such a contextual rule is found and satisfied. Please see Appendix A, "The Contextual Rules Registry", for more information.
UTS-46 calls out an implied asymmetry between domain resolution and domain registration. It expects that domain registrars will enforce stricter rules than those imposed by UTS-46, and accepts that some valid normalizations will never resolve because the domain can't be registered.
Because normalization is not required for ENS domain registration (in the absence of on-chain normalization, anyone can register non-normalized 3+ character names directly on ETHRegistrarController), resolution is the only avenue by which restrictions can be placed on registrations. If ENS ever wished to enforce CONTEXTJ-style exceptions for Arabic/emojis/etc., these exceptions would need to be published and used in all client resolution libraries.
We did not end up making changes to our normalisation process. We'd need to be very careful with any changes that restrict previously valid names unless we can be 100% certain it will only catch deceptive ones.
raffy.eth (not on the forum yet, I think) has written this new UTS-46 implementation. I'm hoping he'll chime in here with some input.
This is a good point about the handling of the (2) zero-width characters, 200C and 200D. The other (2) deviations I believe should be allowed (and thus mapped) as the IDNA 2008 spec suggests: 00DF (ß) and 03C2 (ς). Certainly leaving the zero-widths unchanged is bad, but dropping them without everyone knowing the situation is also bad.
CONTEXTJ seems to be described here: rfc5892. The only issue I see is that these rules are kind of messy to codify automatically, but they're very simple to implement, e.g.: If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
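For anyone curious, here's a rough sketch of that RFC 5892 regexp rule for ZWNJ as code. The Joining_Type table below is a tiny hand-rolled stand-in with a few example entries; a real implementation would load the full data from the UCD's ArabicShaping.txt, and would also need the separate Virama rule from the RFC, which this sketch omits.

```javascript
// Sketch of the RFC 5892 ZWNJ (U+200C) contextual rule:
//   (Joining_Type:{L,D}) (Joining_Type:T)* ZWNJ (Joining_Type:T)* (Joining_Type:{R,D})
// JOINING_TYPE is a toy table for illustration only.
const JOINING_TYPE = {
  0x0628: "D", // ARABIC LETTER BEH (dual-joining)
  0x062F: "R", // ARABIC LETTER DAL (right-joining)
  0x0627: "R", // ARABIC LETTER ALEF (right-joining)
  0x0640: "C", // ARABIC TATWEEL (join-causing)
  // Transparent (T) would include combining marks such as U+064B..U+0652.
};

function joiningType(cp) {
  return JOINING_TYPE[cp] ?? "U"; // default: non-joining
}

// Is the ZWNJ at index i of the codepoint array in a valid context?
function zwnjAllowedAt(codepoints, i) {
  if (codepoints[i] !== 0x200C) return false;
  let a = i - 1;
  while (a >= 0 && joiningType(codepoints[a]) === "T") a--; // skip transparents
  let b = i + 1;
  while (b < codepoints.length && joiningType(codepoints[b]) === "T") b++;
  if (a < 0 || b >= codepoints.length) return false;
  const left = joiningType(codepoints[a]);
  const right = joiningType(codepoints[b]);
  return (left === "L" || left === "D") && (right === "R" || right === "D");
}

console.log(zwnjAllowedAt([0x0628, 0x200C, 0x062F], 1)); // true: BEH + ZWNJ + DAL
console.log(zwnjAllowedAt([0x61, 0x200C, 0x62], 1));     // false: Latin letters are non-joining
```

So the rule really is mechanical once the Joining_Type data is available; the messy part is keeping that data in sync with Unicode releases.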
For @adraffy/ens-normalize.js, I let the zero-widths pass in my first version (1.0.2), as I assumed ENS was using IDNA 2008, so all deviations were allowed. I changed my library to support CONTEXTJ (which disallows ZW without context) and pushed a new version (1.0.3), which is reflected on my demo page: ENS Resolver. I've included specific examples w/r/t CONTEXTJ.
Kinda related: could the ENS dapp optionally display the namehash and/or a byte/codepoint representation of the name being registered?
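Something like this would be easy to bolt on; here's a sketch of the codepoint half of that idea (the function name is hypothetical, not part of the ENS dapp, and a namehash display would additionally need a keccak-256 dependency such as js-sha3, which I've left out to keep this self-contained):

```javascript
// Hypothetical helper: render a name as a list of U+XXXX codepoints so
// hidden characters like ZWJ become visible before registration.
function codepointView(name) {
  return [...name]
    .map(ch => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

console.log(codepointView("abc"));
// "U+0061 U+0062 U+0063"
console.log(codepointView("a\u200Db"));
// "U+0061 U+200D U+0062", the hidden ZWJ becomes visible
```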
Impersonators are always all over the place; for example, police catch police impersonators all the time, but you can still buy police badges.
No matter how restrictive the rules are, people will always find a way to bend them.
I like to think about it like this: the ENS smart contract is "wholesale", dealing in bulk quantities, and as such it must be censorship resistant. But when a .eth name hits a wallet UI, an exchange UI, or some app UI, that is the "retail" level, where the approach can be more granular. It's the UI's problem to catch bad actors, and the UI's reputation would suffer if it doesn't provide robust solutions against impersonation.
On the other hand, it is beneficial to have a fixed set of rules at the smart contract level, and it is a very bad idea to keep changing them; with time, all UIs will learn the rules and develop robust strategies for dealing with these problems.
3.) parseSearchTerms in the official dapp uses UTF-16 character length instead of code-point count. Fortunately, this doesn't result in any bugs, because both valid() and rentPrice() enforce the 3-code-point minimum and UTF-16 length is always >= code-point length.
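The distinction in (3) is easy to see with any astral character, since those take two UTF-16 code units:

```javascript
// UTF-16 length vs. code-point count for a name containing an
// astral-plane emoji (U+1F4A9 encodes as a surrogate pair).
const name = "\u{1F4A9}ab";
console.log(name.length);      // 4: UTF-16 code units
console.log([...name].length); // 3: code points, the quantity valid()/rentPrice() check
```

Since UTF-16 length only ever overcounts, the dapp's check is conservative rather than buggy, exactly as described.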
4A.) Personally, I think ZW should be ignored (removed) outside of CONTEXTJ to match the standard. This would require many registered emoji names to have their namehash (and NFT) changed. I have no idea what you would do regarding collisions.
In this situation, there's nothing wrong with using the fully-qualified or minimally-qualified or even a mix of emoji: as long as they normalize to the same value, they're the same.
norm("RAFFY.ETH") === norm("raffy.eth")
norm("š§āā.eth") === norm("š§ā.eth")
4B.) Another possibility would be deriving a rule which lets ZW exist inside an emoji context, which would leave all of the fully-qualified names untouched. However, you'd need to apply the reverse transformation to minimally-qualified emojis (to make them fully-qualified during normalization) and again deal with the collision issue.
It's not clear to me how you got from .eth to registering this in the UI, though, without manually copying-and-pasting a labelhash. Can you elaborate?
I think this would be ideal, as it'd avoid breaking existing perfectly reasonable emoji names, as well as preventing deceptive uses of ZWJ.
However, it shows up as ".eth" for me, because the dapp memoizes previously attempted labels. decodedNameHash then knows that [ba967c160905ade030f84952644a963994eeaed3881a6b8a4e9c8cbe452ad7a2] corresponds to .
The token I have is 6e0abe02c46fd98fe8652e10cf2717b988cfdd12484cd2d150ccf7f34bbaf215 which is the labelhash of [ba967c160905ade030f84952644a963994eeaed3881a6b8a4e9c8cbe452ad7a2].
My library currently applies IDNA 2008 rules with CONTEXTJ but also retains any emoji from the recommended set and upgrades any combinations that were entered minimally-qualified. This effectively preserves existing namehashes by injecting missing ZWJ during normalization. ZWJ are ignored outside these contexts.
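The "upgrade" direction described there can be sketched as a lookup keyed by the FE0F-stripped form. This is my own illustration, not the actual ens-normalize.js internals, and the table has a single entry where a real one would cover the whole recommended emoji set:

```javascript
// Map from FE0F-stripped (minimally-qualified) sequence back to the
// fully-qualified form, so both inputs normalize to one namehash.
const FULLY_QUALIFIED = new Map([
  // man zombie: U+1F9DF U+200D U+2642 U+FE0F
  ["\u{1F9DF}\u200D\u2642", "\u{1F9DF}\u200D\u2642\uFE0F"],
]);

// Hypothetical helper: return the fully-qualified form of a known emoji
// sequence, or the input unchanged if it isn't in the table.
function upgradeEmoji(seq) {
  const stripped = seq.replace(/\uFE0F/g, "");
  return FULLY_QUALIFIED.get(stripped) ?? seq;
}

console.log(upgradeEmoji("\u{1F9DF}\u200D\u2642") === "\u{1F9DF}\u200D\u2642\uFE0F"); // true
```

Keying on the stripped form is what makes minimally-qualified, fully-qualified, and mixed inputs all collapse to the same canonical sequence.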
If CheckJoiners, the label must satisfy the ContextJ rules from Appendix A in The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [IDNA2008], except that if EmojiVersion ≠ 0, ZWJ characters are allowed if they are within Emoji ZWJ Sequences specified for Unicode Emoji Version=EmojiVersion.
Not sure if there were any additional internal Unicode discussions around this, but it seems like it didn't make it into a final draft.
How would it treat ".eth" vs ".eth"? Are these 2 distinct domains?