Isn't this at the front-end library level, rather than at the smart contract/protocol level?
From what he said here, I understood that relying on the third party is a temporary band-aid, regardless of whether they make such names unresolvable or just warn.
Like the rendering of emojis on subdomains. If we have a list, we can decode them in our manager, and The Graph can also decode them so that every dapp doesn't have to decode on its own. https://app.ens.domains/name/ethmojis.eth/subdomains
I think this argument heavily outweighs any other point in this discussion, and seeing the increasing flood of "fake" ENS names has completely deflated the value I saw in ENS names at the start.
I own a ZWJ-emoji domain, but I'll happily give up ownership to see this fixed.
Almost everywhere I see an ENS domain in use, it's clickable, so what the domain actually is (its characters) in relation to what it looks like doesn't matter at all.
Edit: Also, I'm not sure of this, but doesn't Unicode categorize things into different categories and subsets that could help define which characters to support?
Yes. That's fine, though - the design of ENS has always been such that you can register invalid names; they just won't resolve due to the normalisation rules.
A whitelist wouldn't help here, since names can have multiple characters in them, not just a single emoji.
Someone has registered a domain name with a zero-width joiner. It is no longer possible to register domain names with zero-width joiners on app.ens.domains. Did they register it on some other ENS registration website?
Because of the visual confusability introduced by the joiner characters, IDNA2008 provides a special category for them called CONTEXTJ, and only permits CONTEXTJ characters in limited contexts: certain sequences of Arabic or Indic characters. However, applications that perform IDNA2008 lookup are not required to check for these contexts, so overall security is dependent on registries having correct implementations. Moreover, the IDNA2008 context restrictions do not catch most cases where distinct domain names have visually confusable appearances because of ZWJ and ZWNJ.
Some code points need to be allowed in exceptional circumstances but
should be excluded in all other cases; these rules are also described
in other documents. The most notable of these are the Join Control
characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
NON-JOINER. Both of them have the derived property value CONTEXTJ.
A character with the derived property value CONTEXTJ or CONTEXTO
(CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
rule has been established and the context of the character is
consistent with that rule. It is invalid to either register a string
containing these characters or even to look one up unless such a
contextual rule is found and satisfied. Please see Appendix A, "The
Contextual Rules Registry", for more information.
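The confusability these joiner characters introduce is easy to demonstrate: a ZWJ is invisible in most renderings, so two distinct names (and therefore two distinct namehashes) can look identical on screen. A minimal illustration:

```javascript
// Two labels that can render identically but are distinct strings: the
// second contains an invisible U+200D ZERO WIDTH JOINER between the letters.
const plain = 'ab';
const spoofed = 'a\u200Db';

console.log(plain === spoofed);            // false: different names
console.log(plain.length, spoofed.length); // 2 3
```

Any hash or comparison downstream of these strings treats them as entirely unrelated, even though a user cannot tell them apart.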
UTS-46 calls out an implied asymmetry between domain resolution and domain registration. It expects that domain registrars will enforce stricter rules than those imposed by UTS-46, and accepts that some valid normalizations will never resolve because the domain can't be registered.
Because normalization is not required for ENS domain registration (in the absence of on-chain normalization, anyone can register non-normalized 3+ character names directly on ETHRegistrarController), resolution is the only avenue by which restrictions can be placed on registrations. If ENS ever wished to enforce CONTEXTJ-style exceptions for Arabic/emoji/etc., these exceptions would need to be published and used in all client resolution libraries.
We did not end up making changes to our normalisation process. We'd need to be very careful with any changes that restrict previously valid names unless we can be 100% certain it will only catch deceptive ones.
raffy.eth (not on the forum yet, I think) has written this new UTS-46 implementation. I'm hoping he'll chime in here with some input.
This is a good point about the handling of the (2) zero-width characters, 200C and 200D. The other (2) deviations I believe should be allowed (and thus mapped) as the IDNA 2008 spec suggests: 00DF → C39F and 03C2 → CF82. Certainly leaving the zero-widths unchanged is bad, but dropping them without everyone knowing the situation is also bad.
CONTEXTJ seems to be described here: rfc5892. The only issue I see is that these rules are kind of messy to codify automatically, but they're very simple to implement, e.g.: If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
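For illustration, the ZWNJ (U+200C) branch of that regex rule can be sketched in a few lines of JS. Note the `joiningTypeOf` table below is a tiny hard-coded stand-in for the full Unicode Joining_Type data, and the rule's virama branch (preceding character with Canonical_Combining_Class = 9) is omitted:

```javascript
// Stand-in for Unicode Joining_Type data; only a handful of code points are
// hard-coded here for illustration. 'U' (non-joining) is the default.
function joiningTypeOf(cp) {
  const table = {
    0x0628: 'D', // ARABIC LETTER BEH: dual-joining
    0x0644: 'D', // ARABIC LETTER LAM: dual-joining
    0x0627: 'R', // ARABIC LETTER ALEF: right-joining
    0x0670: 'T', // ARABIC LETTER SUPERSCRIPT ALEF: transparent
  };
  return table[cp] || 'U';
}

// True if the ZWNJ at index i (in an array of code points) satisfies the
// RFC 5892 regex rule: (Joining_Type L or D) (T)* ZWNJ (T)* (R or D).
function zwnjContextOk(cps, i) {
  let j = i - 1;
  while (j >= 0 && joiningTypeOf(cps[j]) === 'T') j--; // skip transparent
  if (j < 0 || !'LD'.includes(joiningTypeOf(cps[j]))) return false;
  let k = i + 1;
  while (k < cps.length && joiningTypeOf(cps[k]) === 'T') k++; // skip transparent
  return k < cps.length && 'RD'.includes(joiningTypeOf(cps[k]));
}

console.log(zwnjContextOk([0x0644, 0x200C, 0x0628], 1)); // true: LAM + ZWNJ + BEH
console.log(zwnjContextOk([0x200C, 0x0628], 0));         // false: no joiner to the left
```

The full rule would also need the ZWJ (U+200D) branch, which only permits ZWJ directly after a virama.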
For @adraffy/ens-normalize.js, I let the zero-widths pass in my first version (1.0.2), as I assumed ENS was using IDNA 2008 so all deviations were allowed. I changed my library to support CONTEXTJ (which disallows ZW without context) and pushed a new version (1.0.3), which is reflected at my demo page: ENS Resolver. I've included specific examples w/r/t CONTEXTJ.
Kinda related: could the ENS dapp optionally display the namehash and/or a byte/codepoint representation of the name being registered?
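Something as simple as the following (a hypothetical helper, not part of the dapp) would make invisible characters visible during registration:

```javascript
// Render each code point of a label as U+XXXX, so invisible characters
// like ZWJ show up plainly next to the name being registered.
function codePoints(label) {
  return [...label]
    .map(ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

console.log(codePoints('ab'));       // U+0061 U+0062
console.log(codePoints('a\u200Db')); // U+0061 U+200D U+0062
```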
Impersonators are always all over the place; for example, police catch police impersonators all the time, but you can still buy police badges.
No matter how restrictive the rules are, people will always find a way to bend them.
I like to think about it like this: the ENS smart contract is "wholesale"; it deals in bulk, large quantities, and as such it must be censorship-resistant. But when a .eth name hits a wallet UI, an exchange UI, or some app UI, that is the "retail" level, where the approach can be more granular. It's the UI's problem to catch bad actors, and the UI's reputation should suffer if it doesn't provide robust protection against impersonation.
On the other hand, it is beneficial to have a fixed set of rules at the smart contract level, and it is a very bad idea to keep changing them; with time, all UIs will learn the rules and develop robust strategies for dealing with problems.
3.) parseSearchTerms in the official dapp uses UTF-16 character length instead of code-point count. Fortunately, this doesn't result in any bugs, because both valid() and rentPrice() enforce the 3 code-point minimum, and UTF-16 length is always >= code-point length.
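A quick JS illustration of why the code-unit check can only over-count:

```javascript
// UTF-16 .length counts code units, not code points. Astral characters
// (most emoji) take two code units each.
const label = '\u{1F4A9}\u{1F4A9}'; // two pile-of-poo emoji = 2 code points

console.log(label.length);      // 4 (UTF-16 code units)
console.log([...label].length); // 2 (code points; spread iterates code points)
// So length >= code-point count always holds, and a name passing the
// on-chain 3 code-point minimum can never be rejected by the code-unit check.
```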
4A.) Personally, I think ZW should be ignored (removed) outside of CONTEXTJ to match the standard. This would require many registered emoji names to have their namehash (and NFT) changed. I have no idea what you would do regarding collisions.
In this situation, there's nothing wrong with using the fully-qualified or minimally-qualified or even a mix of emoji: as long as they normalize to the same value, they're the same.
norm("RAFFY.ETH") === norm("raffy.eth")
norm("🧟‍♂️.eth") === norm("🧟‍♂.eth")
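For many emoji, the only difference between the fully- and minimally-qualified forms is the U+FE0F variation selector. A toy normalizer (illustrative only, not ENS's actual algorithm) that drops FE0F before comparing makes the two forms equal:

```javascript
// Toy illustration: strip U+FE0F (VARIATION SELECTOR-16) so the
// fully-qualified and minimally-qualified forms normalize identically.
const stripVS16 = s => s.replace(/\uFE0F/g, '');

const fully = '\u2639\uFE0F.eth'; // frowning face WITH variation selector
const minimal = '\u2639.eth';     // frowning face without it

console.log(stripVS16(fully) === stripVS16(minimal)); // true
```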
4B.) Another possibility would be deriving a rule which lets ZW exist inside an emoji context, which would leave all of the fully-qualified names untouched. However, you'd need to apply the reverse transformation to minimally-qualified emoji (to make them fully-qualified during normalization) and again deal with the collision issue.
It's not clear to me how you got from .eth to registering this in the UI, though, without manually copying and pasting a labelhash. Can you elaborate?
I think this would be ideal, as it'd avoid breaking existing perfectly reasonable emoji names, as well as preventing deceptive uses of ZWJ.
However, it shows up as ".eth" for me, because the dapp memoizes previously attempted labels. decodedNameHash then knows that [ba967c160905ade030f84952644a963994eeaed3881a6b8a4e9c8cbe452ad7a2] corresponds to .
The token I have is 6e0abe02c46fd98fe8652e10cf2717b988cfdd12484cd2d150ccf7f34bbaf215 which is the labelhash of [ba967c160905ade030f84952644a963994eeaed3881a6b8a4e9c8cbe452ad7a2].
My library currently applies IDNA 2008 rules with CONTEXTJ but also retains any emoji from the recommended set and upgrades any combinations that were entered minimally-qualified. This effectively preserves existing namehashes by injecting missing ZWJ during normalization. ZWJ are ignored outside these contexts.
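As a rough sketch of that "upgrade" direction (not the library's real implementation), a lookup of known fully-qualified sequences can inject the missing selectors; the two-entry table below is a made-up stand-in for Unicode's emoji-zwj-sequences data:

```javascript
// Hypothetical sketch: map minimally-qualified emoji sequences to their
// fully-qualified forms. The table is a tiny stand-in for the real
// Unicode emoji sequence data, not @adraffy/ens-normalize.js itself.
const UPGRADES = new Map([
  // rainbow flag: 1F3F3 200D 1F308 -> 1F3F3 FE0F 200D 1F308
  ['\u{1F3F3}\u200D\u{1F308}', '\u{1F3F3}\uFE0F\u200D\u{1F308}'],
  // medical symbol: 2695 -> 2695 FE0F
  ['\u2695', '\u2695\uFE0F'],
]);

const upgrade = label => UPGRADES.get(label) ?? label;

console.log(upgrade('\u2695') === '\u2695\uFE0F'); // true: upgraded
console.log(upgrade('abc'));                       // 'abc' (unchanged)
```

Because the minimally-qualified input is rewritten to the fully-qualified form before hashing, names registered with either form end up at the same namehash.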
If CheckJoiners, the label must satisfy the ContextJ rules from Appendix A, in The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [IDNA2008], except that if EmojiVersion ≠ 0, ZWJ characters are allowed if they are within Emoji ZWJ Sequences specified for Unicode Emoji Version=EmojiVersion.
Not sure if there were any additional internal Unicode discussions around this, but it seems like it didn't make it into the final draft.
How would it treat ".eth" vs ".eth"? Are these 2 distinct domains?
This is a very nice piece of work, and I think it could be the foundation for a better way of normalising names for ENS. There are a couple of things we'd need to make that so:
Clear, explicit documentation describing the normalisation process, such that anyone else can implement it from scratch; it's not viable for people to rely on a single JS library everywhere. Preferably, pseudocode that starts from the primitive of a compliant UTS-46 implementation.
Tests over all existing ENS names to see which names' resolution will be affected and how.
If you're prepared to handle #1, I can take care of #2.
I released an update that has an optional boolean which ignores (rather than throws on) disallowed characters. I also added another layer of compression and got the minified file down to ~25KB (17KB gzipped).
I've added a few comments and citations regarding the algorithm and sequence of operations.
I also included the start of a bunch of tests:
known.js tracks things that I'm specifically aware of, and test-known.js makes sure they match.
goofy-labels.txt is a complete list of non-trivial registered ENS names (thanks to @nick.eth), and check-goofy.js generates goofy.html for normalizations that don't match.
opensea.js pulls known (name, token, owner) tuples and can generate opensea-label-hash.json, from which check-opensea.js generates opensea.html for label hashes that don't match.
compare-ethers.js compares ens_normalize() to ethers nameprep() using known.js and generates compare-ethers.html
Before we can deploy this, we'll need documentation that's comprehensive enough that someone can recreate the algorithm from scratch independently, and test vectors they can use to check their implementation. I'm happy to help with that.
It seems clear that a lot of these names were not normalised - and hence not resolvable - in the first place. Would it be possible to filter the list for names that are normalised according to the current Ethers implementation, and then only show those that have a different normalisation under yours?
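That filter could look something like the sketch below, where ethersNorm and newNorm stand in for ethers' nameprep() and ens_normalize() (dummy normalizers are used in the example, since neither library is loaded here):

```javascript
// Keep only names that are currently normalised (valid and in normalized
// form under the old implementation) but normalize differently, or fail,
// under the new implementation.
function differingNames(names, ethersNorm, newNorm) {
  return names.filter(name => {
    let before;
    try { before = ethersNorm(name); } catch { return false; } // never resolvable anyway
    if (before !== name) return false;                         // not in normalised form
    let after;
    try { after = newNorm(name); } catch { return true; }      // newly invalid: report
    return before !== after;                                   // changed: report
  });
}

// Dummy normalizers: old lowercases; new lowercases and strips ZWJ.
const oldN = s => s.toLowerCase();
const newN = s => s.toLowerCase().replace(/\u200D/g, '');

console.log(differingNames(['abc', 'a\u200Db'], oldN, newN)); // only the ZWJ name
```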