Zero-width characters pose a security risk and existential threat to ENS

royalfork · November 23, 2021, 4:56am

Was anything more done on this front?

This is discussed in the Unicode specs as CONTEXTJ. From UTS #46, Unicode IDNA Compatibility Processing:

Because of the visual confusability introduced by the joiner characters, IDNA2008 provides a special category for them called CONTEXTJ, and only permits CONTEXTJ characters in limited contexts: certain sequences of Arabic or Indic characters. However, applications that perform IDNA2008 lookup are not required to check for these contexts, so overall security is dependent on registries having correct implementations. Moreover, the IDNA2008 context restrictions do not catch most cases where distinct domain names have visually confusable appearances because of ZWJ and ZWNJ.

More specifically, from RFC 5892, The Unicode Code Points and Internationalized Domain Names for Applications (IDNA):

Some code points need to be allowed in exceptional circumstances but
should be excluded in all other cases; these rules are also described
in other documents. The most notable of these are the Join Control
characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
NON-JOINER. Both of them have the derived property value CONTEXTJ.
A character with the derived property value CONTEXTJ or CONTEXTO
(CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
rule has been established and the context of the character is
consistent with that rule. It is invalid to either register a string
containing these characters or even to look one up unless such a
contextual rule is found and satisfied. Please see Appendix A, “The
Contextual Rules Registry”, for more information.
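
To make the quoted rule concrete, here is a rough sketch of the simplest contextual check from RFC 5892 Appendix A: a joiner is acceptable when the code point immediately before it has Canonical_Combining_Class Virama (value 9). The function name `contextj_virama_ok` is my own, and this deliberately omits the second ZWNJ rule based on Joining_Type (Python's standard library does not expose that property), so it is stricter than the RFC for Arabic-script labels. Treat it as an illustration, not a conforming implementation.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value for Virama


def contextj_virama_ok(label: str) -> bool:
    """Return True if every ZWJ/ZWNJ in `label` directly follows a virama.

    Simplified: only the "preceded by Virama" rule is checked. The
    Joining_Type-based rule for ZWNJ is NOT implemented here.
    """
    for i, ch in enumerate(label):
        if ch in (ZWNJ, ZWJ):
            # A joiner at position 0 has no preceding code point at all.
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


# Devanagari KA + VIRAMA + ZWJ + SSA: the joiner follows a virama.
print(contextj_virama_ok("\u0915\u094d\u200d\u0937"))
# Latin letters around a ZWJ: no virama before the joiner.
print(contextj_virama_ok("a\u200db"))
```

A label with no joiners at all passes trivially, since there is nothing to check.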

UTS #46 calls out an implied asymmetry between domain resolution and domain registration. It expects that domain registrars will enforce stricter rules than those imposed by UTS #46 itself, and accepts that some valid normalizations will never resolve because the domain can’t be registered.

Because normalization is not required for ENS domain registration (in the absence of on-chain normalization, anyone can register non-normalized names of 3+ characters directly on ETHRegistrarController), resolution is the only avenue by which restrictions can be placed on registrations. If ENS ever wished to enforce CONTEXTJ-style exceptions for Arabic, emoji, etc., these exceptions would need to be published and used in all client resolution libraries.
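
As a rough idea of what such a client-side guard might look like, the sketch below rejects any name containing ZWJ or ZWNJ unless the joiner sits inside a sequence from a published exception list. Everything here is hypothetical: `ALLOWED_SEQUENCES` and `safe_to_resolve` are names I made up, the single family-emoji entry is only a placeholder, and no ENS library is known to me to work this way.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Hypothetical exception list that every resolution library would have to
# share. The one entry is the emoji ZWJ sequence man + ZWJ + woman + ZWJ + girl.
ALLOWED_SEQUENCES = frozenset({
    "\U0001F468\u200d\U0001F469\u200d\U0001F467",
})


def safe_to_resolve(name: str, allowed=ALLOWED_SEQUENCES) -> bool:
    """Return False if any label has a joiner outside an allowed sequence."""
    for label in name.split("."):
        remainder = label
        # Remove longer sequences first so a shorter entry cannot split them.
        for seq in sorted(allowed, key=len, reverse=True):
            remainder = remainder.replace(seq, "")
        # Any joiner still present was not part of an allowed sequence.
        if ZWJ in remainder or ZWNJ in remainder:
            return False
    return True


print(safe_to_resolve("vitalik.eth"))        # no joiners
print(safe_to_resolve("vita\u200dlik.eth"))  # hidden ZWJ inside the label
```

The point of the sketch is the deployment problem rather than the check itself: a client that skips this step, or ships a different exception list, would still resolve the name with the hidden joiner.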