Non-compliant name on official lookup: 🌈rainbow.eth


How did this name get registered? 🌈rainbow.eth does not pass nameprep.
🌈 isn’t IDNA compliant, yet this name shows up on the official lookup. Has this been overlooked?


Thanks for the report! I’ve filed a bug for it here:

It looks like our manager is ignoring errors from nameprep.

What about 💩💩💩💩.eth? The lookup shows this address as registered, but says the name is invalid (too short) …

Alex van de Sande snuck that one through the invalidation process. Nobody thought to check for it before he migrated it, and now it’s his for good.


I just checked, and this name does pass UTS-46 normalisation, which ENS uses. IDNA also uses UTS-46, but it must be using a more restrictive set of permitted characters.

Nevertheless, we need to improve our handling of invalid or non-normal names.

UTS-46 normalisation is part of the IDNA protocol. So for UTS-46 to make any sense, ENS needs to adhere to the IDNA standard.
You can read more about UTS-46/IDNA here:
Since IDNA2008, the set of valid code points has been defined as the union of CONTEXTJ, CONTEXTO, and PVALID under any version of Unicode from 5.2 on. Presently this means that emoji are definitely disallowed, and no compliant registrar will accept them.
The reasoning behind excluding emojis is described in this advisory from ICANN:

I’m aware that IDNA uses UTS-46. ENS has opted to just use UTS-46 normalisation, which reports the above domain as valid. If IDNA does additional processing over and above UTS-46, that’s not part of ENS.

What’s the point of using the UTS #46 mappings if we don’t care about IDNA? We might as well abandon UTS #46 altogether and only enforce case-folding.

I’m not sure I follow your argument. UTS-46 performs useful tasks besides case-folding, because it’s Unicode-aware, and it also strips or prohibits problematic characters like ZWJ.

As far as I’m aware, IDNA is just UTS-46 plus punycode, but there could be other processing steps I’ve missed. Are there?

Yes, there’s a validation step. IDNA includes validation, normalization (nameprep), and punycode. UTS #46 specifies another (different) normalization step that takes place before all the original IDNA processing, and the two are meant to be used together to provide compatibility between IDNA2008 and IDNA2003. [link] Of course, no one is forcing us to apply the IDNA processing. But just to be clear, skipping it would mean that some names (e.g. those containing emoji) aren’t IDNA-compliant.
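
For readers unfamiliar with the last of those three steps, here is a minimal sketch of Punycode encoding using only Python's built-in `punycode` codec. It deliberately skips validation and normalization, so it is not an IDNA implementation; the helper names `to_a_label` and `from_a_label` are made up for this illustration.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label

def to_a_label(label):
    """Encode one label as ASCII; pure-ASCII labels pass through unchanged."""
    try:
        label.encode("ascii")
        return label
    except UnicodeEncodeError:
        return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_a_label(label):
    """Reverse to_a_label for labels carrying the ACE prefix."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

print(to_a_label("bücher"))                     # xn--bcher-kva
print(from_a_label(to_a_label("🌈rainbow")))     # round-trips to the original
```

Note that the codec itself happily encodes an emoji label; whether such a label is allowed is decided by the validation step, not by Punycode.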

So I’d like to confirm: We don’t care if the names aren’t IDNs, right? If we don’t care then it doesn’t really matter what kind of normalization we use. (Emojis aren’t compliant in IDNs.)

Addendum: The current web3 Python library’s ens module does comply with IDNA [source code], so if we don’t want that we should file a bug report.

That’s right. We care that names are consistently normalised across different platforms, and that we do what we can to eliminate confusingly similar names - for example by removing non-printing characters. We don’t care if the rules are exactly the same as IDN.
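
To make that goal concrete, here is a rough standard-library sketch. It is not UTS-46 and will not match ENS's real normalisation in every case; it only illustrates the idea by applying NFKC, case folding, and dropping format characters (general category Cf, which covers zero-width spaces and joiners). The function name `rough_normalise` is hypothetical.

```python
import unicodedata

def rough_normalise(name):
    """Rough stand-in for name normalisation (NOT UTS-46)."""
    # Fold compatibility forms (e.g. fullwidth letters) and case.
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Drop invisible format characters such as U+200B ZERO WIDTH SPACE.
    visible = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    # Case folding can leave text denormalised, so normalise once more.
    return unicodedata.normalize("NFKC", visible)

print(rough_normalise("Rain\u200bbow.ETH"))  # rainbow.eth
```

With this approach, two inputs that look identical on screen but differ by a hidden character or by letter case end up as the same string.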

The Python IDNA library says:

As described in RFC 5895, the IDNA specification no longer normalizes input from different potential ways a user may input a domain name. This functionality, known as a “mapping”, is now considered by the specification to be a local user-interface issue distinct from IDNA conversion functionality.

Which seems to contradict what you’re saying about IDNA processing. What other processing does IDNA 2008 do above and beyond UTS-46?

Understood, so you’re saying that we’re not trying to comply with IDNA. Just wanted to make that clear.

IDNA also does validation, further restricting the set of allowed characters. In particular, “Symbol, Other” (So) is calculated as DISALLOWED in the standard. Have you tried encoding, say, ‘😃’ using that Python library? It’ll fail:

>>> import idna
>>> idna.encode('😃', strict=False, uts46=True, std3_rules=True, transitional=False)
Traceback (most recent call last):
  File "python3.6/site-packages/idna/", line 355, in encode
  File "python3.6/site-packages/idna/", line 276, in alabel
  File "python3.6/site-packages/idna/", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+1F603 at position 1 of '😃' not allowed
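
The rejection above comes from the code point validation step. As a rough illustration using only Python's standard library (this is not the idna package's actual logic), the emoji mentioned in this thread all carry the Unicode general category "So" (Symbol, Other). The helper name `general_categories` is made up for this sketch.

```python
import unicodedata

def general_categories(label):
    """Return (character, general category) pairs for each code point."""
    return [(ch, unicodedata.category(ch)) for ch in label]

for label in ("😃", "🌈rainbow", "💩💩💩💩"):
    print(label, general_categories(label))
```

The Latin letters in "rainbow" report "Ll" (Letter, lowercase), while each emoji reports "So".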

I’d completely missed that IDNA had prohibited characters that are allowed by UTS-46. I’m kind of baffled why the authors of UTS-46 didn’t just make the list of prohibited characters include all of those from IDNA.

In any case, we’re pretty much committed to UTS-46 normalisation only, as that’s what’s encoded in the ENS standard at this point. Any change would risk some fairly significant incompatibilities.

The UTS-46 spec only specifies what’s needed to provide compatibility between IDNA2003 and IDNA2008. I think a lot of people have, like me, assumed that ENS follows the full IDNA protocol. Even the web3 Python library does, so maybe following IDNA anyway is still a possibility?

I don’t think that’s true:

The specification provides two main features. The first is a comprehensive mapping to support current user expectations for casing and other variants of domain names. Such a mapping is allowed by IDNA2008. The second feature is a compatibility mechanism that supports the existing domain names that were allowed under IDNA2003. This second feature is intended to improve client behavior during the transitional period.

You’re referring to the second part, but UTS-46 also does all the mapping (case folding, etc) for IDNA 2008.

It also says:

The UTS #46 specification defines a mapping consistent with the normative requirements of the IDNA2008 protocol, and which is as compatible as possible with IDNA2003.

As far as I can tell, the only thing IDNA does besides UTS-46 is implement a set of blacklisted Unicode character types that are not permitted in names.

Yes, you’re right about that. So if we don’t care about the disallowed character set, we should make that clear in the EIP (and also let the web3 python maintainers know).

I’m personally glad that only UTS-46 was followed, not all of IDNA. IDNA is closer to a whitelist than a blacklist: there are whole swaths of character blocks (like the entire emoji set) that IDNA disallows without great reason, IMO. See for a comparison of UTS-46 and IDNA.

I guess whether it was a “great reason” is a matter of opinion, but you can read the original advisory from the SSAC here, from May of 2017: SSAC Advisory on the Use of Emoji in Domain Names

I believe that document is ICANN saying, “IDNA standard doesn’t allow emoji, so we won’t either” along with some reasons why they won’t break from IDNA in favor of emoji.

The author of that document is clearly not a teenage girl (humanity’s R&D system for language evolution) if they think these are all communicating the same thing and “indistinguishable” from each other:

I’m also not convinced that input is an issue. The only place that I cannot trivially input Emoji these days is my browser’s address bar on my desktop, which is likely just because IDNA makes it unnecessary. If it was necessary for domain names, I see no reason why browser address bars wouldn’t start supporting it.

They also claim, “different systems represent the same emoji with different code points”, which is like saying, “there are fonts out there that render an r with an s glyph, so we shouldn’t support r or s”. Really we should be saying, “don’t use bad fonts”.

I’m mildly convinced that zero-width joiners are not ideal, but I’m open to being convinced otherwise. I don’t know enough about ZWJs to really comment meaningfully on them.
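
For anyone else unsure what a ZWJ does in practice, here is a small standard-library sketch. It shows that a single visible emoji can be several code points glued together by U+200D ZERO WIDTH JOINER, and that removing the joiner leaves two separate emoji. The example sequence and the helper name `code_points` are chosen purely for illustration.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+{:04X}".format(ord(ch)) for ch in text]

# WOMAN + ZWJ + PERSONAL COMPUTER, usually rendered as one "woman technologist" glyph.
technologist = "\U0001F469" + ZWJ + "\U0001F4BB"

print(code_points(technologist))                    # three code points
print(unicodedata.category(ZWJ))                    # Cf (format character)
print(code_points(technologist.replace(ZWJ, "")))   # two separate emoji
```

So any rule that strips or forbids format characters changes how such sequences look and compare, which is worth keeping in mind when deciding how names containing ZWJs should be handled.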