On-chain ENS Domain Normalization

Here are some thoughts:

Assume the maximum label length fits in 16 bits (or whatever bound is chosen). The low-level normalize should be string → (string norm, string normNoEmoji, uint256[] labelData), with one labelData entry per label, where:

  • normNoEmoji is the same string but with each emoji sequence zeroed
    e.g. :poop: (4 bytes) → [0,0,0,0]
  • each labelData entry is:
    • 16 bits → label length
    • 240 bits → bitset of active non-emoji codepoints, each shifted right by 14
      (2^21 / 2^14 = 128 buckets, which fits in 240 bits; a 13-bit shift would need 256)
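
To make the layout concrete, here is a rough off-chain model in Python (not contract code). The post does not say which end of the word holds the length, so putting the length in the low 16 bits is my assumption, and zero_emoji, label_bitset, pack and unpack are hypothetical names. The emoji list is passed in by the caller; real emoji detection is out of scope here.

```python
SHIFT = 14          # from the post: bucket = codepoint >> 14
LEN_BITS = 16       # from the post: label length fits in 16 bits
BITSET_BITS = 240   # from the post: rest of the uint256


def zero_emoji(norm: str, emoji_seqs: list[str]) -> bytes:
    """UTF-8 bytes of norm with every listed emoji sequence overwritten by 0 bytes."""
    out = norm.encode("utf-8")
    # Longest first, so a long sequence is zeroed before any shorter one inside it.
    for raw in sorted((s.encode("utf-8") for s in emoji_seqs), key=len, reverse=True):
        out = out.replace(raw, bytes(len(raw)))
    return out


def label_bitset(no_emoji_label: bytes) -> int:
    """Set bit (cp >> 14) for every non-emoji codepoint; 0 bytes are skipped."""
    bits = 0
    for ch in no_emoji_label.decode("utf-8"):
        if ch != "\0":
            bits |= 1 << (ord(ch) >> SHIFT)
    return bits


def pack(length: int, bitset: int) -> int:
    # ASSUMPTION: length in the low 16 bits, bitset in the upper 240 bits.
    assert 0 <= length < (1 << LEN_BITS)
    assert 0 <= bitset < (1 << BITSET_BITS)
    return (bitset << LEN_BITS) | length


def unpack(word: int) -> tuple[int, int]:
    """Inverse of pack: returns (length, bitset)."""
    return word & ((1 << LEN_BITS) - 1), word >> LEN_BITS


label = zero_emoji("a\U0001F4A9b", ["\U0001F4A9"])   # "a", pile of poo, "b"
word = pack(len(label), label_bitset(label))
```

Here label is 6 bytes with the 4 emoji bytes zeroed, and its bitset is 0x1 because "a" and "b" both fall in bucket 0.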

From that, it’s easy to compute the namehash, extract any label, compute any label hash, or quickly determine which validators could apply. E.g. the basic validator (DNS + Emoji) only runs if the bitset is 0x1, i.e. every non-emoji codepoint is below 0x4000. normNoEmoji avoids processing emoji again during validation.
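
As a sketch of why label extraction is cheap: the lengths alone give every label's byte range, since labels are separated by a single "." byte. This is Python as an off-chain model; label_spans and extract_label are hypothetical names, and I am assuming the length sits in the low 16 bits of each entry and that labelData is in left-to-right order.

```python
LEN_MASK = (1 << 16) - 1   # ASSUMPTION: length is stored in the low 16 bits


def label_spans(label_data: list[int]) -> list[tuple[int, int]]:
    """Byte range (start, end) of each label, in left-to-right order."""
    spans = []
    start = 0
    for word in label_data:
        length = word & LEN_MASK
        spans.append((start, start + length))
        start += length + 1   # skip the "." separator
    return spans


def extract_label(norm: bytes, label_data: list[int], index: int) -> bytes:
    """Slice out one label without scanning the name for dots."""
    start, end = label_spans(label_data)[index]
    return norm[start:end]


def name_bitset(label_data: list[int]) -> int:
    """Union of all per-label bitsets, e.g. to pre-filter full-name checks."""
    bits = 0
    for word in label_data:
        bits |= word >> 16
    return bits


norm = b"sub.example.eth"
# Every label here only uses bucket 0, so each bitset is 0x1.
label_data = [(1 << 16) | 3, (1 << 16) | 7, (1 << 16) | 3]
```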

The primary normalize function should be string → (string, hash), as you describe.


I think all the charset stuff should live in the validator; whether that’s the same contract or a different contract, I’m not sure. Most validator checks are per-label, so given (label, bitset), you can efficiently check if it’s valid. IIRC, only the bidi check requires a full-name check.

The low-level validation function would be (string, bitset) → bool like:

function validate(string memory label, uint256 bitset) public view returns (bool) {
     // any 0 byte in label is a previously processed emoji
     if (bitset == 1) { // only codepoints [0, 0x4000)
         // check if basic
         // check if latin non-confusable
         // check if greek w/o wholescript confusable
         // etc...
         // return true as soon as one check passes
     }
     return false; // no validator accepted the label
}

Or more cleanly, the validation contract could hold arbitrary functions keyed by bitset + nonce, so you could register/unregister any number of validation functions and quickly find which ones could apply by intersecting with the label bitset.
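
A minimal sketch of that registry idea in Python, as an off-chain model only. I am reading "intersecting" as: a validator can apply when the label's bitset has no bits outside the validator's bitset. That reading, the ValidatorRegistry class, and the is_basic stand-in check are all my assumptions, not something the design above fixes.

```python
from typing import Callable

Validator = Callable[[bytes], bool]


class ValidatorRegistry:
    """Validators keyed by (bitset, nonce), as suggested in the post."""

    def __init__(self) -> None:
        self._validators: dict[tuple[int, int], Validator] = {}

    def register(self, bitset: int, nonce: int, fn: Validator) -> None:
        self._validators[(bitset, nonce)] = fn

    def unregister(self, bitset: int, nonce: int) -> None:
        self._validators.pop((bitset, nonce), None)

    def applicable(self, label_bitset: int) -> list[Validator]:
        # ASSUMPTION: applicable means the label uses no bucket the validator lacks.
        return [fn for (mask, _), fn in self._validators.items()
                if label_bitset & ~mask == 0]

    def validate(self, label: bytes, label_bitset: int) -> bool:
        # A label is valid if at least one applicable validator accepts it.
        return any(fn(label) for fn in self.applicable(label_bitset))


def is_basic(label: bytes) -> bool:
    """Hypothetical stand-in for the basic check: a-z, 0-9, '-', or zeroed emoji."""
    allowed = b"abcdefghijklmnopqrstuvwxyz0123456789-\x00"
    return all(b in allowed for b in label)


registry = ValidatorRegistry()
registry.register(0x1, 0, is_basic)
```

With this, registry.validate(b"abc", 0x1) runs is_basic, while a label whose bitset is 0x2 skips it entirely because no registered validator covers that bucket.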


The primary normalize + validation function would be string → (string norm, uint256 hash, bool valid) where:

given name
(norm, normNoEmoji, labelData) = normalize(name)
end = norm.length
hash = 0
valid = true
// possibly apply full-name check
for [len, bitset] of labelData.reverse() // namehash folds labels right to left
    start = end - len
    hash = keccak(hash + keccak(norm.slice(start, end)))
    valid &= validate(normNoEmoji.slice(start, end), bitset)
    end = start - 1 // skip the "." separator
return (norm, hash, valid)
  • If this throws, the name is invalid
  • If valid is false, the user should get some kind of warning that the name is potentially unsafe (where unsafe means at least one label satisfied no approved validator).
  • If valid isn’t needed, use the primary normalize function instead.
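
The combined loop can be modelled off-chain roughly as follows, in Python. It walks the labels right to left, as namehash requires, and assumes labelData is stored left to right and already unpacked into (length, bitset) pairs. The hash is passed in as a parameter; stand_in_hash uses SHA3-256 from the standard library purely as a placeholder, which is NOT Keccak-256 and will not produce real ENS namehashes. hash_and_validate is a hypothetical name.

```python
import hashlib
from typing import Callable

Hash = Callable[[bytes], bytes]


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: placeholder only. ENS uses Keccak-256, which pads differently
    # from SHA3-256 and therefore gives different digests.
    return hashlib.sha3_256(data).digest()


def hash_and_validate(
    norm: bytes,
    no_emoji: bytes,
    label_data: list[tuple[int, int]],      # (length, bitset), left to right
    validate: Callable[[bytes, int], bool],
    h: Hash = stand_in_hash,
) -> tuple[bytes, bool]:
    node = bytes(32)          # namehash of the empty name
    valid = True
    end = len(norm)
    for length, bitset in reversed(label_data):
        start = end - length
        node = h(node + h(norm[start:end]))
        # Call validate first so every label is checked, like `valid &= ...`.
        valid = validate(no_emoji[start:end], bitset) and valid
        end = start - 1       # skip the "." separator
    return node, valid


seen: list[bytes] = []


def record_and_accept(label: bytes, bitset: int) -> bool:
    """Toy validator: records the label it was given and accepts bucket 0 only."""
    seen.append(label)
    return bitset == 1


name = b"sub.example.eth"
node, ok = hash_and_validate(name, name, [(3, 1), (7, 1), (3, 1)], record_and_accept)
```

Swapping h for a real Keccak-256 implementation should make node match the usual namehash; the validation flag does not depend on the hash at all.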