On-chain ENS Domain Normalization

raffy · July 30, 2022, 1:01am

Here are some thoughts:

Assume the maximum label length fits in 16 bits (or whatever width is chosen). The low-level normalize should be string → (string norm, string normNoEmoji, uint256[] labelData), where:

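normNoEmoji is the same string, but with every byte of each emoji sequence zeroed
e.g. an emoji that takes 4 bytes in UTF-8 → [0,0,0,0]
labelData has one entry per label, each packed as:
- 16 bits → label length
- 240 bits → bitset of the non-emoji codepoints present, bucketed by codepoint >> 14
  (2^21 codepoints / 240 bits ≈ 2^13.1 codepoints per bit, so round up to a 14-bit shift; bit 0 covers [0, 0x4000))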
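
To make the packing concrete, here is a minimal sketch in Python (used only for illustration; an on-chain version would be Solidity). Putting the length in the low 16 bits, measuring it in UTF-8 bytes, and the names `pack_label_data` / `unpack_label_data` are all assumptions on my part; none of them is fixed by the layout above.

```python
LEN_BITS = 16   # assumption: low 16 bits hold the label length in UTF-8 bytes
SHIFT = 14      # each bitset bit covers a block of 2**14 codepoints

def pack_label_data(label_no_emoji: bytes) -> int:
    """Pack one label's length and codepoint-block bitset into one 256-bit word.

    `label_no_emoji` is the normalized label with every emoji byte already zeroed.
    """
    length = len(label_no_emoji)
    if length >= 1 << LEN_BITS:
        raise ValueError("label too long for a 16-bit length")
    bitset = 0
    for ch in label_no_emoji.decode("utf-8"):
        cp = ord(ch)
        if cp == 0:                 # zeroed emoji byte: contributes nothing
            continue
        bitset |= 1 << (cp >> SHIFT)
    return (bitset << LEN_BITS) | length

def unpack_label_data(word: int) -> tuple:
    """Return (length, bitset) from a packed word."""
    return word & ((1 << LEN_BITS) - 1), word >> LEN_BITS
```

With this layout an all-ASCII label such as `abc` unpacks to length 3 and bitset 0x1, and the highest codepoint (0x10FFFF) sets bit 67, so the bitset fits comfortably in 240 bits.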

From that, it’s easy to compute the namehash, extract any label, compute any label hash, or quickly determine which validators could apply. E.g. the basic validator (DNS + Emoji) only runs if the bitset is 0x1, meaning every non-emoji codepoint is below 0x4000. normNoEmoji avoids processing emoji again during validation.
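
As a rough illustration of the "extract any label" part, here is a Python sketch that turns the per-label lengths into byte offsets. The helper names are hypothetical, and it assumes labels are joined by a single `.` byte and that lengths are byte lengths.

```python
def label_slices(label_lengths):
    """Yield (start, end) byte offsets for each label, left to right."""
    start = 0
    for n in label_lengths:
        yield start, start + n
        start += n + 1          # skip the '.' separator

def extract_label(norm: bytes, label_lengths, index: int) -> bytes:
    """Return the label at position `index` without splitting the string."""
    start, end = list(label_slices(label_lengths))[index]
    return norm[start:end]
```

Because `norm` and `normNoEmoji` have the same byte length, the same offsets work for both strings.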

The primary normalize function should be string → (string, hash) like you describe.

I think all the charset stuff should exist in the validator; whether that’s the same contract or a different contract, I’m not sure. Most validator checks are per-label, so given (label, bitset), you can efficiently check if it’s valid. IIRC, only the bidi check requires a full-name check.

The low level validation function would be (string, bitset) → bool like:

function validate(string memory label, uint256 bitset) public pure returns (bool) {
    // any 0 byte in label is a previously processed emoji
    if (bitset == 1) { // only codepoints in [0, 0x4000)
        // check if basic
        // check if latin non-confusable
        // check if greek w/o whole-script confusable
        // etc... (return true as soon as one check passes)
    }
    return false;
}
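
For a feel of what the "check if basic" branch might look like, here is a small Python sketch. The exact character set (a-z, 0-9, hyphen) is my assumption of what "DNS + Emoji" means, and `validate_basic` is a made-up name.

```python
# assumption: "basic" means lowercase DNS characters plus zeroed emoji bytes
BASIC_CHARS = frozenset(b"abcdefghijklmnopqrstuvwxyz0123456789-")

def validate_basic(label_no_emoji: bytes, bitset: int) -> bool:
    """Accept labels made only of a-z, 0-9, '-' and zeroed emoji bytes."""
    if bitset != 1:             # some codepoint lies outside [0, 0x4000)
        return False
    # a 0 byte is an emoji that normalize already processed, so skip it
    return all(b == 0 or b in BASIC_CHARS for b in label_no_emoji)
```

The bitset test is a cheap early exit: the byte scan only happens for labels that could possibly pass.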

Or, more cleanly, the validation contract could hold arbitrary validation functions keyed by bitset + nonce, so you could register/unregister any number of them and quickly find which ones could apply by intersecting with the label bitset, etc.
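
One way the registry idea could look, sketched in Python. Everything here is hypothetical: the class and method names, and in particular my reading of "intersecting", where a validator is a candidate only if its mask covers every block set in the label bitset.

```python
class ValidatorRegistry:
    """Validators keyed by (mask, nonce); mask is a bitset of codepoint blocks."""

    def __init__(self):
        self._validators = {}   # (mask, nonce) -> function(label) -> bool
        self._nonce = 0

    def register(self, mask, fn):
        key = (mask, self._nonce)   # the nonce lets several validators share a mask
        self._nonce += 1
        self._validators[key] = fn
        return key

    def unregister(self, key):
        del self._validators[key]

    def candidates(self, bitset):
        """Validators whose mask covers every block used by the label."""
        return [fn for (mask, _), fn in self._validators.items()
                if (bitset & ~mask) == 0]

    def validate(self, label, bitset):
        """True if at least one candidate validator accepts the label."""
        return any(fn(label) for fn in self.candidates(bitset))
```

Filtering by mask first means most validators never have to look at the label bytes at all.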

The primary normalize + validation function would be string → (string norm, uint256 hash, bool valid) where:

given name
(norm, normNoEmoji, labelData) = normalize(name)
end = norm.length
hash = 0 // 32 zero bytes
valid = true
// possibly apply full-name check
for [len, bitset] of labelData.reverse() // namehash consumes labels right to left
    start = end - len
    hash = keccak(concat(hash, keccak(norm.slice(start, end))))
    valid &= validate(normNoEmoji.slice(start, end), bitset)
    end = start - 1 // skip the "." separator
return (norm, hash, valid)

If this throws, the name is invalid.
If valid is false, the user should get some kind of warning that the name is potentially unsafe (where unsafe means one or more labels satisfied 0 approved validators).
If valid isn’t needed, use the primary normalize function instead.
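
Here is a runnable Python sketch of the combined loop, walking the labels right to left. It assumes normalize has already run and produced `(length, bitset)` pairs with byte lengths. The function names are hypothetical, and the hash is passed in as a parameter because Python's standard library has no Keccak-256: the `stand_in_hash` below uses SHA3-256, which is NOT the same function and will not produce real ENS namehashes.

```python
import hashlib

def hash_and_validate(norm, norm_no_emoji, label_data, validate, keccak):
    """label_data: list of (length, bitset) per label, left to right."""
    end = len(norm)
    node = bytes(32)            # namehash of the empty name
    valid = True
    # possibly apply a full-name check (e.g. bidi) here
    for length, bitset in reversed(label_data):
        start = end - length
        node = keccak(node + keccak(norm[start:end]))
        valid &= bool(validate(norm_no_emoji[start:end], bitset))
        end = start - 1         # skip the '.' separator
    return norm, node, valid

def stand_in_hash(data: bytes) -> bytes:
    # placeholder only: real namehash needs Keccak-256, not SHA3-256
    return hashlib.sha3_256(data).digest()
```

For example, `hash_and_validate(b"raffy.eth", b"raffy.eth", [(5, 1), (3, 1)], lambda label, bitset: bitset == 1, stand_in_hash)` hashes `eth` first and then `raffy`, and reports valid because both labels have bitset 0x1.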