I haven’t been able to establish an algorithm worth standardizing that incorporates all of the validation checks, prevents all known spoofs, and has a reasonable on-chain implementation.
My plan is to revert to my January release, which fixes the emoji and ZWJ issues and establishes the mechanical process of preparing a name for hashing. I will update my ENSIP to match. This should be done soon.
I also think it’s wise to use IDNA 2003 plus the extra characters we discussed (underscore, the missing keycaps, the missing emoji, currency symbols) and to drop ContextO, CheckBidi, CheckHyphens, etc. If there are any other characters worth enabling, please let me know.
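For concreteness, a minimal sketch of what the extra allowlist might look like on top of IDNA 2003. The codepoints shown (underscore U+005F, the combining enclosing keycap U+20E3, the cent sign U+00A2 as one currency-symbol example) are illustrative only, not the final list:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library ExtraAllowed {
    // Illustrative additions on top of the IDNA 2003 output set;
    // the real list would come from the discussion above.
    function isExtra(uint24 cp) internal pure returns (bool) {
        return cp == 0x5F      // _ underscore
            || cp == 0x20E3    // combining enclosing keycap
            || cp == 0xA2;     // cent sign (currency-symbol example)
    }
}
```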
A separate library (that won’t be part of the ENSIP) will be provided to apply the validation checks. However, I think on-chain solutions might be better.
For an on-chain implementation, I think the first step is writing a function that checks whether a name was normalized correctly: string → bool
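As a shape for that first step, a minimal interface sketch; the name isNormalized and its signature are my assumptions, not a published API:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Checks a name rather than normalizing it: returns true iff `name`
// is already in normalized form. Hypothetical interface.
interface INormalizationChecker {
    function isNormalized(string calldata name) external view returns (bool);
}
```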
I would consider names that match /^[a-z0-9_.-]+$/ with emoji as the “safest” set of ENS names (ignoring skin-tone emoji differences). 94% of registered names fit this criterion.
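That check is also cheap to express on-chain, since the “safest” set is pure ASCII once emoji are set aside. A sketch (the library and function names are mine):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library BasicCharset {
    // True iff every byte is in [a-z0-9_.-], i.e. the ASCII-only
    // "safest" set; emoji are assumed handled in a separate pass.
    function isBasic(bytes memory s) internal pure returns (bool) {
        for (uint256 i = 0; i < s.length; i++) {
            bytes1 c = s[i];
            bool ok = (c >= 0x61 && c <= 0x7A)  // a-z
                || (c >= 0x30 && c <= 0x39)     // 0-9
                || c == 0x5F                    // _
                || c == 0x2D                    // -
                || c == 0x2E;                   // .
            if (!ok) return false;
        }
        return s.length > 0; // empty names excluded
    }
}
```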
My thinking is that emoji parsing is independent of text parsing. We just need a function that takes a string abcXzYdef (where “abc” and “def” are text and XzY is an emoji ZWJ sequence) and produces a new string abcEEEdef, where E acts as a generic emoji placeholder. Then a second pass is made using a text filter.
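A toy version of that first pass, to make the idea concrete. A real filter would walk the actual emoji ZWJ-sequence grammar; here anything emoji-ish (supplementary-plane pictographs, ZWJ U+200D, VS-16 U+FE0F) is simply mapped to a sentinel placeholder, and all names and constants are my own:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

library EmojiMask {
    // Sentinel above the Unicode range (max codepoint is 0x10FFFF),
    // standing in for the generic placeholder "E".
    uint24 constant PLACEHOLDER = 0xFFFFFF;

    // Toy heuristic: replace each emoji-ish codepoint with the
    // placeholder so the text filter never sees emoji internals.
    function mask(uint24[] memory cps) internal pure returns (uint24[] memory out) {
        out = new uint24[](cps.length);
        for (uint256 i = 0; i < cps.length; i++) {
            uint24 cp = cps[i];
            bool emojiish = cp >= 0x1F000 || cp == 0x200D || cp == 0xFE0F;
            out[i] = emojiish ? PLACEHOLDER : cp;
        }
    }
}
```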
```
UTF8.decode(string) -> uint24[]       // revert if invalid
Emoji.filter(uint24[]) -> uint24[]    // revert if middle of sequence
BasicValidator.validate(uint24[])     // revert if invalid codepoint
```
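Chained together, the pipeline might look like the sketch below; the module names and signatures follow the pseudocode above, and the wiring is assumed:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Signatures taken from the pseudocode above; implementations live
// elsewhere and are assumed to revert as described.
interface IUTF8 { function decode(string calldata s) external pure returns (uint24[] memory); }
interface IEmoji { function filter(uint24[] calldata cps) external pure returns (uint24[] memory); }
interface IBasicValidator { function validate(uint24[] calldata cps) external pure; }

contract NameChecker {
    IUTF8 immutable utf8;
    IEmoji immutable emoji;
    IBasicValidator immutable validator;

    constructor(IUTF8 u, IEmoji e, IBasicValidator v) {
        utf8 = u;
        emoji = e;
        validator = v;
    }

    // Reverts somewhere along the pipeline if `name` is not valid.
    function check(string calldata name) external view returns (bool) {
        validator.validate(emoji.filter(utf8.decode(name)));
        return true;
    }
}
```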
I’ve written two versions of the emoji filter contract. One works as a library and uses a state machine; it costs about 160K gas to validate 💩raffy🏳🌈.eth. The other is a contract that uses storage and costs about 60K gas.
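For flavor, a toy sketch of the storage-based variant: pre-load hashes of the whitelisted emoji sequences once, then validating a whole sequence is a single SLOAD. The real contract’s layout isn’t shown here; the names and the missing access control are deliberate simplifications:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract EmojiRegistry {
    mapping(bytes32 => bool) public valid;

    // Whitelist one emoji sequence (no access control; sketch only).
    function add(uint24[] calldata seq) external {
        valid[keccak256(abi.encodePacked(seq))] = true;
    }

    // One storage read to check a whole sequence.
    function isEmoji(uint24[] calldata seq) external view returns (bool) {
        return valid[keccak256(abi.encodePacked(seq))];
    }
}
```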