I’ve released a version of ens-normalize.js that correctly handles all emoji (and ZWJ) but this is only the first step towards the correct solution. One look at the confusables should make this obvious.
I’ve also explored some input UX ideas, created a low-level API, ens_tokenize, and built a basic HTML formatting library, lib-parts.js, which makes “exploding” a name (w/r/t normalization) relatively simple. The Resolver Demo and Emoji Report are good examples.
I’ve made a few attempts at individually addressing homograph attacks but found the problem intractable for one person. The best solution at the moment appears to be script-based restrictions. The only reason I haven’t released a version using this technique is that it doesn’t solve the problem for the primary case: the Common/Greek/Latin/Cyrillic scripts are just too similar. It doesn’t matter if you make Common unmixable with Cyrillic, as you can trivially spoof between those scripts without ever mixing them. Prior art from DNS isn’t very helpful because it’s either too restrictive or simply converts all exotic names to punycode.
Currently, I am attempting to break these scripts into more manageable chunks, both to represent the data visually (so it’s easier to see what’s going on) and in the hope that I can define restrictive recipes using these new pseudo-scripts that still match most registered names but avoid having to solve the confusable issue across all scripts.
I am 100% open to ideas and suggestions.
If possible, keep the discussion in the original thread, ENS Name Normalization, so it’s easier to track.