Confusables aren't addressed by my normalization proposal. I encountered too many edge cases while trying to develop a complete confusable-free solution.
My suggestion is that we have hard errors (normalization) for names that use illegal constructions (disallowed characters, illegal emoji, invisible characters, etc.) and soft errors (validation) for names that are unsafe or confusable.
This allows the normalization spec to be standardized while "unsafe" names still work. We can then expand the universe of safe names until nearly all reasonable names are covered. We can start with alphanumeric ASCII plus colored emoji, which I claim are safe. If we follow the distribution of registered names, it should be easy to hit 99%+ coverage.
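To make the hard/soft split concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the names `normalize`, `validate`, and `NormalizationError`, and the tiny character sets, are hypothetical stand-ins, and `str.lower()` is only a placeholder for real normalization rules.

```python
import unicodedata

class NormalizationError(ValueError):
    """Hard error: the name uses an illegal construction."""

# Hypothetical, deliberately tiny sets; a real spec would enumerate far more.
INVISIBLE = {"\u200b", "\u200c", "\u2060", "\ufeff"}  # ZWSP, ZWNJ, WJ, BOM
SAFE_EMOJI = {"\U0001F600", "\U0001F680"}  # two colored emoji as stand-ins
SAFE = set("abcdefghijklmnopqrstuvwxyz0123456789") | SAFE_EMOJI

def normalize(label: str) -> str:
    """Reject illegal constructions outright, otherwise return a canonical form."""
    for ch in label:
        if ch in INVISIBLE or unicodedata.category(ch) == "Cc":
            raise NormalizationError(f"illegal character U+{ord(ch):04X}")
    return label.lower()  # placeholder for the real normalization step

def validate(label: str) -> list[str]:
    """Soft check: list warnings for characters outside the current safe set."""
    return [
        f"U+{ord(ch):04X} is outside the safe set"
        for ch in label
        if ch not in SAFE
    ]
```

A name with a Cyrillic "а" (U+0430) would pass `normalize` but come back from `validate` with one warning, so it still resolves while a client can flag it. Growing `SAFE` over time is how coverage would expand without touching `normalize`.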
There is a large set of single characters (~2K) that consists of default text-presentation emoji (ā¶, e.g. those that appear uncolored) and non-emoji pictographs (āļø). There's probably a subset of these that is safe to use in any name, like colored emoji. However, some of these aren't unique and require a decision: ❤ [2764] vs ♥ [2665]. On Mac, it appears that some of these already buck the Unicode convention and appear colored (e.g. ā), whereas ↖ [2196] vs ↖️ [2196 FE0F] does not. Determining which of these are safe covers another 1% of names.
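For anyone who wants to inspect these cases, a small Python sketch follows. The helper names `codepoints` and `differs_only_by_vs16` are hypothetical; the code only prints the bracketed hex notation used above and checks whether two strings differ solely by FE0F (VARIATION SELECTOR-16). It does not decide which form should be considered safe.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation

def codepoints(s: str) -> str:
    """Format a string as space-separated hex code points, e.g. '2196 FE0F'."""
    return " ".join(f"{ord(ch):04X}" for ch in s)

def differs_only_by_vs16(a: str, b: str) -> bool:
    """True when a and b are distinct strings that match once FE0F is removed."""
    return a != b and a.replace(VS16, "") == b.replace(VS16, "")

arrow_text = "\u2196"         # [2196]
arrow_emoji = "\u2196\ufe0f"  # [2196 FE0F]
heart_a = "\u2764"            # [2764]
heart_b = "\u2665"            # [2665]

print(codepoints(arrow_emoji))
print(differs_only_by_vs16(arrow_text, arrow_emoji))
print(differs_only_by_vs16(heart_a, heart_b))
```

The arrow pair is the same base character with and without the selector, while the two hearts are unrelated code points that merely look alike, so they are different kinds of decision.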