ENS Name Normalization

Thanks @Theth.eth. Agreed, a bunch of that punctuation block should be disallowed.


General Punctuation

Disallowed

Summary
  1. 2000 ( ) EN QUAD
  2. 2001 ( ) EM QUAD
  3. 2002 ( ) EN SPACE
  4. 2003 ( ) EM SPACE
  5. 2004 ( ) THREE-PER-EM SPACE
  6. 2005 ( ) FOUR-PER-EM SPACE
  7. 2006 ( ) SIX-PER-EM SPACE
  8. 2007 ( ) FIGURE SPACE
  9. 2008 ( ) PUNCTUATION SPACE
  10. 2009 ( ) THIN SPACE
  11. 200A ( ) HAIR SPACE
  12. 200C (‌) ZERO WIDTH NON-JOINER
  13. 200D (‍) ZERO WIDTH JOINER
  14. 200E (‎) LEFT-TO-RIGHT MARK
  15. 200F (‏) RIGHT-TO-LEFT MARK
  16. 2017 (‗) DOUBLE LOW LINE
  17. 2024 (․) ONE DOT LEADER
  18. 2025 (‥) TWO DOT LEADER
  19. 2026 (…) HORIZONTAL ELLIPSIS
  20. 2028 ( ) LINE SEPARATOR
  21. 2029 ( ) PARAGRAPH SEPARATOR
  22. 202A (<U+202A>) LEFT-TO-RIGHT EMBEDDING
  23. 202B (<U+202B>) RIGHT-TO-LEFT EMBEDDING
  24. 202C (<U+202C>) POP DIRECTIONAL FORMATTING
  25. 202D (<U+202D>) LEFT-TO-RIGHT OVERRIDE
  26. 202E (?) RIGHT-TO-LEFT OVERRIDE
  27. 202F ( ) NARROW NO-BREAK SPACE
  28. 203C (‼) DOUBLE EXCLAMATION MARK
  29. 203E (‾) OVERLINE
  30. 2047 (⁇) DOUBLE QUESTION MARK
  31. 2048 (⁈) QUESTION EXCLAMATION MARK
  32. 2049 (⁉) EXCLAMATION QUESTION MARK
  33. 205F ( ) MEDIUM MATHEMATICAL SPACE
  34. 2061 (⁡) FUNCTION APPLICATION
  35. 2062 (⁢) INVISIBLE TIMES
  36. 2063 (⁣) INVISIBLE SEPARATOR
  37. 2065 (⁥) undefined
  38. 2066 (<U+2066>) LEFT-TO-RIGHT ISOLATE
  39. 2067 (<U+2067>) RIGHT-TO-LEFT ISOLATE
  40. 2068 (<U+2068>) FIRST STRONG ISOLATE
  41. 2069 (<U+2069>) POP DIRECTIONAL ISOLATE
  42. 206A () INHIBIT SYMMETRIC SWAPPING
  43. 206B () ACTIVATE SYMMETRIC SWAPPING
  44. 206C () INHIBIT ARABIC FORM SHAPING
  45. 206D () ACTIVATE ARABIC FORM SHAPING
  46. 206E () NATIONAL DIGIT SHAPES
  47. 206F () NOMINAL DIGIT SHAPES

Ignored

  1. 200B (​) ZERO WIDTH SPACE
  2. 2060 (⁠) WORD JOINER
  3. 2064 (⁤) INVISIBLE PLUS

Mapped

  1. 2010 (‐) HYPHEN2D (-) HYPHEN-MINUS
  2. 2011 (‑) NON-BREAKING HYPHEN2D (-) HYPHEN-MINUS
  3. 2012 (‒) FIGURE DASH2D (-) HYPHEN-MINUS
  4. 2013 (–) EN DASH2D (-) HYPHEN-MINUS
  5. 2014 (—) EM DASH2D (-) HYPHEN-MINUS
  6. 2015 (―) HORIZONTAL BAR2D (-) HYPHEN-MINUS
  7. 2033 (″) DOUBLE PRIME[2032 2032]
  8. 2034 (‴) TRIPLE PRIME[2032 2032 2032]
  9. 2036 (‶) REVERSED DOUBLE PRIME[2035 2035]
  10. 2037 (‷) REVERSED TRIPLE PRIME[2035 2035 2035]
  11. 2057 (⁗) QUADRUPLE PRIME[2032 2032 2032 2032]

Valid

  1. 2016 (‖) DOUBLE VERTICAL LINE
  2. 2018 (‘) LEFT SINGLE QUOTATION MARK
  3. 2019 (’) RIGHT SINGLE QUOTATION MARK
  4. 201A (‚) SINGLE LOW-9 QUOTATION MARK
  5. 201B (‛) SINGLE HIGH-REVERSED-9 QUOTATION MARK
  6. 201C (“) LEFT DOUBLE QUOTATION MARK
  7. 201D (”) RIGHT DOUBLE QUOTATION MARK
  8. 201E („) DOUBLE LOW-9 QUOTATION MARK
  9. 201F (‟) DOUBLE HIGH-REVERSED-9 QUOTATION MARK
  10. 2020 (†) DAGGER
  11. 2021 (‡) DOUBLE DAGGER
  12. 2022 (•) BULLET
  13. 2023 (‣) TRIANGULAR BULLET
  14. 2027 (‧) HYPHENATION POINT
  15. 2030 (‰) PER MILLE SIGN
  16. 2031 (‱) PER TEN THOUSAND SIGN
  17. 2032 (′) PRIME
  18. 2035 (‵) REVERSED PRIME
  19. 2038 (‸) CARET
  20. 2039 (‹) SINGLE LEFT-POINTING ANGLE QUOTATION MARK
  21. 203A (›) SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
  22. 203B (※) REFERENCE MARK
  23. 203D (‽) INTERROBANG
  24. 203F (‿) UNDERTIE
  25. 2040 (⁀) CHARACTER TIE
  26. 2041 (⁁) CARET INSERTION POINT
  27. 2042 (⁂) ASTERISM
  28. 2043 (⁃) HYPHEN BULLET2D (-) HYPHEN-MINUS
  29. 2044 (⁄) FRACTION SLASH
  30. 2045 (⁅) LEFT SQUARE BRACKET WITH QUILL
  31. 2046 (⁆) RIGHT SQUARE BRACKET WITH QUILL
  32. 204A (⁊) TIRONIAN SIGN ET
  33. 204B (⁋) REVERSED PILCROW SIGN
  34. 204C (⁌) BLACK LEFTWARDS BULLET
  35. 204D (⁍) BLACK RIGHTWARDS BULLET
  36. 204E (⁎) LOW ASTERISK
  37. 204F (⁏) REVERSED SEMICOLON
  38. 2050 (⁐) CLOSE UP
  39. 2051 (⁑) TWO ASTERISKS ALIGNED VERTICALLY
  40. 2052 (⁒) COMMERCIAL MINUS SIGN
  41. 2053 (⁓) SWUNG DASH
  42. 2054 (⁔) INVERTED UNDERTIE
  43. 2055 (⁕) FLOWER PUNCTUATION MARK
  44. 2056 (⁖) THREE DOT PUNCTUATION
  45. 2058 (⁘) FOUR DOT PUNCTUATION
  46. 2059 (⁙) FIVE DOT PUNCTUATION
  47. 205A (⁚) TWO DOT PUNCTUATION
  48. 205B (⁛) FOUR DOT MARK
  49. 205C (⁜) DOTTED CROSS
  50. 205D (⁝) TRICOLON
  51. 205E (⁞) VERTICAL FOUR DOTS

  • For mapped, I think we should disallow what I bolded.
  • For valid, I’m not sure. ⁂⁜※ seem cool, I’m not sure about , disallow the rest?
  • Edit: for valid, I think we should disallow what I bolded (and map the hyphen-bullet.)

Frequencies: JSON

Edit: If we disallow and we lose a lot of faces like ◕‿◕.

6 Likes