Zero-width characters pose a security risk and existential threat to ENS

I have a test harness, but all of the code still needs to be ported to JavaScript. Here is a quick summary of the results as of the latest version (1.0.7). This will eventually be included in one of the automated reports in my repo.

  • 6235 examples in IdnaTestV2.txt
  • 453 valid match
  • 6235 error match (needs to be checked)
  • 9 output differences [1]
  • 8 error for adraffy, valid for idna [2]
  • 684 valid for adraffy, error for idna [3]
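
For reference, a minimal sketch of how a tally like the one above could be computed. The result shape (`{ ok, output }`) and the names `classify` and `tally` are assumptions made for illustration; they are not taken from the actual harness.

```javascript
// Hypothetical result shape: { ok: true, output: [code points] } or { ok: false }.
// `expected` stands for the IdnaTestV2.txt row, `actual` for the library under test.
function classify(expected, actual) {
  if (expected.ok && actual.ok) {
    const same =
      expected.output.length === actual.output.length &&
      expected.output.every((cp, i) => cp === actual.output[i]);
    return same ? "valid match" : "output difference";
  }
  if (!expected.ok && !actual.ok) return "error match";
  return actual.ok
    ? "valid for adraffy, error for idna"
    : "error for adraffy, valid for idna";
}

// Count how many (expected, actual) pairs land in each bucket.
function tally(pairs) {
  const counts = {};
  for (const [expected, actual] of pairs) {
    const bucket = classify(expected, actual);
    counts[bucket] = (counts[bucket] || 0) + 1;
  }
  return counts;
}
```

Every test row lands in exactly one bucket, so the bucket counts should add up to the number of examples.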

[1]. 5 of these are due to an off-by-one bug in my ContextJ code at the start of a string. That reduces the list to 4, all of which involve FFFD. Here is an example; I’m not sure it makes sense:

  • Input: [120795, 65294, 63992]
  • adraffy: [51, 46, 31520]
  • idna: [51, 46, 65533]

63992 (F9F8) is mapped to 31520 (7B20). I’m not sure why idna treats it as disallowed and replaces it with FFFD. I’ll look at this tomorrow.
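
A quick way to inspect an example like this is to print the code points in hex and run them through NFKC. This is only a sketch: NFKC is a stand-in here (an assumption), since the real IDNA mapping uses its own table on top of normalization, and the helper names `toHex` and `nfkcCodePoints` are made up for illustration.

```javascript
// Format code points as uppercase hex, padded to at least 4 digits.
const toHex = (cps) =>
  cps.map((cp) => cp.toString(16).toUpperCase().padStart(4, "0"));

// Apply NFKC and return the resulting code points.
// Assumption: NFKC alone approximates the mapping step for this input only.
function nfkcCodePoints(cps) {
  const s = String.fromCodePoint(...cps).normalize("NFKC");
  return Array.from(s, (ch) => ch.codePointAt(0));
}

const input = [120795, 65294, 63992];
console.log(toHex(input));          // [ '1D7DB', 'FF0E', 'F9F8' ]
console.log(nfkcCodePoints(input)); // [ 51, 46, 31520 ]
```

For this input, plain NFKC reproduces the adraffy output, so the FFFD in the idna output presumably comes from a later validity step rather than from the mapping itself.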

[2]. These are all due to adraffy disallowing FFFD. I think this essentially means the results are equivalent.

[3]. Half of these are differences in ContextJ handling. In adraffy, I strip ZWs that are out of context rather than raising an error, since they’re nearly impossible for the end user to edit. The major remaining issue in this group is the BIDI rules, which I haven’t implemented yet.
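
To make the strip-versus-error difference concrete, here is a simplified sketch. It is not the actual adraffy code: the virama set is a tiny hand-picked subset, the joining-type rule for ZWNJ is left out entirely, and the names `inContext` and `handleJoiners` are hypothetical.

```javascript
const ZWNJ = 0x200c;
const ZWJ = 0x200d;

// Assumption: a tiny illustrative subset of viramas (ccc = 9), not the full set.
const VIRAMAS = new Set([0x094d, 0x09cd]); // Devanagari, Bengali

// Simplified ContextJ check: a joiner is in context only when the preceding
// code point is a virama. The joining-type rule for ZWNJ is deliberately omitted.
function inContext(cps, i) {
  return i > 0 && VIRAMAS.has(cps[i - 1]); // at i === 0 there is no predecessor
}

// mode "strip" drops out-of-context joiners; mode "error" throws on the first one.
function handleJoiners(cps, mode = "strip") {
  const out = [];
  for (let i = 0; i < cps.length; i++) {
    const cp = cps[i];
    const isJoiner = cp === ZWNJ || cp === ZWJ;
    if (!isJoiner || inContext(cps, i)) {
      out.push(cp);
    } else if (mode === "error") {
      throw new Error(`joiner out of context at index ${i}`);
    }
    // mode === "strip": the joiner is dropped silently
  }
  return out;
}
```

With `"strip"`, `handleJoiners([0x200d, 0x61])` returns `[0x61]`, while `"error"` throws on the same input. A joiner directly after a virama, as in `[0x915, 0x94d, 0x200d, 0x937]`, is kept in both modes.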