jdev - 2020-04-13

  1. lovetox

    so for domainnames normally i use stringprepare

  2. lovetox


  3. lovetox

    but in python there is module available that does the idna2008 standard

  4. jonas’

    I don’t like where this is going

  5. lovetox

    can i just switch to that, or is this going to be problematic?

  6. Zash

    Me neither

  7. jonas’

    no, those are different things

  8. lovetox


  9. jonas’

    you need to do nameprep first, then you can IDNA-encode the name before handing it to the DNS server

  10. jonas’

    IDNA is an encoding (unicode -> DNS-compatible ascii bytes)

  11. jonas’

    like UTF-8 is an encoding

  12. jonas’

    problem is: there are two incompatible versions of idna, and nobody knows which one to use

  13. lovetox

    im talking about validating domainnames

  14. jonas’


  15. lovetox

    in python there is a idna standard module

  16. jonas’

    that’s all validation you need

  17. lovetox

    it offers a method thats called nameprep

  18. jonas’

    there is no idna module in the python standard library

  19. jonas’

    according to https://docs.python.org/3/library/

  20. lovetox

    its a submodule of stringprepare

  21. jonas’

    there is no stringprepare module either

  22. lovetox

    or stringprep dont know

  23. jonas’

    stringprep doesn’t have submodules AFAIK. most certainly not an `idna` submodule

  24. Zash

    It hasn't been long enough since some user put http://example.com:5280/ as their XMPP server name and everything was fine with that.

  25. jonas’

    I’m wondering what you’re talking about. LTIC python only supported IDNA2003

  26. lovetox

    ok https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/encodings/idna.py

  27. lovetox

    its an encoding

  28. jonas’

    as I said, yes

  29. jonas’

    and it’s IDNA2003

  30. jonas’

    not 2008

  31. jonas’

    again, IDNA* are encodings like UTF-8 are encodings. has nothing to do with validation or normalisation like nameprep does
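
To illustrate the point that IDNA is an encoding in the same sense as UTF-8, here is a minimal sketch using Python's built-in `idna` codec (which, per this discussion, implements IDNA2003). The domain name is made up for illustration and does not come from the conversation.

```python
# Hypothetical example domain ("bücher.example"), not taken from the discussion.
name = "b\u00fccher.example"

# UTF-8: Unicode text -> bytes for general transport or storage.
utf8_bytes = name.encode("utf-8")

# IDNA (the built-in codec is IDNA2003): Unicode text -> DNS-compatible ASCII.
idna_bytes = name.encode("idna")

# Both are reversible encodings; neither is a dedicated validation step.
assert utf8_bytes.decode("utf-8") == name
assert idna_bytes.decode("idna") == name

print(utf8_bytes)
print(idna_bytes)
```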

  32. lovetox

    yes and it has a method called nameprep

  33. jonas’


  34. jonas’

    but it’s not public API

  35. jonas’

    seems as if nameprep is a precondition for IDNA2003 and they’re doing that for you

  36. lovetox

    oh ok so thats what confused me

  37. jonas’

    you can do a call to codecs.encode(some_domain, "idna") and if it doesn’t raise UnicodeEncodeError, then you know that it passes nameprep
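
The check suggested here could be wrapped roughly as follows. This is only a sketch: `passes_builtin_idna` is a hypothetical helper name, and it catches `UnicodeError` (the parent class of `UnicodeEncodeError`) so that it does not depend on which of the two a given Python version raises.

```python
import codecs


def passes_builtin_idna(domain: str) -> bool:
    """Return True if the built-in IDNA2003 codec accepts `domain`."""
    try:
        codecs.encode(domain, "idna")
    except UnicodeError:
        # UnicodeEncodeError is a subclass of UnicodeError, so this catches both.
        return False
    return True


print(passes_builtin_idna("example.com"))
print(passes_builtin_idna("a" * 64 + ".example"))  # label longer than 63 octets
```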

  38. lovetox

    so can i switch to nameprep -> IDNA2008

  39. lovetox


  40. jonas’


  41. jonas’

    it’s idna2003

  42. jonas’

    not idna2008

  43. jonas’

    also note that nameprep has been deprecated in XMPP

  44. lovetox

    you just said nameprep has nothing to do with idna, its just a precondition

  45. lovetox

    so what is it now

  46. jonas’

    TIL that nameprep is a precondition to IDNA2003.

  47. jonas’

    it doesn’t matter tho

  48. jonas’

    IDNA2003 does more than just nameprep

  49. jonas’

    if you want nameprep, you should do nameprep and not IDNA2003
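
For reference, doing nameprep directly (rather than relying on the non-public helper in the IDNA codec) might look like the sketch below, built on the public `stringprep` tables. `nameprep_label` is a hypothetical name, the choice of `ValueError` is an assumption, and the sketch leaves out the check for unassigned code points.

```python
import stringprep
import unicodedata

# Stringprep (RFC 3454) is defined against Unicode 3.2, so use that database.
_ucd = unicodedata.ucd_3_2_0

# Tables whose characters Nameprep (RFC 3491) prohibits in the output.
_PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def nameprep_label(label: str) -> str:
    """Hypothetical helper: apply Nameprep to one label, or raise ValueError."""
    # Step 1: map. Drop "map to nothing" characters (B.1), case-fold (B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch) for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 2: normalize with NFKC.
    normalized = _ucd.normalize("NFKC", mapped)
    # Step 3: reject prohibited output characters.
    for ch in normalized:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError(f"prohibited character {ch!r}")
    # Step 4: bidi rules. If any right-to-left character (D.1) is present,
    # no left-to-right character (D.2) may appear, and the label must
    # start and end with a right-to-left character.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left label must start and end with RTL")
    return normalized


print(nameprep_label("EXAMPLE"))
```

The function works on a single label, so a full domain would be split on dots first and each label prepared separately.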

  50. Zash

    You actually want IDNA too, nameprep isn't enough to validate an XMPP hostname.

  51. Zash

    domain. hostpart. thing.

  52. jonas’

    Zash, IDNA doesn’t buy you much though:
    >>> codecs.encode("http://foo:5082", "idna")
    b'http://foo:5082'

  53. Zash

    See previously mentioned incident with someone having a HTTP URL in their config

  54. Zash


  55. jonas’


  56. jonas’

    it’s just nameprep plus some mapping of characters

  57. jonas’

    plus length restrictions

  58. Zash

    > print(util.encodings.idna.to_ascii("http://foo:123/"))
    nil

  59. jonas’

    is that IDNA2003 or IDNA2008?

  60. Zash

    Probably 2008

  61. jonas’

    might be the difference

  62. jonas’

    python3 only has 2003

  63. Zash


  64. jonas’


  65. Zash

    but doesn't python also have ... whatsitcalled, precis?

  66. jonas’

    the issue has been open for seven years now: https://bugs.python.org/issue17305

  67. jonas’

    Zash, not built-in

  68. jonas’

    there’s a third party module

  69. Zash

    better than nothing

  70. jonas’


  71. jonas’

    there also seems to be a third-party IDNA module which does things

  72. jonas’

    and then again: I’m not sure XMPP software should make assumptions about what DNS allows.

  73. lovetox

    : is not allowed in IDNA2008

  74. jonas’

    let DNS deal with the weird things we put in the domainpart.

  75. jonas’

    it’ll tell us to f* off. encoding (too many) assumptions about how DNS works and what it allows seems like it can only lead to a world of pain

  76. jonas’

    normalisation makes sense for comparison and stuff, but beyond that…

  77. Zash

    `idn -a <<< "http://foo.bar:123/"` spits out its input

  78. Zash

    and that would be idna2003

  79. lovetox

    so i still don't see a problem with running a host through idna 2008?

  80. jonas’

    probably ok

  81. jonas’

    not sure what that gains you tho

  82. lovetox

    and if it returns an exception, tell the user its not a valid domain name

  83. jonas’

    I hate that type of stuff

  84. jonas’

    that’s the type of stuff which breaks new things

  85. jonas’

    you’ll notice that it’s not a valid domain name when you ask the DNS about the name

  86. jonas’

    (though, you need to do IDNA2008 or IDNA2003 before you ask the DNS)

  87. jonas’

    (but you can’t know which one is right \o/)

  88. Zash

    The least painful answer for us: IDNA2008 in IDNA2003 compat mode

  89. Zash

    IDNA2003 library support status: deprecated.

  90. lovetox

    so what does the xmpp standard say about domainpart

  91. Zash

    libidn v1 is deprecated, libidn2 doesn't do nameprep & co, so it can't manage xmpp parts

  92. lovetox

    no validation at all?

  93. lovetox

    it does have precis modules for the node and resource parts

  94. Zash

    lovetox: "the", there are like 3 of them

  95. Zash

    3 versions

  96. lovetox

    the latest

  97. Zash

    idna2003, idna2008, precis

  98. lovetox

    so wtf it definitely says a domainpart has to conform to idna2008

  99. lovetox

    so i certainly run it through idna2008 and be finished with it

  100. Zash

    Latest is https://tools.ietf.org/html/rfc7622

  101. lovetox

    yes that's what i am referring to

  102. jonas’

    I love how we still haven’t figured out how to do unicode release interop

  103. Zash

    This be the IDNA2008 + PRECIS thing right?

  104. lovetox


  105. lovetox

    and idna2008 has no nameprep as precondition

  106. jonas’

    I dropped that in #debian-til and someone got nerdsniped by that and dug out that the unicode releases also don’t really have guidelines on compatibility

  107. lovetox

    so actually i can throw out all nameprep stuff

  108. Zash

    jonas’: No way we're doing that while in the middle of a pandemic, can't afford to waste painkillers and anti-fever meds on this horror

  109. jonas’


  110. jonas’

    luckily, the next unicode release will be postponed due to the pandemic, too

  111. Zash

    Praise Glob

  112. flow

    lovetox> so i certainly run it through idna2008 and be finished with it
    then you potentially disallow ipv4/ipv6 addresses in the domainpart, which are allowed

  113. lovetox

    after i check if its an ip :D

  114. flow

    lovetox, bonus points for allowing ipv6 scope IDs

  115. lovetox

    i do

  116. flow

    then here are your bonus points

  117. lovetox

    but only if you include it within []

  118. flow hands lovetox 13.37 bonus points

  119. flow

    lovetox, whatever the IP-literal rule of RFC 6874 allows
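
The approach the discussion ends on (check for an IP literal first, then run everything else through IDNA and report failures to the user) could be sketched like this. `check_domainpart` and its `encode` parameter are hypothetical names. The default encoder is the built-in IDNA2003 codec; an IDNA2008 encoder from a third-party package could be passed instead, assuming it signals failure with `UnicodeError` or a subclass. Zone ID handling is deliberately simplified and does not implement the full RFC 6874 grammar.

```python
import codecs
import ipaddress


def check_domainpart(domainpart, encode=lambda s: codecs.encode(s, "idna")):
    """Hypothetical helper: classify a domainpart or raise ValueError.

    `encode` defaults to the built-in IDNA2003 codec. For IDNA2008 one
    could pass an encoder from a third-party package instead (assumption).
    """
    if not domainpart:
        raise ValueError("empty domainpart")
    # IPv4 literal, for example "192.0.2.1".
    try:
        ipaddress.IPv4Address(domainpart)
        return "ipv4"
    except ValueError:
        pass
    # IPv6 literal, only accepted inside square brackets.
    if domainpart.startswith("[") and domainpart.endswith("]"):
        # Split off an optional zone (scope) ID such as "%eth0". Only the
        # address itself is parsed; the zone is merely required to be non-empty.
        address, sep, zone = domainpart[1:-1].partition("%")
        if sep and not zone:
            raise ValueError("empty IPv6 zone ID")
        ipaddress.IPv6Address(address)  # raises ValueError if malformed
        return "ipv6"
    # Everything else must survive IDNA encoding.
    try:
        encode(domainpart)
    except UnicodeError as exc:
        raise ValueError(f"not a valid domain name: {domainpart!r}") from exc
    return "domain"


for candidate in ("192.0.2.1", "[::1]", "[fe80::1%eth0]", "example.com"):
    print(candidate, check_domainpart(candidate))
```

Taking the encoder as a parameter keeps the IDNA2003 versus IDNA2008 choice out of the helper, which seems fitting given that the discussion does not settle which one is right.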