XSF Discussion - 2017-02-20

  1. jonasw

    good morning

  2. Holger

    There seems to be a consensus that predictable resource strings introduce a privacy issue. Do we have any text in any XEP (or on the wiki or the mailing list) explaining the issue?

  3. MattJ


  4. Holger

    MattJ: Thanks. This says you "might be able to determine if the client [...] is online" if you know the resource string. There's no explanation how you would determine that, right?

  5. MattJ

    Not anything explicit that I know of

  6. Holger

    MattJ: I'm aware of a few ways to do that at least with ejabberd users, but each of those is a spec or implementation issue which IMO should be fixed either way (rather than only hidden by making the resource string unpredictable).

  7. Ge0rG

    Holger: +1

  8. MattJ

    But this is implicitly assumed by various XEPs

  9. Ge0rG

    Holger: the only issue I'm aware of is periodic sending of IQs to a full JID, to probe the client availability and network latency

  10. Holger

    MattJ: Unpredictable resource strings are assumed?

  11. Ge0rG

    It's also a really poor idea to use UUIDs for randomness if we have the full power of unicode (or at least base32 / base64)

  12. Holger


  13. MattJ

    I'd wager that yes, numerous XEPs rely on unpredictable resource strings for security/privacy purposes

  14. Ge0rG

    YouTube, the world's most used video platform, manage to identify videos by less than a dozen characters. Why do we need to use 36 characters to identify two clients on an account?

  15. Kev

    If there's a standard for how to do something equivalent to UUID (global uniqueness without fingerprinting) in fewer bytes, using that instead of UUID seems entirely sensible.

  16. Ge0rG

    Are we protecting from enumeration attacks? From birthday attacks? What is the actual amount of randomness required in a resource to prevent those?
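Editorial note: the two attack classes named above can at least be put into numbers. The following is a minimal sketch, not something proposed in the discussion. The function names are hypothetical, and the choice of 10 resources per account is an assumed figure for illustration. The 122-bit row corresponds to the random bits in a version 4 UUID.

```python
import math


def birthday_collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids random IDs collide.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * 2.0**bits))


def expected_guesses(bits: int) -> float:
    """Average number of guesses needed to enumerate one specific random ID."""
    return 2.0 ** (bits - 1)


# Assumed scenario: 10 resources on one account (illustrative figure only).
for bits in (32, 64, 122):
    print(bits, birthday_collision_probability(10, bits), expected_guesses(bits))
```

Which of the two numbers matters depends on the threat model, which is exactly the question left open in the discussion.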

  17. Holger

    Kev: Why do we need a standard for that? I mean there's no interop requirement?

  18. jonasw


  19. jonasw


  20. Kev

    Holger: Well, we'd at least want it to be consistently implemented across clients, else you can fingerprint (which probably isn't the end of the world, but seems like something we'd want to prevent).

  21. Ge0rG

    Kev: sticking to default implementations is what causes user-visible URLs like https://xmpp.yaxim.org:5281/upload/54f59abf-de9b-4fb9-a1e4-f0b5b78a0a9d/a5f532df-0fef-47c4-81a9-2aff690419fc.jpg

  22. intosi

    Holger: libraries have the benefit of people not writing their own random identifier generator.

  23. intosi

    * standard methods present in libraries, I mean.

  24. jonasw

    getentropy(nbits=32) | base64 | strip('=') should be possible with every standard library.

  25. jonasw

    we are requiring PRECIS which has *much less* support than that

  26. intosi

    Which base64?

  27. intosi

    But yeah, another well-defined method could work equally well.

  28. jonasw

    intosi: okay, point taken. getentropy(nbits=32) | base64 | tr('+/', '-_') | strip('=')
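Editorial note: the pipeline sketched in this message maps closely onto URL-safe base64 with the padding stripped. A minimal Python sketch follows, assuming the standard `secrets` and `base64` modules. The function name `random_resource` is hypothetical, and the default of 32 bits is taken from the message, not from any XEP.

```python
import base64
import secrets


def random_resource(nbits: int = 32) -> str:
    """Return an unpredictable ASCII string carrying at least nbits of entropy."""
    nbytes = (nbits + 7) // 8                # round up to whole bytes
    raw = secrets.token_bytes(nbytes)        # bytes from the OS CSPRNG
    encoded = base64.urlsafe_b64encode(raw)  # alphabet A-Z a-z 0-9 - _
    return encoded.rstrip(b"=").decode("ascii")


print(random_resource())    # 6 characters for 32 bits
print(random_resource(96))  # 16 characters for 96 bits
```

The output uses only ASCII letters, digits, `-` and `_`, so it sidesteps the Unicode concerns raised later in the discussion.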

  29. intosi

    When unicode comes into play, I'm less inclined to believe that every client gets it right, though. Unicode is hard.

  30. jonasw

    base64 is luckily only ascii

  31. Ge0rG

    what we are looking for is a number of random bits packed into as few characters as possible for a given element

  32. Kev

    Well, not quite.

  33. jonasw


  34. Zash

    Base 85

  35. Kev

    We're also looking for it to be typable by server admins grepping for logs, etc.

  36. Kev

    And visibly distinguishable.

  37. Kev

    Simply picking random bits forced into UTF-8 clearly wouldn't work, for example.

  38. Ge0rG

    Kev: uuids are not very visibly distinguishable.

  39. Kev

    If you give me two UUIDs, I'm fairly sure I can tell you if they're the same.

  40. jonasw

    Kev: agreed (with random bits in utf-8), not only because of distinguishability, but because it also needs to pass resourceprep and/or precis unmodified. that’s nontrivial.

  41. Ge0rG

    Kev: what if you have a complex scenario with N clients in a MUC, a set of M reflected messages with rewritten UUID ids? For which values of N and M can you keep N+2*M UUIDs in your short-term memory?

  42. Zash

    Disco identity ?

  43. jonasw

    what about dictionary-based strings?

  44. jonasw

    just sample from /usr/share/dict/british-english-insane (it’s a thing!)

  45. Holger

    Zash: There's no disco identity when staring at XEP examples ...

  46. Ge0rG

    base64-strings have better visible distinguishability due to more uppercase/lowercase.

  47. Kev

    Ge0rG: In what scenario are you imagining someone debugging such a thing and using the resources as the mental identifiers?

  48. Kev

    As I said earlier, if we can more tightly pack than UUID, while still maintaining human readability, great.

  49. Kev

    (And the other desirable properties of UUIDs)

  50. intosi

    Basically still gibberish. I'm not convinced it would make a difference, I would suck at remembering many of either UUID, base64, or picks-from-a-wordlist-with-sufficient-entropy.

  51. intosi

    For all of them, I would just use the first four or five chars anyway.

  52. Ge0rG

    Kev: which identifiers would you use if all you have at hand are debug logs?

  53. Holger

    Kev: I'm still quite baffled you'd list 'readability' as one of the properties of UUIDs :-)

  54. Kev

    Nicks in a MUC, in the usual case of them being pretty static.

  55. Holger

    (Then again these XMPP people are used to reading XML ...)

  56. Kev

    Holger: They're not *nice*, but it is possible to straightforwardly read them.

  57. Holger

    Kev: But what makes them any more readable than e.g. Base64?

  58. Ge0rG

    Kev: but I want it *nice*.

  59. Kev

    Compared with random UTF-8, which is impossible to read because you've no idea which of the many matching characters you're looking at.

  60. Kev

    Holger: Nothing.

  61. Holger

    Oh. Well yeah there's always something worse.

  62. jonasw

    Kev: nobody was seriously talking about using random utf-8

  63. jonasw

    (I hope.)

  64. Kev

    jonasw: You might not be taking Ge0rG seriously, but I was still trying to address his point that we should be using UTF.

  65. jonasw

    Kev: I’m pretty sure Ge0rG was only making a point that using only hex is a waste of bytes and we should use base64 or something.

  66. Ge0rG


  67. Kev

    jonasw: Given he explicitly said we should use the power of UTF-8, I don't think so :)

  68. jonasw

    "the power of unicode", right…

  69. Kev

    You're right, he said unicode, not UTF-8.

  70. Ge0rG

    Kev: actually, I should have written "a number of random bits packed into as few bytes as possible" - that should rule out non-ascii

  71. Kev


  72. Kev

    You get denser packing with UTF-8 than with ASCII encoded as UTF-8.

  73. intosi

    Well, that emoji-encoder uses four bytes per byte ;)

  74. jonasw

    Kev: I’m not sure about that. with ascii, you have a constant 7 bits / byte, with UTF-8 you lose bits for codepoints above 127 for signalling
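Editorial note: the density claim can be checked by counting how many code points each UTF-8 sequence length can represent. This is a rough sketch giving an upper bound only: it ignores surrogates and any characters that resourceprep or PRECIS would reject or remap. The names `UTF8_RANGES` and `bits_per_byte` are hypothetical.

```python
import math

# (bytes per character, first code point, last code point) for UTF-8
UTF8_RANGES = [
    (1, 0x00, 0x7F),
    (2, 0x80, 0x7FF),
    (3, 0x800, 0xFFFF),
    (4, 0x10000, 0x10FFFF),
]


def bits_per_byte(nbytes: int, first: int, last: int) -> float:
    """Upper bound on information per byte when drawing from one range."""
    return math.log2(last - first + 1) / nbytes


for nbytes, first, last in UTF8_RANGES:
    # Sanity check: the last code point really encodes to nbytes bytes.
    assert len(chr(last).encode("utf-8")) == nbytes
    print(nbytes, round(bits_per_byte(nbytes, first, last), 2))
```

Every multi-byte range comes out below the 7 bits per byte of plain ASCII, and also below the 6 bits per byte of base64.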

  75. Ge0rG

    intuitively, I'm with jonasw here.

  76. Kev

    I might be wrong, conceivably. It doesn't match my mental mapping of UTF-8, but I know that's a poor mapping.

  77. jonasw

    in any case, that’s not the point of the discussion.

  78. Kev

    I think you get to use the extra bit as encoding for at least the first byte.

  79. Kev


  80. Ge0rG

    nobody has yet said what attack we are trying to prevent, and how many bits of randomness are required to guard against it

  81. jonasw

    (using any fancy unicode would be very hard, as I said, with the mappings done by resourceprep et al anyways)

  82. Kev

    The important thing isn't that these are UUIDs, it's that they have the properties of UUIDs and are consistently implemented.

  83. Ge0rG

    in a world where clients leak their presence like crazy, having 256 bits of randomness in the resource might be a solution to the wrong problem.

  84. Kev goes back to work.

  85. Zash

    XEP-0198 doesn't define what should happen when a session times out, or am I missing something?

  86. Zash

    Ge0rG: Btw, for maximum entropy per byte with base64, get a multiple of 3 bytes. You get a multiple of 4 bytes out and no padding.

  87. Ge0rG

    Zash: or you just strip away the padding

  88. Zash

    Ge0rG: There may still encoded 0 bits

  89. Zash

    Ge0rG: There may still be encoded 0 bits

  90. Ge0rG

    Zash: but how does that relate to my original statement?

  91. Zash

    Ge0rG: Being pedantic.

  92. Zash

    1 byte input → XX==
    2 byte input → XXX=
    3 byte input → XXXX
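Editorial note: the padding pattern above, and the "encoded 0 bits" that remain after stripping the padding, can be reproduced with Python's standard `base64` module. This is a small illustration only. The helper name `base64_stats` is hypothetical, and all-zero input bytes are used purely to keep the output predictable.

```python
import base64


def base64_stats(nbytes: int) -> tuple[str, str, int]:
    """Encode nbytes zero bytes; return (padded, stripped, filler bits)."""
    padded = base64.b64encode(bytes(nbytes)).decode("ascii")
    stripped = padded.rstrip("=")
    # Each base64 character carries 6 bits; anything beyond the input's
    # 8 * nbytes bits is zero filler that adds length but no entropy.
    filler_bits = 6 * len(stripped) - 8 * nbytes
    return padded, stripped, filler_bits


for n in (1, 2, 3):
    print(n, base64_stats(n))
```

With 1 or 2 input bytes the stripped string still carries 4 or 2 filler bits respectively. Only multiples of 3 bytes use every encoded bit.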

  93. Ge0rG

    Zash: I never proposed base64 as _the_ solution

  94. jonasw

    (I do)

  95. Zash

    Can someone enlighten me about what the sane thing to do about this would be: https://prosody.im/issues/issue/836

  96. Zash

    Specifically, what to do with un-acked stanzas when the session times out and what order things should happen in.

  97. Ge0rG

    Zash: the messages get error-reflected to the MUC, causing a kick. I think this is valid

  98. Ge0rG

    At least I don't see anything mandate that the user should leave normally

  99. Zash

    Ge0rG: I don't see any text saying anything in either direction

  100. Ge0rG

    Zash: ask the submitter?