XSF Discussion - 2023-11-21


  1. Daniel

    can one update obsolete XEPs? :-)

  2. Daniel

    I need to update 0138 to spec out the interaction with sasl2

  3. Zash

    ^C^V into inbox, add recommendation to do a full flush at sensible times

  4. Zash

    Daniel, you mean this is still used? despite the security implications?

  5. Daniel

    Yes we use it. Bandwidth is more important for us than the security implications

  6. Zash

    Given that the attacks require huge amounts of bandwidth, maybe we should un-obsolete it. Rate limits and some words around when it's appropriate to do a full flush might be good enough?
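The "full flush at sensible times" idea can be sketched with Python's standard zlib module. This is only an illustration: the `full_flush_every` knob and the example stanza are assumptions made for the sketch, not anything specified in XEP-0138.

```python
import zlib

def compress_stream(stanzas, full_flush_every=3):
    """Compress stanzas on one zlib stream, one output chunk per stanza.

    full_flush_every is a hypothetical policy knob, not taken from XEP-0138.
    """
    comp = zlib.compressobj()
    chunks = []
    for i, stanza in enumerate(stanzas, start=1):
        # Z_SYNC_FLUSH emits all pending output but keeps the history window.
        # Z_FULL_FLUSH also resets the history, so later stanzas cannot be
        # encoded as back-references to anything sent before the flush.
        if i % full_flush_every == 0:
            mode = zlib.Z_FULL_FLUSH
        else:
            mode = zlib.Z_SYNC_FLUSH
        chunks.append(comp.compress(stanza) + comp.flush(mode))
    return chunks

def decompress_stream(chunks):
    """Decompress the chunks in order on a single zlib stream."""
    decomp = zlib.decompressobj()
    return [decomp.decompress(chunk) for chunk in chunks]

stanza = b"<iq type='get' id='roster1'><query xmlns='jabber:iq:roster'/></iq>"
chunks = compress_stream([stanza] * 4, full_flush_every=3)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

The second and third copies of the stanza compress to a few bytes because they match the history. The fourth copy follows a full flush and is large again, which is the bandwidth cost of limiting how long a value stays guessable.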

  7. Holger

    That's what flow kept advocating back then, IIRC.

  8. Daniel

    What does the attack do? You can discover words that are sent over the wire if you are both a participant in the XMPP network and can observe traffic?

  9. Zash

    CRIME-esque IIRC, you could in theory discover roster contacts or such.

  10. Zash

    https://blog.thijsalkema.de/blog/2014/08/07/https-attacks-and-xmpp-2-crime-and-breach/

  11. Daniel

    Yeah I don't want to make recommendations for the larger XMPP ecosystem but for us specifically this seems like an OK tradeoff

  12. Zash

    But then it also increases the amount of memory used on the server by a bunch IIRC.

  13. moparisthebest

    I want to spec out a new compression with zstd and a custom dictionary tuned for XMPP, I suspect it would be great, and remove the hit for flushing per stanza

  14. Link Mauve

    “17:40:53 larma> How many of us are planning to go to 37C3 in Hamburg? Do we want to register for an assembly?”, I am planning to go, if I can get a ticket, and would very much like to have an assembly if I’m going. :)

  15. edhelas

    > I want to spec out a new compression with zstd and a custom dictionary tuned for XMPP, I suspect it would be great, and remove the hit for flushing per stanza
    Interesting!

  16. debacle

    Daniel:
    > Yes we use it. Bandwidth is more important for us than the security implications
    We would like to use it for our IoT thingy. Our devices use client certs to log in, and the data is measurement data that changes every second. I guess that zlib attacks or similar would not work well here?

  17. Daniel

    moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary

  18. Ge0rG

    debacle: what about EXI for IoT?

  19. Ge0rG

    or do you think there will be significant value in compressing the actual payloads?

  20. debacle

    Ge0rG I heard about EXI before, but is it a thing? I.e. is integration into libstrophe and ejabberd possible? Anyway, most of our traffic is probably content in XEP-0335: JSON Containers.

  21. Daniel

    > moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary
    Nvm. I guess it doesn't have to be fixed if you still flush on every stanza

  22. Ge0rG

    debacle: JSON containers sound like you'll be able to compress those very much

  23. debacle

    Yes, esp. because our dict keys are strings, which all start and end with `"` :-)

  24. Ge0rG

    debacle: EXI is pretty hypothetical for XMPP. You need to construct and distribute a schema for it, and it will be fixed, and needs negotiation in XMPP land

  25. Kev

    And you need libraries in suitable languages and licenses, which seems to also be a sticking point.

  26. debacle

    We probably need JSON compression, not XMPP compression :-)

  27. Zash

    base64'd cbor? :)

  28. debacle

    Replace all `"` with 🥝️...

  29. Daniel

    Exi in general seems like a good idea. But the xep is overly complex and library support is missing as Kev pointed out

  30. Daniel

    Zstd with dictionary seems much more achievable

  31. Kev

    > base64'd cbor? :)
    I think you're joking, but ISTM there's a lot of mileage in a CBOR encoding for XMPP, it's on my TODO to look into it at some point.

  32. Zash

    Heh, mod_rest supports CBOR

  33. debacle

    Zash, yes, JSON ⇒ bzip2 ⇒ base64 ⇒ pubsub might be the best.

  34. Zash

    Maybe one of these decades I'll finish that presentation of mod_rest and the schema mapping stuff.

  35. debacle

    Or CBOR, but then we need yet another lib on our devices.

  36. Zash

    debacle, JSON schema un-mapping into XML!!! and then compress that .)

  37. debacle

    And CBOR2JSON on the subscribers side.

  38. debacle

    Zash I expect that XML can be more compact than JSON embedded in XML, just because of the many &-entities.

  39. Zash

    XML is more compact than JSON all by itself, depending on how you map your data into it.

  40. Guus

    debacle: Openfire has EXI support

  41. Kev

    I'm not claiming CBOR is a panacea, but there are things I want to be able to do where I *think* it would be very suitable.

  42. Zash

    Kev, how tho? Some generic XML mapping akin to the joke JSON mappings?

  44. debacle

    Guus, yes, but it wouldn't help much, because our JSON content is probably the largest part of the data. (And libstrophe would have to support it, too.)

  45. Link Mauve

    debacle, EXI supports optionally compressing using traditional compression IIRC, from what Arc told us long ago.

  46. Kev

    > Kev, how tho? Some generic XML mapping akin to the joke JSON mappings?
    General XML mapping like the joke JSON ones is already a win in some respects, but no, I want to do both fixed and dynamic dictionary type stuff as well.

  47. Zash

    E.g. CBOR equivalent of `["message",{"type":"chat"},["body",{},"Hello"]]` ?
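The generic array mapping in Zash's example can be sketched with the Python standard library. This is an assumption-laden illustration: `to_array` is a hypothetical helper, namespaces are left out for brevity, and JSON is used for the output because the standard library has no CBOR encoder. A CBOR library could serialize the same nested list.

```python
import json
import xml.etree.ElementTree as ET

def to_array(elem):
    """Map an element to [name, attributes, *children]; text becomes a string child."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_array(child))
        # Text that follows a child element belongs to the parent.
        if child.tail:
            node.append(child.tail)
    return node

stanza = ET.fromstring('<message type="chat"><body>Hello</body></message>')
encoded = json.dumps(to_array(stanza), separators=(",", ":"))
print(encoded)  # ["message",{"type":"chat"},["body",{},"Hello"]]
```

This needs no schema, which is why it works as a fallback, but it also carries every element and attribute name in full.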

  48. Kev

    That'd be the fallback position, yeah.

  49. Zash

    mod_rest would give you `{"kind":"message","type":"chat","body":"Hello"}` if asked nicely, or even its CBOR equivalent if it has a library for that.

  50. Zash

    But that's based on a *huge* schema that describes the conversion

  51. Kev

    I'm thinking of something like : `[0 (message), 0 (chat), "from@...", "to@...", {0(body): ["..."], "{non-dictionary}element": "..."}]`

  52. Zash

    Hm, how many {message,presence,iq}type were there?

  53. Kev

    That's another option.

  54. Zash

    Still, that requires some kind of schema

  55. Kev

    Yes.

  56. Kev

    I'm not trying to solve the general case with this.

  57. moparisthebest

    >> moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary
    > Nvm. I guess it doesn't have to be fixed if you still flush on every stanza
    Daniel: right, you gotta flush on every stanza regardless, but the fixed shared dictionary "trained" on XMPP should cause it to compress well anyway, you know, in theory :)

  58. moparisthebest

    The hard part is kind of getting "real" XMPP traffic to train it on without a privacy violation

  59. MattJ

    I want the same for CSI testing

  60. pep.

    What's "real" traffic even :P

  61. moparisthebest

    That may be the harder part lol

  62. MattJ

    pep., that's why we can't just grab our own (XMPP developer) traffic and train it on ourselves :)

  63. jonas’

    well in theory scraping a server like yax.im or so would give decent data, but as moparisthebest pointed out… privacy nightmare.

  64. moparisthebest

    Wonder if I dump all stanzas from my family's server for a month and replaced all plain text with equal length text from public domain books or something

  65. Zash

    make a list of everything that you know for a fact isn't privacy-sensitive (like a set of xpaths), strip everything else or turn it into nonsense

  66. Zash

    I had a skeleton routine somewhere for some sort of stats

  67. pep.

    moparisthebest, you'd introduce a bias with these books, that isn't the same as your family chats

  68. Zash

    Threw away everything except <name xmlns=''> basically.

  69. moparisthebest

    Even in a perfect world that would skew the compression towards one extremely heavy muc user but that might be ok

  71. pep.

    In what language? :-°

  72. moparisthebest

    pep.: right but, generic texts, could experiment if it matters the language of the text

  73. moparisthebest

    Text already compresses well, I think the main thing we want it to "learn" is how to compress open/close tags

  74. Zash

    change every piece of text and attribute other than xmlns into random base64
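A scrubbing pass along these lines could look like the sketch below. It is a minimal illustration, not a vetted anonymizer: `scrub` and `random_base64` are hypothetical helpers, and it follows the suggestion literally, so even harmless attributes such as `type` are replaced. Namespaces survive because ElementTree stores them in the tag name rather than as attributes.

```python
import base64
import os
import xml.etree.ElementTree as ET

def random_base64(length):
    """Return `length` characters of base64 text generated from random bytes."""
    raw = os.urandom(length)  # always encodes to at least `length` characters
    return base64.b64encode(raw).decode("ascii")[:length]

def scrub(elem):
    """Replace every attribute value and text node with random base64 of equal length."""
    for key, value in list(elem.attrib.items()):
        elem.set(key, random_base64(len(value)))
    if elem.text and elem.text.strip():
        elem.text = random_base64(len(elem.text))
    for child in elem:
        scrub(child)
        if child.tail and child.tail.strip():
            child.tail = random_base64(len(child.tail))
    return elem

original_body = "Wherefore art thou?"
stanza = ET.fromstring(
    "<message xmlns='jabber:client' to='juliet@example.com' type='chat'>"
    "<body>" + original_body + "</body></message>"
)
scrubbed = scrub(stanza)
print(ET.tostring(scrubbed, encoding="unicode"))
```

Keeping the original lengths preserves the size distribution of the traffic, which matters if the output is used to train or test a dictionary.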

  75. Zash

    Given OMEMO, that's probably what real traffic would look like anyway 🤷️

  76. jonas’

    hm, why not into random binary?

  77. jonas’

    well, ok, training the dictionary to remove redundancy introduced by base64 is probably a good idea

  78. moparisthebest

    Ooh like the base64

  79. moparisthebest

    And we could *test* the dictionaries against real world stanzas

  80. Zash

    all the XEP examples is a thing

  81. pep.

    We're gonna compress romeo and juliet perfectly

  82. pep.

    yay

  83. Zash

    YES

  84. moparisthebest

    Share the dictionaries and code around to select from mam and give % or whatever, we could all run it in our own servers

  85. Zash

    MAM and live traffic will be different

  86. Zash

    Waaaaaaaaaaaaaaaaaaay more presence

  87. pep.

    Yeah compressing presence would probably bring more results than messages :x

  88. pep.

    With mobile traffic..

  89. moparisthebest

    Does prosody have a module to log all stanzas :)

  90. moparisthebest

    mod_destroy_disk I assume

  93. pep.

    There's mod_rawdebug, or stanzadebug

  94. Zash

    There's a bunch of variants depending on where exactly it picks out the stanza

  95. Zash

    Also mod_scansion_record which produces logs in the scansion format

  96. Alex

    Memberbot is still online until our member meeting starts this Thursday. If you are an XSF member and have not voted yet, then please take some time to cast your votes. Thanks

  97. root

    https://upload.nicolosus.chat:5281/file_share/cS-Vje4410X18I5BfAlacQEt/zb2rhjNBmj942p3DB1ejhaaUgZ1NZCgysD4qnxwEXQhmjXft9.jpg

  98. theTedd

    Regarding zstd with a fixed dictionary: from what I saw, it appears to be a fixed-size dictionary, but updated with a rolling context window (old symbols are replaced with new ones), so it doesn't help to avoid the crime/breach attacks. But even if you do manage to fix the dictionary, zstd also uses an entropy encoding for the other data, so that would still be vulnerable.

  99. Zash

    Couldn't you pre-bake a dictionary that would be static and never updated?

  100. root

    https://upload.nicolosus.chat:5281/file_share/oWqTagg7k0No8yQzWxC3APSF/giphy(1).webp

  102. theTedd

    With some hacking, probably, but the entropy encoding means you will still end up compressing private symbols

  103. Zash

    I mean I'm pretty sure this was a feature of some compression lib

  104. Zash tries to squint at https://facebook.github.io/zstd/zstd_manual.html#Chapter13

  105. theTedd

    there is an option to use a pre-trained dictionary, and you can also limit the size of the dictionary, but you can't prevent the dictionary from being updated (with the standard options)

  106. Zash

    Meh

  107. Zash

    Anyone wanna make a FunStandardXMPP ? As in, actual fixed dictionary, put `<message ` etc in it and do simple substitution like WhatsApp, but like, with negotiation.
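The simple-substitution idea can be sketched as follows. Everything here is hypothetical: the table entries, the choice of token bytes, and the function names are made up for illustration, and negotiation is left out. The sketch relies on the fact that bytes 0xF8 to 0xFF never occur in valid UTF-8, so they are free to act as tokens.

```python
# Hypothetical fixed table: each token byte stands for one common XMPP fragment.
TABLE = {
    0xF8: b"<message ",
    0xF9: b"</message>",
    0xFA: b"<body>",
    0xFB: b"</body>",
    0xFC: b"type='chat'",
    0xFD: b"<presence",
    0xFE: b"<iq ",
}

def encode(stanza: bytes) -> bytes:
    """Replace known fragments with their single-byte tokens."""
    # Longest fragments first, so a short entry never splits a longer one.
    for token, fragment in sorted(TABLE.items(), key=lambda kv: -len(kv[1])):
        stanza = stanza.replace(fragment, bytes([token]))
    return stanza

def decode(data: bytes) -> bytes:
    """Expand every token byte back into its fragment."""
    for token, fragment in TABLE.items():
        data = data.replace(bytes([token]), fragment)
    return data

stanza = b"<message type='chat' to='romeo@example.net'><body>Hello</body></message>"
packed = encode(stanza)
print(len(stanza), len(packed))
```

Because the table never changes, the encoded size of a stanza depends only on that stanza and the table, never on earlier traffic.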

  108. Zash

    Could use UTF-8 encoding of > U+11000 for the lulz.

  109. moparisthebest

    > there is an option to use a pre-trained dictionary, and you can also limit the size of the dictionary, but you can't prevent the dictionary from being updated (with the standard options)
    worst case if this is true just throw away the state in between stanzas

  110. theTedd

    you could do that, but it's largely irrelevant because of the use of entropy encoding for the non-dictionary symbols

  111. moparisthebest

    compressing private symbols isn't a problem, sharing compression state across stanzas which may have different levels of visibility/target etc is

  112. theTedd

    ultimately, any compression method that treats the data as a stream and naively compresses it is vulnerable to those kinds of attacks

  113. moparisthebest

    yes, which is why we should not treat it as a stream, and instead, compress each stanza individually

  114. theTedd

    the content of the stanza is a stream of bytes

  115. moparisthebest

    no, we introduce framing, you get a u16 that tells you how long the next compressed stanza is, read that, decompress, process, repeat

  116. Zash

    Use zlib, make attacks infeasible with rate limits and periodic full flushes.

  117. singpolyma

    Don't the attacks require the attacker to be able to inject quite some content into your stanzas?

  118. Zash

    yes, let's use framing, and also create an xml parser instance and a compression state per stanza. that'll be great for GC pressure!

  119. moparisthebest

    you can re-use your XML parser instance if you like, this is just a layer above it

  120. theTedd

    singpolyma, they only need to get you to repeat the one symbol they're trying to discover, and check whether that results in a smaller (more compressed) result
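The length side channel theTedd describes can be demonstrated in a few lines with zlib. This is a simplified sketch: the secret, the payload layout, and `observed_length` are invented for illustration, and a whole-value guess is used so the effect is unambiguous. A real attack guesses one character at a time, where the size differences are much smaller and noisier.

```python
import zlib

SECRET = b"s3cr3tT0k3n"  # hypothetical value the attacker wants to learn

def observed_length(injected: bytes) -> int:
    """Size of a compressed payload that mixes the secret with attacker input."""
    payload = (
        b"<iq><token>" + SECRET + b"</token></iq>"
        + b"<message><body>" + injected + b"</body></message>"
    )
    return len(zlib.compress(payload))

# Both guesses have the same length; only one repeats the secret.
right = observed_length(b"<token>s3cr3tT0k3n")
wrong = observed_length(b"<token>xQ7pLm2Zw9v")
print(right, wrong)
```

The correct guess produces the smaller output because the compressor encodes it as a back-reference to the secret. The effect only appears when the secret and the injected data share one compression context.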

  121. moparisthebest

    attacks only get better, I'm not interested in "well we think it might be kind of hard now" meh

  122. theTedd

    framing doesn't change anything; you would need to avoid compressing certain symbols entirely, either by using an immutable dictionary, or by purposely avoiding compressing anything (mostly attribute values) that's not known to have a non-critical value

  123. moparisthebest

    why ?

  124. theTedd

    framing only says "this stanza is x bytes long" - the attack doesn't fake the length of the stanza

  125. moparisthebest

    with my proposal, stanza X always compresses to exactly the same bytes

  126. moparisthebest

    you can send all the stanzas you want to me, and you'll never influence the length of stanza X

  127. moparisthebest

    I think that fully solves the problem?

  128. theTedd

    ah, ok, I misunderstood

  129. singpolyma

    I don't think so

  130. singpolyma

    because the idea is to trick you into crafting a stanza X that contains some known stuff

  131. Daniel

    Yes the dictionary is not 'fixed' but you only provide the starting dictionary that's true. But I don't understand why you wouldn't be able to compress each stanza individually

  132. singpolyma

    and then measure how big it is

  133. Zash

    You can do that today with zlib if you feed the state with a prefix and discard it on the other end.
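One way to get this effect with zlib is its preset-dictionary feature, shown here through Python's `zdict` parameter. This is a sketch under assumptions: `XMPP_DICT` is a made-up dictionary, and a fresh compressor is created for every stanza so no state carries over between stanzas.

```python
import zlib

# Hypothetical shared dictionary; a real one would be built from representative stanzas.
XMPP_DICT = (
    b"<presence from='' to=''/>"
    b"<iq type='result' id=''/>"
    b"<message type='chat' from='' to='' id=''><body></body></message>"
)

def compress_stanza(stanza: bytes) -> bytes:
    """Compress one stanza, starting from the shared dictionary every time."""
    comp = zlib.compressobj(zdict=XMPP_DICT)
    return comp.compress(stanza) + comp.flush()

def decompress_stanza(data: bytes) -> bytes:
    """Reverse compress_stanza using the same shared dictionary."""
    decomp = zlib.decompressobj(zdict=XMPP_DICT)
    return decomp.decompress(data) + decomp.flush()

stanza = (
    b"<message type='chat' from='romeo@example.net' to='juliet@example.com' id='m1'>"
    b"<body>Hello</body></message>"
)
with_dict = compress_stanza(stanza)
without_dict = zlib.compress(stanza)
print(len(stanza), len(without_dict), len(with_dict))
```

Both ends must hold exactly the same dictionary bytes, so a real deployment would need to negotiate which dictionary is in use.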

  134. theTedd

    Daniel, you can compress them individually and start from 0 every time - that will work; but zstd also uses entropy encoding for the rest of the data, which makes this irrelevant

  135. theTedd

    moparisthebest, what's the point of compression if you have to pad all data to full chunk sizes anyway?

  136. moparisthebest

    caveats to my plan include:
    1. it might not be worth it, it might not save enough to be worth it, or might make the data bigger
    2. it might be too resource intensive on the server to do this
    but, feel like those are things we'll just have to try and see

  137. moparisthebest

    theTedd, do you mean like packet sizes? that's some other layer's problem :P

  138. Daniel

    theTedd: are you saying that you can try to attack each stanza individually? But that would require that the injected bits and the bits you are trying to discover are in the same stanza, no?

  139. theTedd

    moparisthebest, if I understood your idea of framing, you want to pad the compressed stanza to fix its size to some upper bound

  140. singpolyma

    Daniel: yes, just like with the HTTP attacks

  141. theTedd

    Daniel, each stanza is attacked individually, as long as it carries some secret value to be discovered (if not, this whole discussion is moot anyway)

  142. moparisthebest

    theTedd, oh no, just the wire bytes would look like |u16: bytes to read for next stanza|that-many-bytes-of-data-to-read-then-uncompress-to-a-single-stanza|
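The framing moparisthebest describes can be sketched like this. Several details are assumptions: plain zlib stands in for zstd, the u16 is taken to be big-endian, and `write_frame` and `read_frames` are hypothetical names.

```python
import io
import struct
import zlib

def write_frame(stanza: bytes) -> bytes:
    """Compress one stanza on its own and prefix it with a u16 length."""
    body = zlib.compress(stanza)  # independent compression state per stanza
    if len(body) > 0xFFFF:
        raise ValueError("compressed stanza does not fit a u16 length prefix")
    return struct.pack(">H", len(body)) + body

def read_frames(stream):
    """Yield decompressed stanzas from a stream of length-prefixed frames."""
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of stream
        if len(header) < 2:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">H", header)
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated frame")
        yield zlib.decompress(body)

a = b"<message to='juliet@example.com'><body>Hi</body></message>"
x = b"<presence/>"
b = b"<iq type='get' id='p1'><ping xmlns='urn:xmpp:ping'/></iq>"
wire = write_frame(a) + write_frame(x) + write_frame(b)
decoded = list(read_frames(io.BytesIO(wire)))
print(decoded == [a, x, b])
```

A given stanza always produces the same frame no matter what was sent before it. The frame length still depends on that stanza's own content, which is the point theTedd and singpolyma raise.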

  144. theTedd

    moparisthebest, prefixing the stanza with its length achieves what?

  145. Daniel

    The scenarios the og Blog post outlined had the secret and the injection in different stanzas

  146. moparisthebest

    you read one stanza at a time to reconstruct the stream

  147. Daniel

    Like a ping for example to inject and iq roster stuff to be leaked

  148. theTedd

    Daniel, if you have full-stream compression then that makes it easier, but per stanza still works, it just takes longer

  149. Zash

    Make it take longer, then use rate limits to make it take so long it doesn't matter. No framing required.

  150. moparisthebest

    I'm struggling to think where in XMPP a 3rd party can send me data that I then send elsewhere in a stanza mixed with private data :/

  151. moparisthebest

    certainly not impossible, but I'm coming up blank

  152. Zash

    forms?

  153. moparisthebest

    you send the form back to who sent it to you though, so no secrets

  154. theTedd

    the examples for the attacks are discovering a security token value; but it requires most of your responses to also include that value, plus my guesses of its value

  155. theTedd

    so it depends what you consider to be a critical value that needs to stay secret

  156. Zash

    As I understand, it takes a *lot* of tries to guess values and you only have until the value passes out of the dictionarybufferthing. Thus lots of bandwidth, thus rate limits should be fine as mitigation.

  157. Zash

    AFAIK nobody has ever attempted this attack on XMPP, so it seems mostly theoretical still?

  158. theTedd

    it takes a maximum of 256 tries per character, but 16 if you know the value is hex, and you can halve that for a stupid average
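The numbers above can be turned into a rough estimate of what a rate limit buys. The token length and the rate limit below are hypothetical values picked for illustration, and the estimate assumes every guess gives a clean answer in a single attempt, which is optimistic for the attacker.

```python
def worst_case_tries(length: int, alphabet: int) -> int:
    """Upper bound: try every symbol of the alphabet at every position."""
    return length * alphabet

def rough_average_tries(length: int, alphabet: int) -> int:
    """The 'stupid average' from the discussion: half of the worst case."""
    return worst_case_tries(length, alphabet) // 2

TOKEN_LENGTH = 32  # hypothetical: a 32-character hex token
RATE_LIMIT = 10    # hypothetical: probing stanzas allowed per second

hex_worst = worst_case_tries(TOKEN_LENGTH, 16)
hex_average = rough_average_tries(TOKEN_LENGTH, 16)
byte_worst = worst_case_tries(TOKEN_LENGTH, 256)
seconds = hex_average / RATE_LIMIT
print(hex_worst, hex_average, byte_worst, seconds)
```

Under these assumptions a rate limit alone only slows the attack down, so it would need to be combined with a bound on how long a value stays in the compression window.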

  159. Zash

    The power of Unicode compels ye!

  160. theTedd

    the sensible thing to do is avoid compressing such values at all, either by keeping them out of the dictionary and/or marking them off to avoid any form of compression (by entropy encoding or otherwise)