-
Daniel
can one update obsolete XEPs? :-)
-
Daniel
I need to update 0138 to spec out the interaction with sasl2
-
Zash
^C^V into inbox, add recommendation to do a full flush at sensible times
-
Zash
Daniel, you mean this is still used? despite the security implications?
-
Daniel
Yes we use it. Bandwidth is more important for us than the security implications
-
Zash
Given that the attacks require huge amounts of bandwidth, maybe we should un-obsolete it. Rate limits and some words around when it's appropriate to do a full flush might be good enough?
-
Holger
That's what flow kept advocating back then, IIRC.
-
Daniel
What does the attack do? You can discover words that are sent over the wire if you are both a participant in the XMPP network and can observe traffic?
-
Zash
CRIME-esque IIRC, you could in theory discover roster contacts or such.
-
Zash
https://blog.thijsalkema.de/blog/2014/08/07/https-attacks-and-xmpp-2-crime-and-breach/
-
Daniel
Yeah I don't want to make recommendations for the larger XMPP ecosystem but for us specifically this seems like an OK tradeoff
-
Zash
But then it also increases the amount of memory used on the server by a bunch IIRC.
-
moparisthebest
I want to spec out a new compression with zstd and a custom dictionary tuned for XMPP, I suspect it would be great, and remove the hit for flushing per stanza
-
Link Mauve
“17:40:53 larma> How many of us are planning to go to 37C3 in Hamburg? Do we want to register for an assembly?”, I am planning to go, if I can get a ticket, and would very much like to have an assembly if I’m going. :)
-
edhelas
> I want to spec out a new compression with zstd and a custom dictionary tuned for XMPP, I suspect it would be great, and remove the hit for flushing per stanza
Interesting!
-
debacle
Daniel:
> Yes we use it. Bandwidth is more important for us than the security implications
We would like to use it for our IoT thingy. Our devices use client certs to log in, and the data is measurement data that changes every second. I guess that zlib attacks or similar would not work well here?
-
Daniel
moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary
-
Ge0rG
debacle: what about EXI for IoT?
-
Ge0rG
or do you think there will be significant value in compressing the actual payloads?
-
debacle
Ge0rG I heard about EXI before, but is it a thing? I.e. is integration into libstrophe and ejabberd possible? Anyway, most of our traffic is probably content in XEP-0335: JSON Containers.
-
Daniel
> moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary
Nvm. I guess it doesn't have to be fixed if you still flush on every stanza
-
Ge0rG
debacle: JSON containers sound like you'll be able to compress those very much
-
debacle
Yes, esp. because our dict keys are strings, which all start and end with `"` :-)
-
Ge0rG
debacle: EXI is pretty hypothetical for XMPP. You need to construct and distribute a schema for it, and it will be fixed, and needs negotiation in XMPP land
-
Kev
And you need libraries in suitable languages and licenses, which seems to also be a sticking point.
-
debacle
We probably need JSON compression, not XMPP compression :-)
-
Zash
base64'd cbor? :)
-
debacle
Replace all `"` with 🥝️...
-
Daniel
EXI in general seems like a good idea. But the XEP is overly complex and library support is missing, as Kev pointed out
-
Daniel
Zstd with dictionary seems much more achievable
-
Kev
> base64'd cbor? :)
I think you're joking, but ISTM there's a lot of mileage in a CBOR encoding for XMPP, it's on my TODO to look into it at some point.
-
Zash
Heh, mod_rest supports CBOR
-
debacle
Zash, yes, JSON ⇒ bzip2 ⇒ base64 ⇒ pubsub might be the best.
-
Zash
Maybe one of these decades I'll finish that presentation of mod_rest and the schema mapping stuff.
-
debacle
Or CBOR, but then we need yet another lib on our devices.
-
Zash
debacle, JSON schema un-mapping into XML!!! and then compress that .)
-
debacle
And CBOR2JSON on the subscribers side.
-
debacle
Zash I expect that XML can be more compact than JSON embedded in XML, just because of the many &-entities.
-
Zash
XML is more compact than JSON all by itself, depending on how you map your data into it.
-
Guus
debacle: Openfire has EXI support
-
Kev
I'm not claiming CBOR is a panacea, but there are things I want to be able to do where I *think* it would be very suitable.
-
Zash
Kev, how tho? Some generic XML mapping akin to the joke JSON mappings?
-
debacle
Guus, yes, but it wouldn't help much, because our JSON content is probably the largest part of the data. (And libstrophe would have to support it, too.)
-
Link Mauve
debacle, EXI supports optionally compressing using traditional compression IIRC, from what Arc told us long ago.
-
Kev
> Kev, how tho? Some generic XML mapping akin to the joke JSON mappings?
General XML mapping like the joke JSON ones is already a win in some respects, but no, I want to do both fixed and dynamic dictionary type stuff as well.
-
Zash
E.g. CBOR equivalent of `["message",{"type":"chat"},["body",{},"Hello"]]` ?
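
Editor's sketch of the generic array mapping Zash describes here, using only the Python standard library. The function name `to_array` is made up for illustration, and dropping namespaces is a simplification so the output matches the example above; a real mapping would have to carry them somehow.

```python
import json
import xml.etree.ElementTree as ET

def to_array(elem):
    """Map an element to [name, attributes, *children] (hypothetical helper)."""
    # ElementTree writes namespaced tags as "{namespace}name"; keep only the name.
    name = elem.tag.rsplit("}", 1)[-1]
    node = [name, dict(elem.attrib)]
    if elem.text and elem.text.strip():
        node.append(elem.text)
    for child in elem:
        node.append(to_array(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail)
    return node

stanza = ET.fromstring('<message type="chat"><body>Hello</body></message>')
encoded = json.dumps(to_array(stanza), separators=(",", ":"))
print(encoded)  # ["message",{"type":"chat"},["body",{},"Hello"]]
```

The same nested list could be handed to a CBOR encoder instead of `json.dumps`, but that needs a third-party library, so it is left out here.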
-
Kev
That'd be the fallback position, yeah.
-
Zash
mod_rest would give you `{"kind":"message","type":"chat","body":"Hello"}` if asked nicely, or even its CBOR equivalent if it has a library for that.
-
Zash
But that's based on a *huge* schema that describes the conversion
-
Kev
I'm thinking of something like : `[0 (message), 0 (chat), "from@...", "to@...", {0(body): ["..."], "{non-dictionary}element": "..."}]`
-
Zash
Hm, how many {message,presence,iq}type were there?
-
Kev
That's another option.
-
Zash
Still, that requires some kind of schema
-
Kev
Yes.
-
Kev
I'm not trying to solve the general case with this.
-
moparisthebest
>> moparisthebest: I've looked into this before but I was never able to confirm that zstd (or at least the reference library) actually works with a fixed dictionary
> Nvm. I guess it doesn't have to be fixed if you still flush on every stanza
Daniel: right, you gotta flush on every stanza regardless, but the fixed shared dictionary "trained" on XMPP should cause it to compress well anyway, you know, in theory :)
-
moparisthebest
The hard part is kind of getting "real" XMPP traffic to train it on without a privacy violation
-
MattJ
I want the same for CSI testing
-
pep.
What's "real" traffic even :P
-
moparisthebest
That may be the harder part lol
-
MattJ
pep., that's why we can't just grab our own (XMPP developer) traffic and train it on ourselves :)
-
jonas’
well in theory scraping a server like yax.im or so would give decent data, but as moparisthebest pointed out… privacy nightmare.
-
moparisthebest
Wonder if I dumped all stanzas from my family's server for a month and replaced all plain text with equal length text from public domain books or something
-
Zash
make a list of everything that you know for a fact isn't privacy-sensitive (like a set of xpaths), strip everything else or turn it into nonsense
-
Zash
I had a skeleton routine somewhere for some sort of stats
-
pep.
moparisthebest, you'd introduce a bias with these books, that isn't the same as your family chats
-
Zash
Threw away everything except <name xmlns=''> basically.
-
moparisthebest
Even in a perfect world that would skew the compression towards one extremely heavy muc user but that might be ok
-
pep.
In what language? :-°
-
moparisthebest
pep.: right but, generic texts, could experiment if it matters the language of the text
-
moparisthebest
Text already compresses well, I think the main thing we want it to "learn" is how to compress open/close tags
-
Zash
change every piece of text and attribute other than xmlns into random base64
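
Editor's sketch of that scrubbing step, assuming Python's standard `xml.etree.ElementTree`. The helper names `random_b64` and `scrub` are hypothetical. ElementTree folds `xmlns` declarations into the tag names, so they survive untouched, while every attribute value and text node is replaced by random base64 of the same length.

```python
import base64
import os
import xml.etree.ElementTree as ET

def random_b64(length):
    """Return `length` characters of random base64 text."""
    if length == 0:
        return ""
    # base64 of `length` random bytes is longer than needed; trim it.
    return base64.b64encode(os.urandom(length)).decode("ascii")[:length]

def scrub(elem):
    """Randomize attribute values and text in place, keeping the structure."""
    for key, value in list(elem.attrib.items()):
        elem.set(key, random_b64(len(value)))
    if elem.text and elem.text.strip():
        elem.text = random_b64(len(elem.text))
    for child in elem:
        scrub(child)
        if child.tail and child.tail.strip():
            child.tail = random_b64(len(child.tail))
    return elem

stanza = ET.fromstring(
    '<message xmlns="jabber:client" to="juliet@example.com" type="chat">'
    "<body>Wherefore art thou?</body></message>"
)
scrubbed = scrub(stanza)
print(ET.tostring(scrubbed, encoding="unicode"))
```

Note that this also randomizes values like `type="chat"`, which a dictionary would probably want to learn; an allowlist of known-safe attributes, as suggested earlier, would be the obvious refinement. ElementTree may also rewrite namespace prefixes on output, so the serialized form is not byte-identical to what a server would send.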
-
Zash
Given OMEMO, that's probably what real traffic would look like anyway 🤷️
-
jonas’
hm, why not into random binary?
-
jonas’
well, ok, training the dictionary to remove redundancy introduced by base64 is probably a good idea
-
moparisthebest
Ooh like the base64
-
moparisthebest
And we could *test* the dictionaries against real world stanzas
-
Zash
all the XEP examples is a thing
-
pep.
We're gonna compress romeo and juliet perfectly
-
pep.
yay
-
Zash
YES
-
moparisthebest
Share the dictionaries and code around to select from mam and give % or whatever, we could all run it in our own servers
-
Zash
MAM and live traffic will be different
-
Zash
Waaaaaaaaaaaaaaaaaaay more presence
-
pep.
Yeah compressing presence would probably bring more results than messages :x
-
pep.
With mobile traffic..
-
moparisthebest
Does prosody have a module to log all stanzas :)
-
moparisthebest
mod_destroy_disk I assume
-
pep.
There's mod_rawdebug, or stanzadebug
-
Zash
There's a bunch of variants depending on where exactly it picks out the stanza
-
Zash
Also mod_scansion_record which produces logs in the scansion format
-
Alex
Memberbot is still online until our member meeting starts this Thursday. If you are an XSF member and have not voted yet, please take some time to cast your votes. Thanks
-
root
https://upload.nicolosus.chat:5281/file_share/cS-Vje4410X18I5BfAlacQEt/zb2rhjNBmj942p3DB1ejhaaUgZ1NZCgysD4qnxwEXQhmjXft9.jpg
-
theTedd
Regarding zstd with a fixed dictionary: from what I saw, it appears to be a fixed-size dictionary, but updated with a rolling context window (old symbols are replaced with new ones), so it doesn't help to avoid the crime/breach attacks. But even if you do manage to fix the dictionary, zstd also uses an entropy encoding for the other data, so that would still be vulnerable.
-
Zash
Couldn't you pre-bake a dictionary that would be static and never updated?
-
root
https://upload.nicolosus.chat:5281/file_share/oWqTagg7k0No8yQzWxC3APSF/giphy(1).webp
-
theTedd
With some hacking, probably, but the entropy encoding means you will still end up compressing private symbols
-
Zash
I mean I'm pretty sure this was a feature of some compression lib
-
Zash
/me tries to squint at https://facebook.github.io/zstd/zstd_manual.html#Chapter13
-
theTedd
there is an option to use a pre-trained dictionary, and you can also limit the size of the dictionary, but you can't prevent the dictionary from being updated (with the standard options)
-
Zash
Meh
-
Zash
Anyone wanna make a FunStandardXMPP ? As in, actual fixed dictionary, put `<message ` etc in it and do simple substitution like WhatsApp, but like, with negotiation.
-
Zash
Could use UTF-8 encoding of > U+11000 for the lulz.
-
moparisthebest
> there is an option to use a pre-trained dictionary, and you can also limit the size of the dictionary, but you can't prevent the dictionary from being updated (with the standard options)
worst case if this is true just throw away the state in between stanzas
-
theTedd
you could do that, but it's largely irrelevant because of the use of entropy encoding for the non-dictionary symbols
-
moparisthebest
compressing private symbols isn't a problem, sharing compression state across stanzas which may have different levels of visibility/target etc is
-
theTedd
ultimately, any compression method that treats the data as a stream and naively compresses it is vulnerable to those kinds of attacks
-
moparisthebest
yes, which is why we should not treat it as a stream, and instead, compress each stanza individually
-
theTedd
the content of the stanza is a stream of bytes
-
moparisthebest
no, we introduce framing, you get a u16 that tells you how long the next compressed stanza is, read that, decompress, process, repeat
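
Editor's sketch of that framing, using Python's standard `zlib` and `struct` as stand-ins (the proposal itself is about zstd). The names `frame` and `unframe` are made up, and the big-endian byte order of the u16 is an assumption since none was stated.

```python
import struct
import zlib

def frame(stanza: bytes) -> bytes:
    """Compress one stanza with fresh state and prefix a big-endian u16 length."""
    body = zlib.compress(stanza)
    if len(body) > 0xFFFF:
        raise ValueError("compressed stanza does not fit in a u16 length")
    return struct.pack("!H", len(body)) + body

def unframe(buffer: bytes):
    """Yield each decompressed stanza from a buffer of complete frames."""
    offset = 0
    while offset < len(buffer):
        (length,) = struct.unpack_from("!H", buffer, offset)
        offset += 2
        yield zlib.decompress(buffer[offset:offset + length])
        offset += length

stanzas = [b"<presence/>", b'<message type="chat"><body>Hello</body></message>']
wire = b"".join(frame(s) for s in stanzas)
print(list(unframe(wire)) == stanzas)
```

Because no compression state is carried between frames, a given stanza produces the same bytes no matter what was sent before it. The u16 also caps a compressed stanza at 65535 bytes, so larger stanzas would need some other arrangement.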
-
Zash
Use zlib, make attacks infeasible with rate limits and periodic full flushes.
-
singpolyma
Don't the attacks require the attacker to be able to inject quite some content into your stanzas?
-
Zash
yes, let's use framing, and also create an xml parser instance and a compression state per stanza. that'll be great for GC pressure!
-
moparisthebest
you can re-use your XML parser instance if you like, this is just a layer above it
-
theTedd
singpolyma, they only need to get you to repeat the one symbol they're trying to discover, and check whether that results in a smaller (more compressed) result
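
Editor's sketch of the length side channel being described, with Python's standard `zlib`. The stanza layout, the `SECRET` value and the helper `stanza_size` are invented for illustration. To keep the effect obvious it compares a fully correct guess against a fully wrong one; a real attack would probe one character at a time and have to cope with much smaller differences.

```python
import zlib

SECRET = "token=4f9c2a7be1d03865"  # hypothetical secret inside the victim's stanza

def stanza_size(injected: str) -> int:
    """Size of one compressed stanza mixing attacker-chosen text with the secret."""
    stanza = f"<iq><echo>{injected}</echo><auth>{SECRET}</auth></iq>"
    return len(zlib.compress(stanza.encode()))

right = stanza_size("token=4f9c2a7be1d03865")  # guess equals the secret
wrong = stanza_size("token=b81e6d40a93f5c27")  # same length, different value
print(right, wrong, right < wrong)
```

The correct guess lets the compressor encode the secret as a back-reference to the injected copy, so that stanza comes out shorter, and an observer who only sees sizes can tell the two guesses apart.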
-
moparisthebest
attacks only get better, I'm not interested in "well we think it might be kind of hard now" meh
-
theTedd
framing doesn't change anything; you would need to avoid compressing certain symbols entirely, either by using an immutable dictionary, or by purposely avoiding compressing anything (mostly attribute values) that's not known to have a non-critical value
-
moparisthebest
why ?
-
theTedd
framing only says "this stanza is x bytes long" - the attack doesn't fake the length of the stanza
-
moparisthebest
with my proposal, stanza X always compresses to exactly the same bytes
-
moparisthebest
you can send all the stanzas you want to me, and you'll never influence the length of stanza X
-
moparisthebest
I think that fully solves the problem?
-
theTedd
ah, ok, I misunderstood
-
singpolyma
I don't think so
-
singpolyma
because the idea is to trick you into crafting a stanza X that contains some known stuff
-
Daniel
Yes, it's true that the dictionary is not 'fixed' and you only provide the starting dictionary. But I don't understand why you wouldn't be able to compress each stanza individually
-
singpolyma
and then measure how big it is
-
Zash
You can do that today with zlib if you feed the state with a prefix and discard it on the other end.
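
Editor's sketch of one way to get a similar effect with zlib today, using the preset dictionary support in Python's standard `zlib` module (`zdict`). Whether this is exactly what is meant by feeding the state with a prefix is an assumption, and the `PREFIX` contents are a made-up sample, not a tuned dictionary.

```python
import zlib

# Hypothetical prefix of common XMPP markup; a real one would need tuning.
PREFIX = (
    b'<message type="chat" from="" to="" id=""><body></body></message>'
    b'<presence from="" to=""><show></show><status></status></presence>'
    b'<iq type="result" id="" from="" to=""></iq>'
)

def compress_stanza(stanza: bytes) -> bytes:
    """Compress one stanza with fresh state primed by the shared prefix."""
    comp = zlib.compressobj(zdict=PREFIX)
    return comp.compress(stanza) + comp.flush()

def decompress_stanza(data: bytes) -> bytes:
    """Reverse compress_stanza; both ends must hold the same prefix."""
    decomp = zlib.decompressobj(zdict=PREFIX)
    return decomp.decompress(data) + decomp.flush()

stanza = b'<message type="chat" to="juliet@example.com"><body>Hello</body></message>'
primed = compress_stanza(stanza)
plain = zlib.compress(stanza)
print(len(stanza), len(plain), len(primed))
```

The prefix never changes and the state is discarded after each stanza, so earlier traffic cannot influence how a later stanza compresses. Secrets and injected text inside the same stanza are still compressed together, which is the case debated in the following messages.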
-
theTedd
Daniel, you can compress them individually and start from 0 every time - that will work; but zstd also uses entropy encoding for the rest of the data, which makes this irrelevant
-
theTedd
moparisthebest, what's the point of compression if you have to pad all data to full chunk sizes anyway?
-
moparisthebest
caveats to my plan include:
1. it might not be worth it: it might not save enough, or might even make the data bigger
2. it might be too resource intensive on the server to do this
but I feel like those are things we'll just have to try and see
-
moparisthebest
theTedd, do you mean like packet sizes? that's some other layer's problem :P
-
Daniel
theTedd: are you saying that you can try to attack each stanza individually? But that would require that the injected bits and the bits you are trying to discover are in the same stanza, no?
-
theTedd
moparisthebest, if I understood your idea of framing, you want to pad the compressed stanza to fix its size to some upper bound
-
singpolyma
Daniel: yes, just like with the HTTP attacks
-
theTedd
Daniel, each stanza is attacked individually, as long as it carries some secret value to be discovered (if not, this whole discussion is moot anyway)
-
moparisthebest
theTedd, oh no, just the wire bytes would look like |u16: bytes to read for next stanza|that-many-bytes-of-data-to-read-then-uncompress-to-a-single-stanza|
-
theTedd
moparisthebest, prefixing the stanza with its length achieves what?
-
Daniel
The scenarios the og Blog post outlined had the secret and the injection in different stanzas
-
moparisthebest
you read one stanza at a time to reconstruct the stream
-
Daniel
Like a ping for example to inject and iq roster stuff to be leaked
-
theTedd
Daniel, if you have full-stream compression then that makes it easier, but per stanza still works, it just takes longer
-
Zash
Make it take longer, then use rate limits to make it take so long it doesn't matter. No framing required.
-
moparisthebest
I'm struggling to think where in XMPP a 3rd party can send me data that I then send elsewhere in a stanza mixed with private data :/
-
moparisthebest
certainly not impossible, but I'm coming up blank
-
Zash
forms?
-
moparisthebest
you send the form back to who sent it to you though, so no secrets
-
theTedd
the examples for the attacks are discovering a security token value; but it requires most of your responses to also include that value, plus my guesses of its value
-
theTedd
so it depends what you consider to be a critical value that needs to stay secret
-
Zash
As I understand, it takes a *lot* of tries to guess values and you only have until the value passes out of the dictionarybufferthing. Thus lots of bandwidth, thus rate limits should be fine as mitigation.
-
Zash
AFAIK nobody has ever attempted this attack on XMPP, so it seems mostly theoretical still?
-
theTedd
it takes a maximum of 256 tries per character, but 16 if you know the value is hex, and you can halve that for a stupid average
-
Zash
The power of Unicode compels ye!
-
theTedd
the sensible thing to do is avoid compressing such values at all, either by keeping them out of the dictionary and/or marking them off to avoid any form of compression (by entropy encoding or otherwise)