Yeah, just get the Board member to mail me at my isode account please.
Kev
I thought all of Board knew how to contact me by now, sorry.
emus
Thanks, yeah I just checked your page
arc
goffi: when you're around, I'd love to talk about xep-0355
goffi
arc: hi, I'm here
arc
When you drafted this, did you consider using xpath instead of namespace plus attributes?
govanifyhas left
govanifyhas joined
arc
One thing that you have written here that I had not considered is client defined services. It is an interesting idea, though I am curious how you envision this being implemented in the real world
goffi
initially there was only namespace, the attribute has been added because it was the only way to distinguish MAM for chat from MAM for Pubsub. XPath would probably complicate things (and servers would need to have a handy xpath implementation available).
goffi
Note that I would love to have xpath or at least simplified xpath in XMPP
arc
XPath is certainly more complicated. But is also more versatile. One of the problems with XPath is modern implementations are rare and hard to come by.
Kev
As a standards body we’re generally wary of xpath because of the burden it places on implementations.
MattJ
FWIW XPath is also being discussed as a solution for improving push notifications currently
MattJ
Maybe 2021 is the year to embrace XPath :)
Zash
Is "simplified xpath" a thing?
MattJ
XMPPath
goffi
I wrote this years ago, so I need to refresh my memory a bit ^^. But the idea was basically that if you want e.g. advanced PEP and your server doesn't offer that, you could have a third party implementation and ask your server yourself to redirect stanzas there, without having to wait for an admin or anything.
Kev
MattJ: Ah, interesting. Where’s the push notification discussion happening? I missed that if it was on list.
nycohas left
goffi
Zash: I don't know if it's already a thing, but just keeping the base of path and attribute matching, without all the methods could be useful and easy to implement.
Kev
I’m coming to the conclusion that we need two distinct models for push notifications - one for e2e and one for non-e2e.
Kev
And trying to address both with the same mechanism is likely to be painful.
Zash
goffi, so just a path like `/{xmlns}name/...`?
goffi
Zash: for instance yes, maybe with attribute matching. I don't know if this would be enough for our needs though.
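A rough sketch of what such a path-only subset could look like, assuming the `/{xmlns}name/...` form Zash suggests and nothing else (no predicates, no functions, no attribute matching). The function name `match_path` and the sample stanza are made up for illustration; this is not an existing XMPP or XPath API.

```python
import re
import xml.etree.ElementTree as ET

# One step looks like {namespace}localname. Namespaces may contain "/",
# so the steps are extracted with a regex instead of splitting on "/".
STEP = re.compile(r"\{[^}]*\}[^/{}]+")

def match_path(stanza, path):
    """Return the elements reached by a child-only path in Clark notation."""
    steps = STEP.findall(path)
    if not steps or stanza.tag != steps[0]:
        return []
    current = [stanza]
    for step in steps[1:]:
        # Descend one level, keeping only children with the exact qname.
        current = [child for el in current for child in el if child.tag == step]
    return current

stanza = ET.fromstring(
    '<message xmlns="jabber:client" to="juliet@example.com">'
    '<result xmlns="urn:xmpp:mam:2" queryid="q1"/>'
    '<body>hello</body>'
    '</message>'
)
hits = match_path(stanza, "/{jabber:client}message/{urn:xmpp:mam:2}result")
```

ElementTree already stores tags as `{namespace}name`, so each step is a plain string comparison. Attribute matching would have to be added as an extra filter on each step.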
arc
Nobody wants to work on lower level XML software. It's not sexy. It is easily overlooked as a resume filler. Nobody will pay for it. So that boils down to students or retired developers who love XML and take it on as a challenge.
Kev
Teensy bit of hyperbole there :)
Zash
No true Scotsman wants to work on XML! /s
Kev
Zash: :D
nycohas joined
arc
Kev: is it? I mean certainly there are /some/ people working on it.
Andrzej
Kev, there was nothing on the list about push notifications and xpath, but it was just an idea to allow the client to select what it wants to have in the push notifications payload (assuming it is encrypted) https://github.com/tigase/tigase-xeps/issues/4
Andrzej
or just filter what is expected to send push notification and what is not
andrey.ghas joined
Zash
Including a stanza "skeleton" was suggested at some point as well, ie stripping all content and attributes, leaving only `<name xmlns="">` of each tag.
Zash
Uh, with my server dev hat on, plz no XSLT in the server
Andrzej
Zash, that was just an idea, something else allowing to get some data out of stanza and trigger push notifications would be good as well
arc
I am certainly on board for discussing a limited XDM for use with a XMPP stream
Ge0rG
I think the idea of stanza skeletons was that the skeleton gives the push server all required info for deciding whether to make a silent or a noisy push notification, without leaking any actual user data
Ge0rG
So this is a different use case from "send encrypted message payload over push"
arc
Not sure I follow
Zash
Right, giving the client everything it needs vs giving the push gateway everything it needs.
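The stanza "skeleton" idea can be sketched as a recursive copy that keeps element names and namespaces and drops everything else. This is only an illustration of the description above, not code from any server; `skeleton` and the sample stanza are hypothetical.

```python
import xml.etree.ElementTree as ET

def skeleton(element):
    """Copy an element tree, keeping only qualified tag names."""
    bare = ET.Element(element.tag)   # Clark-notation tag carries the namespace
    for child in element:
        bare.append(skeleton(child))
    return bare                      # no attributes, no text, no tail

stanza = ET.fromstring(
    '<message xmlns="jabber:client" from="romeo@example.net" type="chat">'
    '<body>secret plans</body>'
    '<active xmlns="http://jabber.org/protocol/chatstates"/>'
    '</message>'
)
skel = skeleton(stanza)
```

The push gateway could then tell a message with a `<body/>` from a bare chat state without seeing any addresses or text.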
Andrzej
after online FOSDEM and Matrix usage I'm pretty sure that usage of Push & Fetch is a bad thing compared to pushing encrypted data (almost 30% of battery used by Element fetching data)
Andrzej
I think that giving control to the client would allow them to evolve quicker with less changes on the server side
alameyohas left
florettahas joined
Ge0rG
Andrzej: I'm not sure if FCM and APNS will be okay with carrying all your encrypted payloads.
stphas joined
Andrzej
Why not? there is a limit of 4KB but it would work
Ge0rG
That said, what you fundamentally want in that case is to initiate/terminate the xmpp client session on your app server and have a custom protocol that pushes over APNS to your client
Ge0rG
because then you can optimize everything
Ge0rG
As long as you have the xmpp client connection terminated on the mobile device, it will have to reconnect to the server and to resume the 0198 session rather often
Andrzej
every 30-120s on iOS and most likely will be killed anyway
Ge0rG
Andrzej: only if you receive something that warrants an ack
Andrzej
that is why I prefer to have offline client and pushes with notifications on which user can act (ie. open app)
govanifyhas left
Ge0rG
"Your chat program received an XML stanza. Open app to see if it was something you care about"
govanifyhas joined
Andrzej
as for APNS and encrypted push, they've created even example for doing that https://developer.apple.com/documentation/usernotifications/modifying_content_in_newly_delivered_notifications see Listing 1
Ge0rG
Andrzej: well, I suppose you could use that to send an encrypted blob to the app
Andrzej
Ge0rG, I want usable "content" in the notification and with encryption and XPath that could work quite well
pasdesushihas joined
Andrzej
I'm already doing that in Tigase & Siskin and AFAIR in Prosody
Ge0rG
Andrzej: so what you want from XEP-0357 is to pass the full stanza to the app server?
Andrzej
no
Zash
Pass the parts you're interested in, encrypted, via the app server to the client?
Andrzej
to allow the client to send "XSLT" to transform the stanza into an "encrypted payload" opaque to the app servers
Andrzej
this transformation and encryption is done on the XMPP server side
Ge0rG
Andrzej: but the app server is under your control, so you can do arbitrary modifications there
Andrzej
yes, but XMPP server is leaking user data to the app server in your example
Ge0rG
Andrzej: that's a very significant effort for the xmpp server. What key should it use?
Ge0rG
also that means the xmpp server needs to have significant knowledge over the used client / app-server infrastructure
Andrzej
client with transformation would upload key, in my case AES128 key
Ge0rG
Why not an ec25519 key?
Ge0rG
You don't really need the server to be able to decrypt the payload, right ;)
Andrzej
right, but it is aware of it anyway, so it can decrypt it
Ge0rG
I am not sure if the trade-off of leaking the private data to the app server is a big thing.
Andrzej
I'm not sure which algorithm is faster, I've assumed AES128 is good in this case
Ge0rG
Well, encryption isn't the bottleneck; XSLT is
pasdesushihas left
Zash
XSLT seems like overkill.
Andrzej
As I've said, it does not have to be XSLT, but could be list of fields + XPath to fetch them
Kev
> Uh, with my server dev hat on, plz no XSLT in the server
M-Link has xslt for multiple reasons :|
> just an idea to allow client select what it wants to have in the push notifications
That seems sane at first thought.
arc
Andrzej: I tend to use XPath as a generic term for XDM myself so this is no way a criticism, I'm just wondering if you are talking about XPath proper
pasdesushihas joined
Andrzej
I'm not sure, I was thinking (as XSLT cannot be used) of creating some XEP that would specify filtering data and fetching it from the existing XMPP schema into some structure, basically transforming a stanza into something else
Andrzej
to filter and fetch data I was thinking about using XPath, but I could be wrong
pasdesushihas left
pasdesushihas joined
Ge0rG
Andrzej: what about a whitelist of xml namespaces to retain in the <message> element? would that suffice?
Ge0rG
something like https://hg.prosody.im/prosody-modules/file/64b7daa6c42c/mod_csi_battery_saver/mod_csi_battery_saver.lua#l105 but more formalized
Andrzej
as we have 4KB hard limit, I think that might be not enough to just filter elements
Andrzej
but if clients could decide what should be included that would be a step forward
Zash
Namespaces does seem like a plausible Good Enough, and is easy to implement.
jonas’
4kb after base64, right?
Andrzej
yes, correct, after base64
jonas’
so in effect just about 1kB
Andrzej
not 3KB?
Zash
`base64(encrypt(strip(<message/>)))`?
Zash
Depending on how much we trust the app server to not be /too/ evil, we could stuff some compression in there too
Andrzej
Zash, I think this could be too big and in some cases it is more valuable to lose some of the message <body/> but deliver notification correctly
pasdesushihas left
Zash
How big are your messages?
arc
Xmpp does not implement full XML. Never did. It is a subset. So I don't see a problem with creating a subset of XDM
Ge0rG
So you end up encoding the limits of google and apple services into my server.
arc
XPath and xquery both use XDM
jonas’
right, 3kB, not just 1
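The arithmetic behind the 3kB figure, as a quick check. It assumes the whole 4KB budget is available for the base64 text and ignores any JSON envelope or encryption overhead, which would reduce it further.

```python
import base64

LIMIT = 4096  # payload limit in bytes, the 4KB figure mentioned above

# base64 encodes every 3 input bytes as 4 output characters (with padding),
# so the largest raw payload that still fits is 3/4 of the limit.
max_raw = (LIMIT // 4) * 3

encoded_at_limit = len(base64.b64encode(bytes(max_raw)))
encoded_over_limit = len(base64.b64encode(bytes(max_raw + 1)))
```

With these numbers `max_raw` is 3072 bytes: one more raw byte pushes the encoded form past the limit.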
jonas’
Zash: with omemo, messages can get big quickly
Zash
I was just thinking that, with omemo, the server can't do anything like ship half the <body>
jonas’
XDM seems to be xpath 2.0, which afair lacks implementations and may be overkill
Andrzej
if server is aware of clients OMEMO device id it could filter out keys in the notification and notification could still work
florettahas left
Wojtekhas joined
arc
Referring back to my earlier statement about how nobody wants to work on XML stuff 😆
Zash
You can have all of XPath as long as it's only `/{xmlns}name/{otherns}foo@bar/` and nothing else.
mathieui
Zash, and attribute matching!
Zash
Nope!
arc
At this point we're really talking about XPath 3.0 or quite possibly XPath 4.0
Zash
Prosody doesn't have that, so it doesn't exist!
arc
Given that everything in this community takes 10 years or more to come about
jonas’
arc: who is "we"?
arc
We in this room right now, I would hope
arc
And this community as in XML community. Given that we are still stuck with XPath/xslt/etc in browsers now in 2021
Zash
I'd refer back to what Kev said
arc
XSLT is perfectly fine stop hating on it 😋
stpeterhas joined
stpeterhas left
Andrzej
arc, would exi work for transforming XML stanza into "payload" for the client?
Andrzej
maybe I should be thinking about EXI instead of XSLT
Steve Killehas left
Kevhas left
Zash
I am, of course, referring to https://logs.xmpp.org/xsf/2021-02-11?p=h#2021-02-11-521589a72d0ff42a
arc
No. EXI is just an alternative representation of XML data. It does make XDM/XPath/Xquery/XSLT faster when written with it in mind. But all processing is generally faster with the EXI because string compare is just that much slower
Ge0rG
I'm sure there is nothing wrong with executing attacker-provided XSLT on your XMPP server.
arc
I never said processing XSLT provided by the user.
arc
XSLT is a turing complete language. You can open a similar glaring security hole by running user provided python, or most languages for that matter.
Zash
(XEP-0060 has references to XSLT 😱️)
Andrzej
also (since it was mentioned that I would be "encoding the limits of google and apple services into my server"), I think that the transformation and encoding are OK and would not impose limits on your server: if the client passes the "limit" of the payload it can receive, then the server would just respect the request from the client
stphas left
stphas joined
Zash
So, size constraint. How about an ordered-by-priority list of payloads you're interested in? Server strips anything not in that list, then if it's still too large, strips the lowest priority payloads until it's small enough?
Zash
Where "list of payloads" could be xpath or just namespaces or somesuch, details.
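A minimal sketch of that two-step trimming, assuming the "list of payloads" is a list of namespaces ordered from most to least important. `fit_to_limit`, the example namespaces and the 3072-byte limit are illustrative assumptions, not part of any XEP.

```python
import xml.etree.ElementTree as ET

def namespace_of(el):
    """Extract the namespace from a Clark-notation tag ('{ns}name')."""
    return el.tag[1:].split("}")[0] if el.tag.startswith("{") else ""

def fit_to_limit(stanza, priorities, limit):
    """Trim payloads in place until the serialized stanza fits in `limit` bytes."""
    rank = {ns: i for i, ns in enumerate(priorities)}
    # Step 1: strip every payload whose namespace is not in the list.
    for child in list(stanza):
        if namespace_of(child) not in rank:
            stanza.remove(child)
    # Step 2: while still too large, drop the lowest-priority payload left.
    while len(ET.tostring(stanza)) > limit and len(stanza):
        worst = max(stanza, key=lambda el: rank[namespace_of(el)])
        stanza.remove(worst)
    return stanza

stanza = ET.fromstring(
    '<message xmlns="jabber:client">'
    '<body>hi</body>'
    '<active xmlns="http://jabber.org/protocol/chatstates"/>'
    '<data xmlns="urn:example:big">' + "x" * 5000 + '</data>'
    '<other xmlns="urn:example:other"/>'
    '</message>'
)
priorities = [
    "jabber:client",
    "http://jabber.org/protocol/chatstates",
    "urn:example:big",
]
trimmed = fit_to_limit(stanza, priorities, limit=3072)
```

Here the unlisted `<other/>` goes first, then the oversized `<data/>` is dropped because it has the lowest priority, leaving the body and the chat state.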
Andrzej
that "could" work
Adihas left
krauqhas left
arc
Andrzej: this is a simplistic but valid way of thinking about EXI; within the root of a XMPP stream you typically only find about 6-10 elements; obviously your IQ, MESSAGE, and PRESENCE elements (capitalized only for clarity), stream: namespace stuff, and possibly session management.
krauqhas joined
arc
Text domain XML would have the xml parser run multiple passes over the data stream to find delimiters (space, brackets, quotes, etc), break the stream up into events, possibly create a minidom out of those events, and then pass qnames (namespaces, prefixes, element names, attribute names, etc) to the application typically as string pointers into the stream buffer. But increasingly for memory safety, memcpy all those strings into new buffers too.
arc
And then no matter how the client or server handles stanza routing, it either ends up testing the qnames against a large list of qnames it's designed to pass to various functions, or it hashes the qnames and checks for a value on a hashmap. In every case there's thousands of machine instructions for every stanza.
arc
Going back to those, let's say always 15 or fewer possible elements found in a <stream:stream> root, EXI represents that as a binary number which typically is either bitpacked with only four bits used, or structured for later compression. In every case, you don't get "message", you just get the binary number 1. Which is much much simpler to parse.
arc
Ignoring the stuff you can read up on if you ever wanted to implement it, that is EXI in a nutshell. And that is why it is not just 25x or more faster than text XML processing, it also takes far less memory and can be much more easily squeezed onto a microcontroller.
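A toy illustration of the "number instead of name" idea, not real EXI encoding: the grammar list below is a made-up example, and the code only shows how many bits a choice among N alternatives needs and how dispatch becomes an index lookup.

```python
# Hypothetical grammar: elements allowed directly under <stream:stream>.
GRAMMAR = [
    "{jabber:client}iq",
    "{jabber:client}message",
    "{jabber:client}presence",
    "{http://etherx.jabber.org/streams}features",
    "{http://etherx.jabber.org/streams}error",
]
CODE = {qname: i for i, qname in enumerate(GRAMMAR)}

def bits_needed(alternatives):
    """Smallest number of bits that can distinguish `alternatives` choices."""
    return max(alternatives - 1, 0).bit_length()

def dispatch(code):
    """A list index replaces string comparison or hashing of the qname."""
    return GRAMMAR[code]

message_code = CODE["{jabber:client}message"]
```

With 15 alternatives `bits_needed` gives 4, which matches the four bits mentioned above; with this five-entry list, three bits would be enough.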
Andrzej
ok, I get it now, thanks
Kev
Has anyone tried doing EXI for S2S?
arc
Yes.
x51has joined
Andrzejhas left
Kev
Hmm. Going from https://www.w3.org/XML/EXI/#Efficient_XML I find only one (commercial) EXI implementation for C/C++ that doesn’t claim to be alpha quality.
Zash
Someone did mention a lack of libraries earlier.
Kev
Indeed.
andrey.ghas left
arc
Transforming stanzas between exi grammars is not difficult provided your grammar mapping is set up correctly
Kev
arc: Was that responding to my lack of library comment? Because if so I don’t see how it relates :)
arc
Using it for s2s, you're typically going to be routing stanzas between grammars. So in the above example, a xmpp client might say a message=1, but you may route that stanza to a server using a grammar that says message=0
Kev
Ah, my previous question, gottit, thanks.
pasdesushihas left
pasdesushihas joined
arc
With text domain xmpp you often don't have to modify the stanza. And when you do, only in small specific ways like changing the from= attribute. But with EXI S2S is also grammar conversion.
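The grammar conversion arc describes can be pictured as a translation table between two code assignments. This is a deliberately simplified sketch with invented grammars; real EXI grammars are state machines, not flat lists.

```python
# Two hypothetical grammars that number the same elements differently.
CLIENT_GRAMMAR = ["iq", "message", "presence"]   # message = 1
SERVER_GRAMMAR = ["message", "presence", "iq"]   # message = 0

def build_remap(src, dst):
    """Map each code in `src` to the code of the same element in `dst`."""
    dst_code = {name: i for i, name in enumerate(dst)}
    return [dst_code[name] for name in src]

remap = build_remap(CLIENT_GRAMMAR, SERVER_GRAMMAR)
routed_code = remap[CLIENT_GRAMMAR.index("message")]
```

The table is built once per pair of grammars, so routing a stanza costs one lookup per event code rather than a re-parse.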
arc
But going back to the xpath discussion, XDM 3.0 was certainly designed with EXI in mind though it is not referenced directly. EXI does not solve any problems in this area, it only makes things faster and with less bandwidth.
Kev
After a few minutes of looking, I’ve come to the conclusion that there’s no implementation for C/C++ that doesn’t claim to be alpha/seem abandoned other than one commercial Windows-only one. And, as you asserted earlier, I don’t want to be working on an EXI parser :)
krauqhas left
arc
There are more, and yes they are commercial. Thankfully no one has released a free one that is ready to use or I would start having trouble paying my rent.
arc
Anyways all that is simply a different problem. It doesn't help with identifying stanzas for routing to external software, whether you want to call that microservices or not.
Ge0rG
Re EXI I think we could have significant benefits on mobile / low bandwidth, if a client implementation creates a grammar of everything supported by the client, uploads it to the server in some secure way, and the server only ever uses elements within that grammar to the client, stripping everything unknown and dropping empty elements.
Ge0rG
Your client doesn't support CSNs? It's omitted from the grammar, server strips out the element from messages, empty messages get dropped on the server, battery wins
jonas’
except only proprietary implementations, so no chance there
Ge0rG
jonas’: implementations of EXI?
Ge0rG
How hard can it be?
jonas’
how hard can a binary xml parser be?
Kev
I mean, you just take an XML parser and you change some strings to numbers, right? Job done ;)
it's not-fun, so let's call it NunXMPP. Will also make clear that it's neither fun nor sexxy.
Steve Kille
f
Steve Kille
f
Steve Kille
ignore me
Zash
"push 'f' to pay respects"?
arc
You do not need to even parse. All you really need is some basic data structures. EXI is extremely easy to implement a reader and writer for.
neshtaxmpphas left
arc
The difficult part is generating grammar based on schema, then generating code based on grammar. And you'll see that for most cases, this is the part that is not provided
debaclehas left
arc
I have seen some foss implementations that work fine, but require that you provide a grammar file. Or provide all the low level data structures and the header, but leave the application api to build your own EXI decoder/encoder.
moparisthebest
what are the upsides? just that it's ever-so-slightly smaller?
jonas’
moparisthebest, savings by EXI are more than just "ever so slightly". especially when base64’d stuff comes into play. my understanding is that you could transfer that in decoded form in EXI
moparisthebest
which saves what, 33% only in the case of base64'd stuff?
jonas’
that’s considerable on GPRS
jonas’
and remember that all of OMEMO is base64’d
moparisthebest
GPRS is dead though so who cares
jonas’
you wish
derdanielhas left
moparisthebest
actually I wish it wasn't, but that's beside the point :)
moparisthebest
in the USA 2G has been gone awhile, most carriers dropped 3G last month, the rest will next year
neshtaxmpphas joined
moparisthebest
is all EXI offers "better compression" then ? and if so, how does it compare to "just compressing" XML, and does it suffer from the same security vulnerabilities "just compressing" XML does ?
SamWhited
moparisthebest: that doesn't mean 2G is dead, that means the major carriers dropped it and if you were in an area where it no longer exists you just don't have internet anymore unless you switch to a smaller local provider or use a slow wireless uplink. That's *more* of a reason to save bandwidth, not less.
chronosx88has left
chronosx88has joined
SamWhited
I make no comment on EXI in particular, just complaining because I'm always annoyed when I've lived somewhere where it's impossible to get fast internet and people say "what's the problem, everywhere has fast broadband now!"
moparisthebest
I think we might have discussed this before, and I think the conclusion was EXI is probably as vulnerable to CRIME/BREACH as gzip is ?
moparisthebest
SamWhited, yea I think it sucks, but it doesn't change the fact that in the USA (and soon everywhere else presumably), it's dead
moparisthebest
I think next year for canada iirc
SamWhited
I don't think that's true though, it's just dead if you use AT&T or T-Mobile, otherwise I believe smaller providers are still using it because it's what they can afford to do easily. Even if it is dead, that doesn't mean "we don't need to worry about a 33% bandwidth savings" because it being dead doesn't mean everyone gets upgraded to 4G
moparisthebest
right, I'm not saying EXI is good or bad, I'm asking what advantages it offers, how it compares to just running normal compression on XML etc
neshtaxmpphas left
SamWhited
Maybe I misunderstood "which saves what, 33% only in the case of base64'd stuff? / GPRS is dead though so who cares"
Zash
I vaguely recall all of Sweden having some kind of GSM coverage at some point, but since recent-G networks, not so much. Not much profits in covering the Scandinavian mountain range in high-speed internet.
SamWhited
That sounds like "we don't have to save bandwidth, everyone has fast internet now"
Zash
AIUI you get rid of the parser and shuffle packed structs over the wire, instead of a text format that requires a proper parser.
marekhas left
moparisthebest
that sounds like a proper security nightmare to me
marekhas joined
Zash
How's HTTP/2 doing? How's all the ... binary Google format I don't remember the name of anymore.
moparisthebest
so for sure EXI has downsides compared to compressed XML, like comparatively few libraries, no/much less security auditing, and such, does it have upsides?
Danielhas joined
andyhas joined
Zash
Compression of the zlib variety has memory usage issues and security .. ickyness.
how sure is anyone EXI doesn't have the same security problems ?
Zash
I'm sure it's just as good as ASN.1 & co! 😛
LNJhas joined
Zash
moparisthebest, protocol buffers, was the thing I forgot the name of. I mentally included HPACK in "HTTP/2"
SamWhited
I don't remember much about EXI, but it negotiates the compression up front instead of constructing a dictionary that is reused across stanzas, right? In that case I don't see how it could be possible for it to have the same issues as stream compression or TLS compression
moparisthebest
even HPACK acknowledges it's vulnerable to CRIME and friends https://tools.ietf.org/html/rfc7541#section-7.1.1
Zash
Can't you ... not do that with EXI?
Zash
Just do the binary packing stuff
Zash
CRIME & co comes from compressing user/attacker data in the same context as other stuff, so if you just don't do that, things should be better
moparisthebest
SamWhited, I realize current-xmpp-compression doesn't do this, but nothing stops you from designing a XEP such that each stanza is compressed separately right?
stpeterhas joined
stpeterhas left
Zash
moparisthebest: Correct, just need to write it down somewhere and get implementations to comply.
SamWhited
moparisthebest: sure, I've worked places that implemented it that way in the past
moparisthebest
so, that's *not* vulnerable to CRIME then ?
Steve Killehas left
moparisthebest
then the questions are: 1. is EXI vulnerable to CRIME ? 2. if not, how does it compare size-savings-wise to just doing the above with XML ?
jonas’
I did some measurements on the impact of compressing stanzas individually some time ago, and it was "still worth it"
SamWhited
Right. Assuming you're not doing something weird with your stanzas where you're mixing secret data and non-secret data, but in the general sense that's right
jonas’
I’m not sure if EXI contains any compression at all
Zash
moparisthebest, doing a full flush between each stanza? yes, that fixes CRIME (if I remember correctly what CRIME was about)
jonas’
or if you have to use it
moparisthebest
in theory it should get rid of basically all base64 overhead right?
neshtaxmpphas joined
Zash
IIRC you only need to do a full flush when the sender changes.
Zash
Might be interesting things you could do with CSI integration
neshtaxmpphas left
SamWhited
We just did a full flush on stanza boundaries at HipChat. Probably could have saved more by being clever, but it was easy and it helped a lot. Dropped our network traffic by a factor of 0.58 and dropped CPU utilization by a factor of 0.60
LNJhas joined
SamWhited
(with ZLIB)
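A small sketch of what "full flush on stanza boundaries" means with zlib, using Python's standard `zlib` module. It is not the HipChat implementation; `compress_stream` and the sample stanza are made up. The comparison with `Z_SYNC_FLUSH` shows the back-references that a full flush gives up.

```python
import zlib

def compress_stream(stanzas, flush_mode):
    """Compress stanzas on one zlib stream, flushing after each one."""
    comp = zlib.compressobj()
    chunks = []
    for stanza in stanzas:
        chunks.append(comp.compress(stanza) + comp.flush(flush_mode))
    return chunks

stanza = (
    b'<message xmlns="jabber:client" to="juliet@example.com" type="chat">'
    b'<body>Art thou not Romeo, and a Montague?</body></message>'
)

# Z_SYNC_FLUSH keeps the window, so the repeated stanza is a cheap back-reference.
sync_chunks = compress_stream([stanza, stanza], zlib.Z_SYNC_FLUSH)
# Z_FULL_FLUSH resets the compression state, so nothing earlier can be referenced.
full_chunks = compress_stream([stanza, stanza], zlib.Z_FULL_FLUSH)

decomp = zlib.decompressobj()
roundtrip = [decomp.decompress(chunk) for chunk in full_chunks]
```

The second chunk is noticeably larger under `Z_FULL_FLUSH`, which is the price for not letting one stanza's size depend on the stanzas before it.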
Zash
A big part of why we dropped compression completely from Prosody was memory usage tho.
moparisthebest
might even be better options today, it'd be interesting to see XMPP-wise how zstd and brotli compared
nycohas joined
SamWhited
Yah, I didn't write down or publish our memory usage, but I assume it went up a bit. Don't remember it though, had to go look up those two numbers. It's been a while.
chronosx88has left
chronosx88has joined
Zash
I'd be interested in fixed-dictionary compression
moparisthebest
iirc that's what EXI is doing ^ ?
moparisthebest
except it negotiates the dictionary, roughly
Kev
That’s more or less what EXI is isn’t it?
Kev
Heh, beaten :)
Zash
I thought it was more like bit packing structs, not like zlib & co
moparisthebest
but that implies doing the same compression across all stanzas in a stream, which probably implies CRIME
Zash
Not if you don't allow backreferences into user data, only into the dictionary
moparisthebest
or possibly some other attack if you know the dictionary used, idk
moparisthebest
isn't "xml element names used in the stanza" sometimes "user data" ?
Zash
You don't learn anything about the previous stanza sent by someone else
Zash
If you say "hello", then an attacker saying "hello" right after would be smaller because it can reference that previous "hello"
Zash
But if you build a static dictionary, that everyone involved already know, you don't leak user data from that.
Zash
Static dictionary would be something full of angle brackets and protocol stuff, no user data.
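One way to try the static dictionary idea with the standard library is zlib's preset dictionary (`zdict`). The dictionary contents below are an invented example of "angle brackets and protocol stuff"; each stanza gets a fresh compressor, so only the dictionary can be back-referenced, never another stanza.

```python
import zlib

# Hypothetical static dictionary: protocol boilerplate only, no user data.
ZDICT = (
    b'<presence xmlns="jabber:client"/>'
    b'<iq xmlns="jabber:client" type="result" id=""/>'
    b'<message xmlns="jabber:client" type="chat" from="" to="" id="">'
    b'<body></body></message>'
)

def compress_stanza(stanza, zdict=None):
    """Compress one stanza on its own, optionally primed with a dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(stanza) + comp.flush()

def decompress_stanza(blob, zdict=None):
    """Inverse of compress_stanza; needs the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

stanza = (
    b'<message xmlns="jabber:client" type="chat" to="juliet@example.com">'
    b'<body>hi</body></message>'
)
with_dict = compress_stanza(stanza, ZDICT)
without_dict = compress_stanza(stanza)
restored = decompress_stanza(with_dict, ZDICT)
```

Both sides must agree on the exact dictionary bytes ahead of time, which is the part a specification would have to pin down.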
moparisthebest
I don't know, maybe? https://tools.ietf.org/html/rfc7541#section-7.1.2
nycohas left
SamWhited
I've thought about doing that with zstd a couple of times. It has a training mode where you can give it a sample set and it builds a dictionary from that which you can reuse later. Never actually tried to see how well it does with a big set of XML though.
nycohas joined
arc
moparisthebest: it is more than a little compression. With xmpp it can transform the overhead of a <message> stanza from around 100-200 bytes (depending on jid length) to around 10.
moparisthebest
arc, so is it vulnerable to CRIME, and how does that compare to simply compressing XML stanzas
arc
Where did you get that idea?
moparisthebest
that was a question, not a statement :) "is it"
arc
I don't believe so, no. Because it is not compression. It would use the same number of bits regardless of the length of the qnames, jids, etc.
derdanielhas left
derdanielhas joined
arc
EXI is a binary representation of XML. One of the functions it is designed for is to then run it through a conventional compression such as DEFLATE
arc
Another option is to bitpack.
derdanielhas left
derdanielhas joined
moparisthebest
is *that* vulnerable to CRIME
arc
https://www.w3.org/TR/exi-primer/ goes into the higher level nitty gritty if you want to read more.
arc
No because it is constant width.
arc
The most you could probably learn from CRIME is encoding type. EXI supports four modes; byte packed, bit packed, precompressed, or compressed.
derdanielhas left
derdanielhas joined
arc
And the latter two are basically the same except the compression is deflate added on top of precompressed
arc
https://www.w3.org/TR/exi-primer/#compression explains how data in the stream is rearranged for better compression
moparisthebest
so how does it do on a super simple message stanza like https://paste.rs/Oe6.xml ?
brotli does surprisingly well here (no settings touched, just `brotli -c msg.xml > msg.xml.brot`)
arc
But you can basically ignore everything inside the XML brackets
moparisthebest
not the attribute values?
arc
Everything inside the XML brackets
moparisthebest
so, run some common examples maybe?
arc
There is an example in the primer
moparisthebest
of what, say a normal chat client might use for a mode/grammar/whatever
Danielhas left
arc
https://www.w3.org/TR/exi/#stringTable will take care of the jids
arc
<message becomes one byte (except in bit packed), SE-MESSAGE
moparisthebest
> The life cycle of a string table spans the processing of a single EXI stream.
moparisthebest
so that likely makes it vulnerable to CRIME then ?
moparisthebest
in fact, it certainly would right?
moparisthebest
if you put JIDs in there anyhow
arc
from= and to= are each AT, so one byte each since they are optional. The values come from the string value table
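A toy sketch of the string-table idea arc describes here: the first time a value such as a JID appears it is sent as a literal and remembered, and later occurrences are sent as a small index. The class name, the `("lit", ...)`/`("ref", ...)` tokens, and the example JIDs are assumptions made for illustration; this is not the actual EXI wire format.

```python
from typing import Dict, List, Tuple, Union

# A token is either ("lit", value) for a first occurrence
# or ("ref", index) for a value already in the table.
Token = Tuple[str, Union[str, int]]


class ToyStringTable:
    """Toy model of a value string table; NOT the real EXI encoding."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}  # value -> position in the table

    def encode(self, value: str) -> Token:
        if value in self._index:
            # Hit: only the compact table index goes on the wire.
            return ("ref", self._index[value])
        # Miss: send the literal once and remember it for later stanzas.
        self._index[value] = len(self._index)
        return ("lit", value)


def encode_jids(jids: List[str]) -> List[Token]:
    table = ToyStringTable()
    return [table.encode(jid) for jid in jids]


tokens = encode_jids(
    ["romeo@example.net", "juliet@example.com", "romeo@example.net"]
)
print(tokens)
```

The repeated JID costs one index instead of a full string, which is where the size win comes from. It is also why the encoded size depends on which values have already been seen.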
marekhas left
moparisthebest
except they aren't because you can't do that and avoid CRIME
derdanielhas left
derdanielhas joined
moparisthebest
so add them back in
marekhas joined
arc
What is your obsession with CRIME? The string tables are a constant value. They are not susceptible
moparisthebest
because you can't just go "maybe it's not vulnerable to breaking the stream encryption ¯\_(ツ)_/¯"
derdanielhas left
derdanielhas joined
moparisthebest
it either is, or it is not
arc
The only thing you're going to get for information leak from the string table is the number of values in the table.
moparisthebest
attackers can manipulate the JIDs that go across the stream right ?
Tobiashas left
Tobiashas joined
arc
How?
moparisthebest
how would any JID except the sending one be in the dictionary up front anyhow ?
Danielhas joined
moparisthebest
how will you know randomjid@randomdomain is going to message you?
moparisthebest
maybe EXI is only useful in closed deployments where you know the entire network up front ?
moparisthebest
(and, maybe, don't care about TLS providing CRIME-proof security, still unknown on this one)
arc
You're right the length of jid could be leaked.
moparisthebest
so if you cut that stanza down to essentially the bare minimum info that must be transmitted in it, and it's probably *too* cut down actually, you end up with something like https://paste.rs/aPt which is 86 bytes
mathijshas left
moparisthebest
that means EXI would have to fall somewhere between 86 and brotli's 101 to be useful at all
moparisthebest
that's a small range for improvement
Daniel
i think that's a very simplistic example
Daniel
look at an omemo pre key bundle for example
Daniel
or generally anything with lots of nested elements
moparisthebest
sure, I suspect brotli would improve even more with anything base64'd though
Daniel
plus XEP-0184 requests, XEP-0333 requests
Daniel
and all the other stuff we regularly put into messages
Daniel
and it should be easier on the cpu
moparisthebest
honestly regular compression's ratio should get better the bigger the stanza is right? so a small one like this is probably least fair
Daniel
i have a deployment where we would like to use compression but can’t because it's too expensive
pasdesushihas left
pasdesushihas joined
Daniel
and if implemented correctly exi can even be faster than string parsing
Daniel
whereas compression is always slower
arc
You can add brotli on top of EXI. It works very well for that purpose.
moparisthebest
I doubt that's always true (that compression is slower)
SamWhited
Daniel: are you sure about that? We had CPU usage drop on one machine because there were fewer TLS packets, which is where most of the CPU was being taken up
arc
But again this is all based on the grammar.
Daniel
SamWhited, no
SamWhited
Worth measuring anyways if you haven't. If you're using a slow TLS cipher and a fast compression algorithm you might make some gains
moparisthebest
I understand EXI should be able to be better than general compression in a closed system where you know every stanza that will ever be passed
moparisthebest
the question is, can EXI be better than general compression for a chat client in the public federated XMPP network
larma
any compression system that works by reusing user input has issues like CRIME and similar. EXI could be fine if no string lookup table was used. However, schema-based EXI is complicated and schema-less EXI is far from optimal
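A minimal sketch of the kind of length oracle behind CRIME-style attacks, using plain DEFLATE from Python's `zlib` rather than EXI. The stanza layout, the "secret" JID, and the two guesses are assumptions chosen for illustration: when attacker-controlled text repeats a secret that sits in the same compressed message, the output gets shorter.

```python
import zlib

# Hypothetical secret that shares a compressed message with attacker input.
SECRET_JID = "alice@example.org"


def compressed_len(secret_jid: str, attacker_text: str) -> int:
    """Length an eavesdropper would observe for one compressed stanza."""
    stanza = (
        f"<message to='{secret_jid}'>"
        f"<body>{attacker_text}</body></message>"
    )
    return len(zlib.compress(stanza.encode("utf-8")))


# Same-length guesses: one matches the secret, one does not.
right = compressed_len(SECRET_JID, "alice@example.org")
wrong = compressed_len(SECRET_JID, "qwxzv@kpmtrfh.uvy")
print(right, wrong)
```

The matching guess is encoded as a back-reference to the earlier JID, so `right` comes out smaller than `wrong`. Encryption hides the bytes but not that length difference.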
ti_gj06has left
SamWhited
I suspect the question is actually "assuming that EXI and GZIP flushing on stanza boundaries are both 'good enough', which one is easier to implement and deploy widely" (and the answer is probably normal stream compression), but of course I may be wrong.
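For reference, "flushing on stanza boundaries" can be sketched with Python's `zlib` as below. The helper name and the example presence stanza are assumptions for illustration. One compressor is kept for the whole stream, and a sync flush after each stanza lets the receiver decode it immediately while the compression history carries over to the next stanza.

```python
import zlib


def make_stream_compressor():
    """Return a send() function that compresses one stanza per call."""
    comp = zlib.compressobj()

    def send(stanza: bytes) -> bytes:
        # Z_SYNC_FLUSH emits all pending output so the peer can decode the
        # complete stanza now, without resetting the shared history.
        return comp.compress(stanza) + comp.flush(zlib.Z_SYNC_FLUSH)

    return send


send = make_stream_compressor()
decomp = zlib.decompressobj()

stanza = b"<presence from='room@muc.example.org/nick' to='me@example.org/phone'/>"
first = send(stanza)
second = send(stanza)

received_first = decomp.decompress(first)
received_second = decomp.decompress(second)
print(len(stanza), len(first), len(second))
```

The second, identical stanza compresses to far fewer bytes than the first because it is found in the stream history. That shared history is what makes repeated presence cheap, and also what the CRIME discussion is about.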
moparisthebest
yes roughly SamWhited , I think it's rather obvious general compression and regular XML is easier no ?
larma
something like exi would still be more efficient than gzip. also why would you think flushing on stanza boundaries is enough?
moparisthebest
so it becomes 1. is EXI secure 2. is it better than that, and if so, better enough to justify the effort?
Zash
Could do something in the direction of HPACK and/or WhatsApps FunXMPP compression scheme...
SamWhited
moparisthebest: I suspect so, yes
arc
moparisthebest: again, EXI is not compression. It is a schema-aware binary representation of XML data. It is intended that you use compression on top of it.
arc
When you're not using compression on top of it, typically you use bit packed mode.
adiaholichas left
adiaholichas joined
moparisthebest
then you'd have to show EXI+compression is better in terms of size+effort vs XML+compression ?
moparisthebest
and additionally, ensure EXI doesn't introduce security bugs like CRIME
flow
larma> something like exi would still be more efficient than gzip. also why would you think flushing on stanza boundaries is enough?
honest question: why would you think that it is not enough?
arc
Actually as I'm thinking about it, it would make a lot of sense to pre-populate the values string table with the JIDs in your roster. And if you are worried about CRIME, you can also specify that padding is used to make all jids transfer as a fixed width
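A small sketch of the fixed-width padding arc suggests, assuming a hypothetical 64-byte field and NUL padding (neither comes from EXI or any XEP). Every JID then occupies the same number of bytes regardless of its length.

```python
FIXED_WIDTH = 64  # hypothetical field width in bytes, chosen for illustration


def pad_jid(jid: str, width: int = FIXED_WIDTH) -> bytes:
    """Encode a JID into a fixed-width field, padded with NUL bytes."""
    raw = jid.encode("utf-8")
    if len(raw) > width:
        raise ValueError("JID longer than the fixed field width")
    return raw.ljust(width, b"\x00")


def unpad_jid(field: bytes) -> str:
    """Strip the NUL padding and decode the JID again."""
    return field.rstrip(b"\x00").decode("utf-8")


short = pad_jid("a@b.example")
longer = pad_jid("someone.with.a.long.name@chat.example.org")
print(len(short), len(longer), unpad_jid(short))
```

This only hides the length of the JID field itself. If a general-purpose compressor runs over the padded data afterwards, the padding largely compresses away and the length difference is likely to reappear.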
larma
flow, attackers can control certain content of stanzas even if it's not them sending
moparisthebest
so then you can use that as an oracle to determine a user's roster
arc
How?
flow
larma, unfortunately I am apparently missing some pieces as to why this is relevant
larma
ah maybe there is something that I have in mind that nobody else has. If we ever do some kind of efficient compression on stanzas, I'd want to also make use of it inside SCE. Then it's no longer only about transport encryption, but also relevant to end-to-end encryption. And if you assume the server as an attacker, they can easily modify IDs and similar to modify certain parts of your encrypted message to exfiltrate others.
flow
larma, thanks, I probably need to think about this a little more (and with more sleep)
SamWhited
I still think that doesn't seem worth considering, the server has to be trusted, that's the whole model that we have. Trying to change that and doing complicated partial stanza encryption just seems like a waste of time. Let's just settle on compression that's "good enough" and in the occasional system where there's some high-security environment where the server isn't trusted, don't use it.
Zash
Did anyone define "good enough"?
flow
larma, but it sounds like you are talking about an intermediate hop modifying encrypted bytes, that shouldn't be possible, no?
SamWhited
Zash: "better than no compression?"
arc
I have to agree with Sam
arc
Here's the thing; XMPP is low enough bandwidth and no one really cares except IoT, and maybe mobile.
emus
(Different topic: Kev - there was a misunderstanding with Sam W., nevermind about the Twitter thing - I will just request the pinning via mail 😊️ )
larma
flow, no, certain parts are unencrypted, like message id. Also you very likely would want to be able to reply to unencrypted messages with encrypted messages without risking to actually leak the content of the encrypted message 😉
SamWhited
Yah, sorry, I didn't realize Twitter required me to have a separate personal account
SamWhited
I'm still happy to help on the comms team as necessary though, someone just tell me what needs doing and I'll see what I can do.
flow
larma, ahh I think I got it now: values that are to-be encrypted are determined by the server
arc
On IoT these days we mostly care about making xml easy enough to use that we can fit it on a microcontroller and utilize less cpu and transmit size so that the battery lasts longer
flow
that is certainly interesting. I never considered compressing the encrypted bytes prior to base64 encoding them and the implications of doing so
emus
SamWhited if you read in the CommTeam MUC occasionally or subscribe to the PR that should be fine. If you want, I can add you as a reviewer there, too
arc
EXI seems extremely complicated, but CPUs can process it with a minimum amount of code. A typical EXI compiled binary static library is around 8k.
flow
FWIW, I believe EXI would also be able to avoid base64 for raw bytes (at least, that is what I remember being told ~5 years ago)
SamWhited
emus: can you invite me? I don't see the comms team muc listed on the website
emus
https://xmpp.org/newsletter.html here is a link if that's okay for you, I don't have you in my contact list so I guess that's quicker
SamWhited
thanks
Andrzejhas left
arc
And when you're dealing with a microcontroller costing under $1, that only has like 16k-64k flash for all the software, the size of the software matters a lot. And when the manufacturer is budgeting the mAh battery, they care a lot about keeping the microcontroller in sleep and moreso keeping the transmitter powered down as much as possible when the cpu is awake.
inkyhas joined
arc
Because in the end, they don't care about anything else. They don't care about standards compliance. They don't care what technical solution is used. They don't care about CRIME or data security.
arc
They're making 10 million of these units, at minimum, so if they can save 10 cents that's a million dollar profit.
arc
They look at a big stack of money on one side, and a technician whining about standards compliance or user security on the other, they will choose the big stack of money every time.
arc
The only reason *some* of these manufacturers are using xml at all, and a very small number overall, is because exi can do the work with less code, less cpu, and less transmit power than message queues, which is what they would be using otherwise
neshtaxmpphas joined
moparisthebest
right, I get why EXI is a win for bottom-of-the-barrel iot, still not sure if it's a win or not for generic federated xmpp chat client though
arc
Nobody is using it for general federated xmpp
moparisthebest
yes but I'm wondering if it's worth pursuing it for that though
Zash
moparisthebest, ~1MB chunks of XML sent to my phone something something would be nice if it was smaller
moparisthebest
but maybe regular compression solves it for you, and that's easy
arc
Half the time they just want http anyway. Because the software is free and HTTP is lightweight. And because their in-house developers already know it.
Zash
but then most of that consists of people sending presence to all the same MUCs that I'm in
arc
Zash, I agree with you. And it would likely turn the 1MB chunk into 1kb.
arc
String tables are surprisingly efficient at jids
Zash
FWIW this is with CSI buffering up unimportant stuff into larger chunks, that's how it gets so large.
moparisthebest
arc, but "string tables" can't be used on public federated XMPP because they are vulnerable to CRIME, so they are out
fuanahas joined
arc
I would say that you choose not to use it because you believe it is impossible to implement with your security criteria in mind
Zash
the real question is, is CRIME really that bad?
arc
And if I really cared I would have worked on it a long time ago
SamWhited
yah, honestly, after SASL is complete I'm not sure how much I care about CRIME-like vulnerabilities.
Zash
IIRC you almost need to launch a DoS attack to get anything out of it, maybe just sprinkling some rate limits on top makes it all Good Enough
arc
I am certain that there is an orchard full of hanging fruit for xmpp security, nobody will go through the effort
jonas’
CRIME also gets less and less relevant with e2ee
jonas’
(post-SASL that is)
Zash
And good luck doing CRIME with SCRAM anyways
arc
I mean, all the software that uses libxml2 should just be considered insecure by that very nature
moparisthebest
makes me pretty nervous to pretend CRIME isn't a threat... attacks only get better, never worse
arc
Google was so certain of the fundamental insecurity of anything XML that they never even implemented encrypted S2S
Zash
Huh. Source?
moparisthebest
and actually I keep saying CRIME when BREACH is actually a better fit for what we are talking about https://en.wikipedia.org/wiki/BREACH
arc
Zash, source is the number of times I've had beer with Google developers
Zash
moparisthebest, C-ompression something something is easy to remember tho
fuanahas left
fuanahas joined
arc
In 2019 I had a drink with a few of the guys that previously worked on the gtalk/hangouts sre team, and I commented that it probably was a lot easier doing security with Go and its brand new xml library. One of them shot his drink out his nose
arc
They apparently don't use XML internally anymore because of it
moparisthebest
well, google is well known for their poor decisions so :)
moparisthebest
imagine taking advice from a company that created and killed 83 chat systems in the last 15 years
arc
But that's the thing, not once did they create one that actually works the way they wanted it to
arc
And anything even remotely stable is just piled on mess over mess until they have to start over
SamWhited
They are spot on with that one though, XML is absolute garbage in terms of being able to do a secure implementation. Way too big of an attack surface for something that should just be a way to transfer a tree…
SamWhited
Not that Google's dev practices or product management practices are always great, but they know their security.
StefanK$has joined
arc
Protobuf is wide open to attack too. They work around that with things like ucs4 and fixed length string entries
SamWhited
I'm not saying protobuf is perfect, I don't know much about them, just that they're not wrong about XML being bad.
moparisthebest
XML is the worst except for everything else
SamWhited
No, it's just worse than most other things. We're stuck with it for the base of XMPP, and I don't see anything else that works in a similar enough way to do a streaming protocol in as nice of a manner, but that doesn't mean it's not insecure garbage or that we shouldn't be *very* careful with it
arc
XML is not bad by design. It is bad because nobody cares.
DebXWoodyhas joined
fuanahas left
Yagizahas left
SamWhited
I don't think that's true. Designs like hacking namespaces on and mixing them with attributes, adding things like proc insts, etc. are just bad.
arc
I'm not saying that's not bad. I'm saying that XML suffers the same type of abandonment that led to heartbleed
papatutuwawahas joined
SamWhited
I mean, I'm sure nobody caring is a problem too, but I'd also say XML is just bad by design and therefore we need to be *very* careful about its use and not dismiss other people's concerns about it so readily.
Andrzejhas joined
SamWhited
I just get mad every time somebody does the "it's just <> vs {}, why is HN so trendy?" or whatever in this room when realistically whatever they're complaining about is probably a serious problem that we should be addressing instead of being dismissive.
moparisthebest
but the "json is better than xml" people don't have any answers for all the ways in which json is worse
moparisthebest
and you can s/json/anything/ there too
Zash
My langsec friend said something some time that I remember as "xml is okay, it's not made of length-prefixes and stuff"
SamWhited
Sure, but that's not the problem, the problem is that we pretend that means XML is good somehow and then Go has the issue where namespaces can be manipulated and it's like the third time I've seen that in an XML decoder and somehow we just say "no, XML is fine, let's use more of it" every damn time
Kev
The fundamental problem is that XML got so much stuff *right* that we’re stuck with the stuff it didn’t.
moparisthebest
"bad libraries exist" is a thing
SamWhited
"Every library is consistently bad in the same ways" is the actual problem.
arc
Oh I'm not dismissing anything. I'm just worn down by an industry that doesn't care until it nearly destroys them. I lead the Python xml-sig that produces code responsible for billion dollar industries. Those companies don't care. If the problem were made bluntly clear to them, they wouldn't fund anything. They would direct their technical teams to migrate to another language
moparisthebest
binary formats never have parsing vulnerabilities https://duckduckgo.com/?q=asn.1+parsing+vulnerabilities
SamWhited
But I dunno, I just think XML is garbage in general outside of the security realm too and got next to nothing right. I accept it's the only thing we could use for a system like XMPP, but IMO we should literally never use more of it ever regardless of whether it's one of the parts that has consistent security issues or not
Zash
asn1, the xml before xml, let's go back to it!
Zash
If only for the CRITICAL bit
mathijshas joined
Zash
SamWhited, face it, everything we do is garbage held together with duct tape. That we haven't nuked/pandemic'd/burned ourselves into extinction yet is quite amazing. Probably because we invented duct tape 😀
SamWhited
you're not wrong
arc
That and because we are hanging just above the low hanging fruit. Like vulnerabilities in cryptocurrency exchanges that allow people to steal millions of dollars in untraceable funds
StefanK$has left
Zash
At least it's our garbage. Our own! Our ... precious!
Zashpets pile of angle brackets.
arc
Sure people could attack xmpp. I imagine many of us could, knowing what we know. But there is no direct profit from doing so, and any scheme to profit from such an attack would be too complicated to pull off
moparisthebest
I think this can be summarized as "computers are bad, formats don't matter" https://duckduckgo.com/?q=json+parsing+vulnerabilities https://duckduckgo.com/?q=xml+parsing+vulnerabilities
DebXWoodyhas left
DebXWoodyhas joined
Zash
https://xkcd.com/2030/ comes to mind
SamWhited
That's exactly the problem, "our entire field is bad" doesn't mean "so we should give up and use whatever we want and ignore the problems with specific formats because another one might also have problems or there might be an issue at some other layer of the stack"
moparisthebest
but, switching formats doesn't address the problem, you should just address the problem instead
Zash
"our entire field" has existed for like half a century and usually doesn't directly kill people ... unless you count artillery trajectory calculations ... oh no
fuanahas joined
SamWhited
I didn't say we should switch formats, I said XML is what we've got and it's probably the only thing that works for XMPP but we shouldn't be dismissive when people point out problems with it and we shouldn't add more of it.
Zash
SamWhited, and we shouldn't do what everyone else does and wrap it in JSON!
fuanahas left
fuanahas joined
Zash
Oh wait, oh no, https://xmpp.org/extensions/xep-0295.html
SamWhited
I mean, yah, we should definitely not implement XEP-02395 :)
SamWhited
err, 0295
APachhas left
APachhas joined
arc
No Sam. We should fix it. Here I'll put it on my giant stack of things I care about that will never earn a cent, will not improve my chances of getting hired, would take the rest of all of our lifetimes, and leave us living on the street.
fuanahas left
Kev
Although 295 was obviously a joke, I note that the first suggested encoding pretty much works, and actually avoids some of the security issues of XML ;)
Zash
The one that looks a bit like JSON-LD?
arc
I do really care about these things. I'm not completely jaded. I think a lot of people really care about these things. I'm pointing out that we need to solve some lower-level fundamental problems first.
Kev
Zash? Does it? Yes, anyway. The one that isn’t entirely stupid.
Kev
When I wrote it I didn’t think of it in terms of having benefit over the XML representation, but it *does* eliminate whole classes of security issues.