XSF Discussion - 2018-11-02


  1. flow

    jonas’, which server side impl does the full flush?

  2. flow

    ahh it is "bytes saved", that is why full flush percentages are lower…

  3. jonas’

    flow, I hacked the prosody impl to do a full flush

  4. Ge0rG

    ain't it called a "straight flush"? 😁

  5. Ge0rG

    how do you add images to the xmpp wiki?

  6. pep.

    There's a special page to upload files on MediaWiki? It's usually accessible from the panel on the left when editing a page

  7. Ge0rG

    pep.: yes, that's about what I know. But that link isn't there, so now I'm lost.

  8. pep.

    /Special:Upload

  9. pep.

    If not maybe that's disabled?

  10. ralphm

    It is disabled

  11. Ge0rG

    And I presume you can't inline-link externally hosted files?

  12. flow

    jonas’, good job. that sounds like it is possible to make prosody announce a zlib-with-full-flush-or-whatever-you-wanna-call-it compression method

  13. dwd

    jonas’, Real percentages are much higher, or at least they used to be. I did a load of work compressing real traffic captures - but performing a sync flush after multiple stanzas helps a lot, and using CSI really drives it up.

  14. Ge0rG

    dwd: CSI doesn't absolve you from sync-flushing after each stanza, right?

  15. dwd

    Ge0rG, I don't think anything mandates you sync-flush after every stanza - just after every buffer flush.

  16. flow

    Ge0rG, you only need sync flush if there is no more data

  17. dwd

    flow, That's more or less what I was typing - you only flush once all the inbound traffic has been processed, at least on C2S.

  18. flow

    dwd, sorry, I just didn't get what you meant with "after every buffer flush" and wanted to clarify it a bit for Ge0rG

  19. dwd

    flow, Yeah, I was clarifying it the same way as you, but you beat me to it. :-)

  20. flow

    dwd, I'm also confused why you wrote "inbound" traffic; I'd say it is the "outbound" traffic where an entity controls the zlib behavior. For inbound traffic it just consumes whatever bytes have been sent to it

  21. dwd

    flow, Ah. So if a client sends you N stanzas, you only need to flush after processing all N.
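
A minimal Python sketch of the point dwd and flow make here, using the standard `zlib` module: sync-flushing after every stanza versus compressing everything that is pending and sync-flushing once. The stanzas are made up. The only claim is the one from the zlib manual: each `Z_SYNC_FLUSH` completes the current deflate block and appends an empty stored block (ending in `00 00 ff ff`), so fewer flushes mean fewer bytes on the wire.

```python
import zlib

# Ten made-up groupchat stanzas standing in for "N stanzas" from one client.
STANZAS = [
    (f"<message to='room@muc.example' type='groupchat' id='m{i}'>"
     f"<body>hello {i}</body></message>").encode()
    for i in range(10)
]


def flush_every_stanza(stanzas):
    """Sync-flush after each stanza."""
    comp = zlib.compressobj()
    out = b""
    for stanza in stanzas:
        out += comp.compress(stanza)
        out += comp.flush(zlib.Z_SYNC_FLUSH)  # ends the block, emits 00 00 ff ff
    return out


def flush_once(stanzas):
    """Compress everything that is pending, then sync-flush a single time."""
    comp = zlib.compressobj()
    out = b"".join(comp.compress(stanza) for stanza in stanzas)
    return out + comp.flush(zlib.Z_SYNC_FLUSH)
```

Both outputs decompress to the same bytes; the receiver cannot tell the difference except that the batched variant is shorter.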

  22. Ge0rG

    dwd: the issue we are working around is that compression provides a plaintext size oracle for attackers, right?

  23. flow

    dwd, that again sounds like the receiving entity would flush

  24. flow

    which is kind of new to me

  25. dwd

    Ge0rG, Sure, if you think that's a realistic security problem, then you have to compress only traffic that can be influenced by one entity at a time. Which basically means compressing each stanza individually.
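
To make the oracle Ge0rG mentions concrete, a hedged Python sketch of the CRIME-style length leak. It assumes an attacker can inject text (here a message body) into the same zlib context as a stanza carrying a secret, and can see the length of the encrypted output. `SECRET`, the stanzas and the guesses are all made up. A correct guess lets deflate encode the repeated `password=...` as a back-reference, so the output gets shorter; compressing each stanza in its own context, as dwd suggests, removes that link.

```python
import zlib

SECRET = b"K7Q2M9X4T1R8"  # hypothetical value the attacker wants to learn


def observed_length(guess: bytes) -> int:
    """Bytes on the wire when attacker-chosen text shares a zlib context
    with a secret-bearing stanza (TLS hides content, not length)."""
    comp = zlib.compressobj()
    out = comp.compress(b"<iq type='set'><query>password=" + SECRET + b"</query></iq>")
    out += comp.compress(b"<message><body>password=" + guess + b"</body></message>")
    return len(out + comp.flush(zlib.Z_SYNC_FLUSH))


right = observed_length(SECRET)
wrong = observed_length(b"bcfghijklmnu")  # wrong guess of the same length
```

A real attack refines the guess one character at a time; the sketch only shows the single comparison that makes this possible.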

  26. flow

    dwd, or "do a full flush on every channel change"
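
flow's "full flush on every channel change" maps onto zlib's `Z_FULL_FLUSH`: like `Z_SYNC_FLUSH` it pushes out all pending output, but it also resets the compression state, so data after it cannot be encoded as a back-reference to data before it. A small Python sketch under that reading, sending the same made-up stanza twice and measuring only the second copy:

```python
import zlib

STANZA = (b"<message to='juliet@example.com' type='chat'>"
          b"<body>Wherefore art thou, Romeo?</body></message>")


def second_copy_size(flush_mode: int) -> int:
    """Compressed size of STANZA when it is sent right after itself."""
    comp = zlib.compressobj()
    comp.compress(STANZA)
    comp.flush(flush_mode)  # output of the first copy is discarded, not measured
    return len(comp.compress(STANZA) + comp.flush(zlib.Z_SYNC_FLUSH))


after_sync = second_copy_size(zlib.Z_SYNC_FLUSH)  # may point back at the first copy
after_full = second_copy_size(zlib.Z_FULL_FLUSH)  # history reset: must re-encode
```

The cost of isolation is exactly the lost ratio: after a sync flush the second copy is nearly free, after a full flush it is encoded from scratch.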

  27. Ge0rG

    dwd: all security problems tend to become realistic sooner or later.

  28. dwd

    flow, It is exactly that. But it only makes a difference on C2S, I hasten to add.

  29. flow

    dwd: hmm, well as long as the from/to pair does not change on s2s…

  30. dwd

    Ge0rG, Sure. But the compression oracle in HTTP was significant because it allowed access to password data, for example.

  31. flow

    you don't need to drop the dictionary, I think

  32. Ge0rG

    dwd: are you saying that s2s is not affected by the oracle vulnerability, or that the channel stays always the same between the two server domains?

  33. flow

    Ge0rG, I think he meant that the channel changes with every stanza

  34. flow

    but I'd argue that the channel stays stable until the from/to pair changes

  35. flow

    whereas in c2s, one of from/to is always fixed

  37. Ge0rG

    By that logic, with CSI we should reorder messages so that same-channel messages are sent consecutively

  38. dwd

    *sigh*

  39. flow

    I don't follow how this is implied by that logic

  40. dwd

    I think that you can run a compression-oracle attack on S2S more easily - I think it's easier to inject traffic, and possibly easier to witness the transport channel as well - but you'd find it harder to get anything useful once you had the attack in place.

  41. Ge0rG

    dwd: the compression oracle in HTTP made it comparatively easy to extract credentials, yes. But it applies to content as well; it's just rather hard for an attacker to control data injected after the typical body of a website.

  42. Ge0rG

    with XMPP, the game is vastly different

  43. flow

    Slightly unrelated: I also wonder how widespread s2s compression is

  44. dwd

    flow, Not very. Early versions of Openfire did it, but we disabled it (because it stopped working).

  45. Ge0rG

    I don't know, but I'd argue that s2s compression is largely irrelevant in typical federated deployments

  46. dwd

    It wouldn't surprise me if M-Link did it, given its use-cases, but I don't know (and it's a looong time since I knew that kind of thing).

  47. flow

    Ge0rG, I'm not sure about the "irrelevant" part

  48. Ge0rG

    flow: irrelevant in the sense that you are not gaining much from it

  49. flow

    Not everyone hosts their XMPP server in a well-connected datacenter

  50. flow

    Ge0rG, I figured so far, but I still believe that this may not be true in every case.

  51. Ge0rG

    flow: if you run your XMPP server in your basement on a crappy ADSL line, you are probably not going to use IBB transfers much

  52. flow

    Ge0rG, I was thinking more about third world countries

  53. dwd

    flow, Well, tactical military deployments are all S2S over long/thin links, but usually with heavy compression on the links themselves, so I'm not sure '138 would be needed.

  54. dwd

    Still, all this is rather irrelevant. If we posit that content ends up fully encrypted under OMEMO/MLS/OX/eSessions/PGP then it's incompressible (one hopes). The remaining traffic is best compressed by EXI.

  55. flow

    True, but then again, we are far from the point where content is fully, or even mostly, encrypted. It may take years until we reach that.

  56. flow

    So I am again not sure about the "irrelevant" part :)

  57. Ge0rG

    dwd: if the s2s connection is encrypted, you can't compress it much on the underlying link layer

  58. dwd

    Ge0rG, Which is why they don't in those circumstances.

  59. jonas’

    dwd, right, so both the prosody and aioxmpp implementations do a sync (or full) flush after each stanza

  60. jonas’

    so by implementing some CORKing, that could be made better I suppose

  61. fffo881

    F

  62. dwd

    G

  63. flow

    jonas’, CORKing?

  64. jonas’

    flow, like TCP CORK, where you wait for more data for a short period of time before sending it out

  65. flow

    ahh, Nagle's algorithm, right?

  66. MattJ

    Similar, but manual

  67. dwd

    jonas’, Yeah, that's Nagle not CORK. CORK is holding the transmission until you manually release it.

  68. jonas’

    dwd, isn’t Nagle that thing that reduces the data rate when stuff gets lost?

  69. jonas’

    I am lost in the TCP terminology, sorry for the confusion.

  70. flow

    jonas’, I don't think so

  71. jonas’

    you’re probably both right :)

  72. jonas’

    nevermind me, you know what I mean (now) though :)

  73. dwd

    jonas’, No, that's backoff, which might well have been developed by John Nagle, but doesn't bear his name at least.

  74. flow

    Nagle just defers the write to widen the window for more data from the application

  75. MattJ

    jonas’, Nagle's is basically automatic corking at the beginning of a connection

  76. flow

    batch/bundle and defer
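
For the Nagle/CORK distinction, a toy Python model of Nagle's algorithm (RFC 896) as described here: while previously sent data is still unacknowledged, small writes are held back and bundled, and they go out when the ACK arrives or when a full segment's worth has accumulated. The class and its `mss`, `write()` and `ack()` names are hypothetical. Real stacks do this in the kernel, applications usually only switch it off with `TCP_NODELAY`, and Linux `TCP_CORK` instead holds data until the application removes the cork.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NagleSender:
    """Toy model: at most one small segment may be unacknowledged at a time."""
    mss: int = 1460                      # maximum segment size (assumed value)
    unacked: bool = False                # is any sent data still awaiting an ACK?
    buffer: bytearray = field(default_factory=bytearray)
    sent: List[bytes] = field(default_factory=list)

    def _emit(self, n: int) -> None:
        self.sent.append(bytes(self.buffer[:n]))
        del self.buffer[:n]
        self.unacked = True

    def write(self, data: bytes) -> None:
        self.buffer += data
        while len(self.buffer) >= self.mss:   # full segments are never held back
            self._emit(self.mss)
        if self.buffer and not self.unacked:  # nothing in flight: send right away
            self._emit(len(self.buffer))

    def ack(self) -> None:
        """Everything in flight was acknowledged: release the bundled data."""
        self.unacked = False
        if self.buffer:
            self._emit(len(self.buffer))
```

With `mss=10`, writing `a`, `b`, `c` sends `a` at once and holds `bc` until `ack()`, which is the "batch/bundle and defer" behaviour.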

  77. jonas’

    ok

  78. flow

    now that I read up on TCP_CORK I can imagine that it isn't heavily used because it appears to be error prone

  79. dwd

    jonas’, And then there's the reverse - lose stuff when the data rate drops - which is best done with RED, which is Sally Floyd's design as I recall. But I don't think that makes sense in XMPP.

  80. jonas’

    this was more about the concept anyways

  81. dwd

    flow, Very platform specific too, and irrelevant to us because we need to compress as we go, I think.

  82. flow

    MattJ, "at the beginning of a connection"? Isn't Nagle used over the whole lifetime of a connection (if enabled)?

  83. MattJ

    Mmm, yeah, sorry

  84. flow

    jonas’, if the idea is to wait for more outbound stanzas before giving the network layer the green light to send them, then I'm fully with you. And I'd like to note that Smack has allowed for that for many years. Even though I implemented it to reduce the time the radio is powered up, it also helps the compression ratio

  85. jonas’

    flow, no, the idea is to wait for more stanzas before performing the full/sync flush in zlib

  86. jonas’

    instead of flushing after each stanza

  87. flow

    MattJ, no worries, just wanted to make sure that I'm not missing something

  88. jonas’

    (of course taking into account the "(to, from) pair must match to be secure" criterion)
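
A hedged Python sketch of the mechanism jonas’ and flow converge on here: queue outbound stanzas and sync-flush zlib only when the application "uncorks", but use `Z_FULL_FLUSH` whenever the (from, to) pair changes, so stanzas from different channels never share compression history. `CorkedCompressor`, `send()` and `uncork()` are hypothetical names, not prosody, aioxmpp or Smack APIs.

```python
import zlib
from typing import Optional, Tuple

Channel = Tuple[str, str]  # (from, to) JID pair


class CorkedCompressor:
    """Defer the zlib flush until uncork(); reset history on channel change."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj()
        self._channel: Optional[Channel] = None
        self._pending = bytearray()   # compressed bytes not yet handed to the socket
        self.full_flushes = 0         # counted only to make the behaviour visible

    def send(self, channel: Channel, stanza: bytes) -> None:
        if self._channel is not None and channel != self._channel:
            # New (from, to) pair: nothing compressed so far may be referenced.
            self._pending += self._comp.flush(zlib.Z_FULL_FLUSH)
            self.full_flushes += 1
        self._channel = channel
        self._pending += self._comp.compress(stanza)

    def uncork(self) -> bytes:
        """One sync flush for everything queued; returns bytes for the socket."""
        self._pending += self._comp.flush(zlib.Z_SYNC_FLUSH)
        data = bytes(self._pending)
        self._pending.clear()
        return data
```

Consecutive stanzas on the same channel keep sharing history (good ratio); a channel switch costs one full flush.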

  89. flow

    jonas’, I think we are talking about the same mechanism

  90. jonas’

    good

  91. flow

    I just want to point out that it also increases efficiency in other areas

  92. jonas’

    true

  93. Ge0rG

    what about having a zlib dictionary per JID?

  94. jonas’

    memory cost

  95. jonas’

    and I think both parties need to agree on the dictionary beforehand

  96. jonas’

    so you’d have to transfer that dictionary every time you switch?

  98. jonas’

    or if you had multiple compression streams, you’d have to have an out-of-band way to signal to the peer which one the next bytes belong to

  99. Ge0rG

    Yay.

  100. dwd

    You could build state-switching into the compression framing, of course, but yeah - memory cost would be scary-huge.
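
A sketch of what "state-switching in the compression framing" could look like, in Python: every frame carries a one-byte stream id and a two-byte length, and each id gets its own zlib context on both ends. The framing is invented for illustration and is not taken from XEP-0138 or any other specification. The memory helper uses the deflate formula from zlib's `zconf.h`, `(1 << (windowBits + 2)) + (1 << (memLevel + 9))` bytes, which at the defaults is 256 KiB per compressing context before the "few kilobytes" of small objects; 10,000 contexts would already need about 2.4 GiB.

```python
import struct
import zlib
from collections import defaultdict
from typing import Iterator, Tuple

HEADER = struct.Struct("!BH")  # stream id (1 byte), payload length (2 bytes)


class MuxCompressor:
    def __init__(self) -> None:
        self._streams = defaultdict(zlib.compressobj)  # one context per stream id

    def frame(self, stream_id: int, data: bytes) -> bytes:
        comp = self._streams[stream_id]
        payload = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
        # struct.error if the payload reaches 64 KiB; fine for a sketch.
        return HEADER.pack(stream_id, len(payload)) + payload


class MuxDecompressor:
    def __init__(self) -> None:
        self._streams = defaultdict(zlib.decompressobj)

    def feed(self, wire: bytes) -> Iterator[Tuple[int, bytes]]:
        """Yield (stream id, plaintext); assumes `wire` holds whole frames only."""
        offset = 0
        while offset < len(wire):
            stream_id, length = HEADER.unpack_from(wire, offset)
            offset += HEADER.size
            yield stream_id, self._streams[stream_id].decompress(wire[offset:offset + length])
            offset += length


def deflate_memory(contexts: int, window_bits: int = 15, mem_level: int = 8) -> int:
    """zconf.h estimate for deflate state, ignoring the small fixed overhead."""
    return contexts * ((1 << (window_bits + 2)) + (1 << (mem_level + 9)))
```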

  101. jonas’

    regarding the use of compression and e2ee: zlib seems to be rather good at reversing the base64-bloat, so that’s at least something.
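
This is easy to check with Python's standard library: base64 spends 8 bits on every 6 bits of payload, and deflate's Huffman coding wins most of those 2 bits back. A seeded pseudo-random buffer stands in for an incompressible OMEMO/OX ciphertext (an assumption; real ciphertext should behave the same way). `Random.randbytes` needs Python 3.9 or later.

```python
import base64
import random
import zlib

ciphertext = random.Random(42).randbytes(30_000)  # stand-in for an encrypted payload
encoded = base64.b64encode(ciphertext)             # 4/3 larger: 40,000 bytes
compressed = zlib.compress(encoded, 9)

# The pseudo-random bytes themselves cannot be compressed, but the base64
# layer can: deflate brings the size back to roughly that of the raw bytes.
ratio_vs_base64 = len(compressed) / len(encoded)
ratio_vs_raw = len(compressed) / len(ciphertext)
```

On input like this, `ratio_vs_base64` should come out close to 0.75 and `ratio_vs_raw` slightly above 1: compression undoes the base64 overhead but cannot shrink the ciphertext itself.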

  102. Ge0rG

    We need a way to embed raw bytestreams into XML.

  103. Ge0rG

    Or just replace XML with... protobufs? ASN.1?

  104. jonas’

    using base92 would probably go a long way already

  105. Zash

    XER

  106. jonas’

    (or was it 96?)

  107. jonas’

    anything above that would give diminishing returns due to UTF-8 encoding anyways

  108. Ge0rG

    jonas’: base-91

  109. Ge0rG

    yeah, UTF-8 is not an efficient encoding.

  110. dwd

    Well. Not in terms of bits, anyway.

  111. jonas’

    meh

  112. jonas’

    base91 uses < and >

  114. jonas’

    and &

  115. jonas’

    while not using -, \ and '

  116. Ge0rG

    Anybody still remembers https://en.wikipedia.org/wiki/YEnc ?

  118. jonas’

    base85 seems to be the highest thing which is specified somewhere sane
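
The trade-off in this sub-thread can be put in numbers with a short Python sketch. The base91 alphabet is rebuilt from the description above (printable ASCII `!` to `~` minus `-`, `\` and `'`, which leaves 91 symbols; the order of the real alphabet does not matter here), and `xml.sax.saxutils.escape` shows which of them need escaping in XML character data. Efficiency is payload bits per byte on the wire; the last row assumes an alphabet of all 1920 two-byte UTF-8 code points (U+0080 to U+07FF) to illustrate the point about UTF-8.

```python
import math
from xml.sax.saxutils import escape

# base91 symbols as described in the chat: 0x21..0x7E without '-', '\\' and "'".
BASE91 = [chr(c) for c in range(0x21, 0x7F) if chr(c) not in "-\\'"]

# escape() rewrites &, < and >; XML strictly requires '>' escaped only in "]]>".
NEEDS_ESCAPE = sorted(ch for ch in BASE91 if escape(ch) != ch)


def efficiency(alphabet_size: int, bytes_per_symbol: int = 1) -> float:
    """Payload bits carried per byte on the wire."""
    return math.log2(alphabet_size) / (8 * bytes_per_symbol)


TABLE = {
    "base64": efficiency(64),
    "base85": efficiency(85),
    "base91": efficiency(91),
    "two-byte UTF-8 (1920 symbols)": efficiency(1920, 2),
}
```

With these numbers base91 gains only about one percentage point over base85, and a two-byte alphabet does worse than base64.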

  119. dwd

    I do occasionally muse over whether a dedicated XMLStream compression could outperform EXI in practical ways, though. Easy to have binary blobs instead of base64, for example, and we could accrue symbols and store dictionaries of XML symbols between sessions and things. We could also ignore the problems of comments, PIs, etc. Possibly even ignore namespaced attributes, since we never (?) use them.
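
Keeping dictionaries of XML symbols between sessions has a crude zlib-level analogue, the preset dictionary, which also illustrates jonas’ earlier point that both sides must hold the same dictionary. A Python sketch; the dictionary contents and the stanza are made up, and nothing in this discussion specifies such a dictionary for XMPP.

```python
import zlib

# Hypothetical preset dictionary of XML fragments common in XMPP traffic.
# The zlib manual suggests putting the most common strings near the end.
XMPP_DICT = (b"<presence/><iq type='result' id=''/></iq>"
             b"<request xmlns='urn:xmpp:receipts'/>"
             b"<message type='chat' to='' from='' id=''><body></body></message>")

STANZA = (b"<message type='chat' to='juliet@example.com' "
          b"from='romeo@example.net/orchard' id='a1'><body>Hi</body>"
          b"<request xmlns='urn:xmpp:receipts'/></message>")


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)


def decompress(data: bytes, zdict: bytes = b"") -> bytes:
    # The receiver must supply the very same dictionary, or inflating fails.
    dec = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return dec.decompress(data)
```

Even a single short stanza shrinks noticeably once its boilerplate can be matched against the dictionary, at the price of a 4-byte dictionary id in the zlib header.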

  120. jonas’

    don’t shut the door on namespaced attributes completely.

  121. Ge0rG

    XML is really a horrible encoding protocol for machines.

  122. flow

    what jonas’ said

  123. Zash

    It's fine, don't worry too much

  124. dwd

    jonas’, Well, it wouldn't matter if they were considered an outlier and not encoded very efficiently, at least.

  125. jonas’

    dwd, that’s true

  126. dwd

    Ge0rG, I quite like many of the properties of XML for our purposes. Certainly the alternatives would make a bunch of things much more painful - and I always have a nagging feeling that a construct like JSON imposes a data structure that is hard to break away from.

  127. Zash

    Do something like header compression in h2?

  128. Ge0rG

    dwd: JSON shares most of the disadvantages of XML

  129. Zash

    CBOR!

  130. Ge0rG

    I liked the MIDI format, where all numbers are dynamic-width.
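
For reference, the MIDI "variable-length quantity" packs 7 bits into each byte, most significant group first, with the high bit set on every byte except the last; BER uses the same base-128 idea for object identifier components. A Python sketch (MIDI itself caps values at four bytes, which this sketch does not enforce):

```python
from typing import Tuple


def encode_vlq(value: int) -> bytes:
    """MIDI-style variable-length quantity (big-endian base-128)."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [value & 0x7F]                   # last byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(groups))


def decode_vlq(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = 0
    for index, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated variable-length quantity")
```

For example, `encode_vlq(0x80)` gives `81 00` and `encode_vlq(0x0FFFFFFF)` gives `FF FF FF 7F`.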

  131. dwd

    Ge0rG, Or BER, where they can be?

  132. jonas’

    Matroska?

  133. jonas’

    Matroska!

  134. Ge0rG

    dwd: I'd go with DER for lesser ambiguity

  135. Zash

    PER?

  136. dwd

    Ge0rG, CER?

  137. Ge0rG

    Also whoever made it possible to encode U+0000 as an arbitrarily long UTF-8 sequence deserves the highest punishment.

  138. jonas’

    tell me more

  139. jonas’

    can’t you encode all things as arbitrarily long utf-8 sequence though?

  140. dwd

    jonas’, Only by ignoring the standard.

  141. jonas’

    but that’s not true for U+0000?

  142. Zash

    JSON Encoding Rules

  143. Ge0rG

    jonas’: I'm only bitching because U+0000 has special meaning in C.

  144. Zash

    Is a thing

  146. Ge0rG

    jonas’: https://en.wikipedia.org/wiki/UTF-8#Description - UTF-8 just stuffs the data bits after the header. A sane encoding would be to automatically add 0x80 to the bits in a two-byte encoded codepoint, because you can represent the first 0x80 values in one byte, etc.

  147. jonas’

    yeah

  148. Ge0rG

    it would also reduce the required number of bytes.
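
The proposal can be sketched in Python for the two-byte case. Standard UTF-8 takes the 11 payload bits of a two-byte sequence as-is, so the values 0 to 0x7F, which already fit in one byte, get a second (overlong) spelling. Adding 0x80, the number of code points a single byte covers, removes that overlap and shifts the two-byte range up to U+087F. The biased variant is hypothetical and not part of any standard.

```python
def utf8_two_byte(lead: int, cont: int) -> int:
    """Standard UTF-8 decoding of 110xxxxx 10yyyyyy (no validation)."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def biased_two_byte(lead: int, cont: int) -> int:
    """Hypothetical bias: two-byte forms start where one-byte forms end."""
    return utf8_two_byte(lead, cont) + 0x80


# C0 80 spells U+0000 in plain UTF-8 (overlong), but would mean U+0080 biased.
NUL_PAIR = (0xC0, 0x80)
```

Applying a matching offset to the longer forms would likewise remove every overlong spelling and let each sequence length cover a few more code points.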

  149. Link Mauve

    “Possibly even ignore namespaced attributes, since we never (?) use them.”, we do, @xml:lang for instance.

  150. Link Mauve

    dwd, ↑

  151. dwd

    Ah, true. But known ones like that we'd handle differently anyway.

  152. Link Mauve

    “12:00:46 Ge0rG> Also whoever made it possible to encode U+0000 as an arbitrarily long UTF-8 sequence deserves the highest punishment.”, you’re expected to reject it though.

  153. Link Mauve

    Same as any other overly-long sequence.
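
The same point in code: a conforming UTF-8 decoder must reject overlong forms, so `C0 80` and its longer siblings never decode to U+0000. This Python sketch relies on the built-in codec being strict in that way; XML 1.0 additionally excludes U+0000 from its character set altogether. (Java's "modified UTF-8" deliberately writes NUL as `C0 80`, which is one place such bytes come from.)

```python
from typing import Optional

# Overlong spellings of U+0000 in two, three and four bytes.
OVERLONG_NUL = [b"\xc0\x80", b"\xe0\x80\x80", b"\xf0\x80\x80\x80"]


def strict_decode(data: bytes) -> Optional[str]:
    """Decoded text, or None when a conforming decoder has to reject the input."""
    try:
        return data.decode("utf-8")  # Python's UTF-8 codec rejects overlong forms
    except UnicodeDecodeError:
        return None
```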

  154. dwd

    Oh. I found an actual bug in MUC.

  155. MattJ

    I'm all ears

  156. Ge0rG

    No way!

  157. jonas’

    Just one?

  158. dwd

    Well, sorta, anyway. When a client drops, it sends unavailable to the MUC automatically because Magic(tm) on the server.

  159. dwd

    But if the MUC switches nickname on join (210 code stuff), then the directed presence recorded on the server is wrong, and the user never leaves.

  160. jonas’

    yes

  161. MattJ

    Oh, that one

  162. jonas’

    that’s a known issue

  163. jonas’

    servers need to track nickname changes for that :)
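
What tracking nickname changes means for the server can be sketched in Python: for one client session, remember the occupant JID the user currently holds in each room, and update it from the room's self-presence (status code 110), which is where a service-assigned nick (status 210) or a completed nick change shows up. `DirectedPresenceTracker` and its methods are hypothetical, not taken from any server implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DirectedPresenceTracker:
    """Per-session record of rooms the client joined via directed presence."""
    nicks: Dict[str, str] = field(default_factory=dict)  # room bare JID -> nick

    def outbound_join(self, occupant_jid: str) -> None:
        room, _, nick = occupant_jid.partition("/")
        self.nicks[room] = nick  # the nick the client asked for

    def inbound_available_presence(self, occupant_jid: str, status_codes: List[int]) -> None:
        room, _, nick = occupant_jid.partition("/")
        # 110: this presence is about the user, so its from-JID carries the real
        # nick (covers 210, a service-modified nick, and the end of a nick change).
        if 110 in status_codes and room in self.nicks:
            self.nicks[room] = nick

    def on_disconnect(self) -> List[str]:
        """Occupant JIDs that should get <presence type='unavailable'/>."""
        return [f"{room}/{nick}" for room, nick in self.nicks.items()]
```

Without the 110 update, the unavailable presence would go to the nick the client asked for, which the room no longer knows, and the user would never leave.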

  164. dwd

    I'd seen it with nickname changes, but it didn't occur to me (for some reason) it'd happen with nickname enforcing.

  165. Ge0rG

    Why can't we just implement MUC proxies on the server.

  166. Ge0rG

    That really would solve 99% of MUC's problems, in a backward compatible manner

  167. Ge0rG

    Zash even wrote a POC already.

  168. Ge0rG

    It's got some minor drawbacks, like you can't ever leave a MUC.

  169. fippo

    ge0rg: i think one of the dmuc proposals took that approach

  170. jonas’

    which is fun, by the way, because it means that the user’s server needs to support MUC for it to work properly :-)

  171. jonas’

    which reminds me of MIX

  172. jonas’

    except that with MUC, this requirement is hidden and not spelt out and you can join a MUC without that requirement fulfilled and have it work to a certain extent and then run into weird edge cases :)

  173. Ge0rG

    jonas’: you mean the weird edge cases we cope with every day now?

  174. Ge0rG

    Like never leaving a MUC if you changed your nickname?

  175. jonas’

    yes

  176. Ge0rG

    The awesome thing about MUC Proxy would be that it's 100% transparent to the clients and can be rolled out in an instant as an upgrade to fix most of the issues.

  177. Ge0rG

    Also could include offline notifications and other nice things.

  178. jonas’

    mh

  179. jonas’

    it would be somewhat like biboumi but for xmpp

  180. jonas’

    and looking at the quirks which still are there with persistency and biboumi, I’m not sure it’s as easy as you make it out to be

  181. Ge0rG

    jonas’: the quirks are there because the biboumi developers violently refuse to accept what's good design and practice.

  182. jonas’

    hm, where?

  183. Ge0rG

    jonas’: like where they send you individual messages to all of your resources with Carbons disabled?

  184. jonas’

    what would be a better way?

  185. flow

    I don't see a problem with that either, but I believe it should be the responsibility of the receiving entity that the messages arrive on all devices (if it wishes to), not of the sending one

  186. Ge0rG

    flow: the problem is that if you go offline, your messages get rerouted to a different resource, which ends up with two, three or four copies

  187. flow

    Ge0rG, ahh, ok I see the issue now.

  188. jonas’

    Ge0rG, but on the other hand, relying on carbons would mean that resources which are not interested in those messages (read: not joined in any IRC) get them.

  189. flow

    but wait,

  190. jonas’

    there’s no good solution here

  191. jonas’

    and we’ll have the same issues with MUC proxies.

  192. flow

    you have to go offline while biboumi is sending, otherwise biboumi won't know of the resource

  193. flow

    Ge0rG, do you experience that a lot?

  194. Ge0rG

    flow: there used to be a long discussion on the biboumi tracker

  195. flow

    with many people reporting to hit that issue of duplicate messages?

  196. Ge0rG

    jonas’: that's the same problem as with MUCs you join from one client only and the PM Carbons.

  197. Ge0rG

    flow: yeah

  198. Ge0rG

    https://lab.louiz.org/louiz/biboumi/issues/3277

  199. jonas’

    Ge0rG, yes

  200. Ge0rG

    > Opened 1 year ago by Jonas Schäfer

  201. jonas’

    > Closed

  202. Ge0rG

    Also https://lab.louiz.org/louiz/biboumi/issues/3304

  203. jonas’

    also Closed

  204. Ge0rG

    jonas’: took some months to convince them.

  205. jonas’

    not for #3277

  206. Ge0rG

    jonas’: I can't find a way to search for comments by me, but I'm sure most of those would be bitching about how the developers don't understand XMPP.

  207. jonas’

    I wouldn’t accuse them of that.

  208. jonas’

    also, they’re still doing great work. I’m fine with the community ironing out the rough edges by filing issues.

  209. Ge0rG

    jonas’: oh, yes they are.

  210. Ge0rG

    biboumi is the best cross-protocol gateway I've ever seen.

  211. jonas’

    indeed.

  212. Ge0rG

    jonas’: the other thing being https://lab.louiz.org/louiz/biboumi/issues/3283

  213. jonas’

    Ge0rG, that might be fixed during the refactor mentioned in #3382

  214. Ge0rG

    jonas’: it's not about things being fixed, it's about how hard it is to convince the developers that they _need_ to be fixed.

  215. jonas’

    edge-cases all abound

  216. jonas’

    lots of edge-cases not only means lots of code to write, it also means lots of hard-to-reproduce stuff which will be tricky to nail down and prove.

  217. jonas’

    and we’ll have exactly the same issues with a MUC proxy

  218. Ge0rG

    I'm a certified MUC Corner Case Debugging Engineer.

  219. Zash

    If that's the case, where's your diploma?

  220. Ge0rG

    https://op-co.de/tmp/MUC-CCDE.jpg

  221. Seve

    Good job Ge0rG! You deserve it!

  222. jonas’

    well done

  223. Seve claps

  224. jonas’

    put it on your council application

  225. jonas’ wonders about the significance of that date

  226. Ge0rG

    jonas’: @horazont horazont merged commit b017284 into xsf:master on Mar 8

  227. jonas’

    ah, #stable_id

  228. edhelas

    Ge0rG, don't fix MUC too much, or we won't have any reason to work on MIX anymore

  229. Ge0rG

    jonas’: good idea!

  230. Ge0rG

    edhelas: now you uncovered my evil secret plan!

  231. edhelas

    Make MUC Great Again

  232. Zash

    MUC was never great

  233. pep.

    Who can modify the xsf calendar? To add 35C3

  234. pep.

    I still have one last voucher btw, if people are interested. Grab it now or it will expire

  235. edhelas

    In 0060 the <configure/> tag is defined this way:

      <xs:element name='configure'>
        <xs:complexType>
          <xs:choice minOccurs='0' xmlns:xdata='jabber:x:data'>
            <xs:element ref='xdata:x'/>
          </xs:choice>
        </xs:complexType>
      </xs:element>

  236. edhelas

    However I see some <configure node='princely_musings'> in the examples

  237. edhelas

    Shouldn't we add <xs:attribute name='node' type='xs:string' use='required'/> ?

  238. Ge0rG

    Our wiki also has a horrible mobile experience. Pinging I-team

  239. ralphm

    edhelas: well, not required. If using collections, you also want to be able to configure the root node, which is basically leaving off the node attribute.

  240. ralphm

    Also, you're looking at the wrong namespace. Try pubsub#owner

  241. ralphm

    The one in the regular pubsub node goes together with <create/> where you already have the node reference.

  242. ralphm

    eh, pubsub namespace

  243. edhelas

    ralphm thanks for the clarification

  244. edhelas

    my bad

  245. ralphm

    So example 137 vs 140

  246. ralphm

    no worries

  247. ralphm

    I still regret we used multiple namespaces

  248. Zash

    The verb another level in is weird too
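
ralphm's explanation can be illustrated with a small Python helper using `xml.etree.ElementTree`: in the `pubsub#owner` namespace, `<configure/>` carries its own optional `node` attribute (absent means the root node of a collection hierarchy), while in the plain `pubsub` namespace it accompanies `<create/>`, which already names the node. The helper `configure_target` is hypothetical; the namespace URIs are the ones XEP-0060 uses.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PUBSUB = "http://jabber.org/protocol/pubsub"
PUBSUB_OWNER = "http://jabber.org/protocol/pubsub#owner"


def configure_target(iq_xml: str) -> Optional[str]:
    """Node addressed by a <configure/> inside an IQ.

    None means the root node (owner namespace, no 'node' attribute) or an
    instant node (plain namespace, <create/> without 'node').
    """
    iq = ET.fromstring(iq_xml)
    owner = iq.find(f"{{{PUBSUB_OWNER}}}pubsub/{{{PUBSUB_OWNER}}}configure")
    if owner is not None:
        return owner.get("node")  # optional attribute on pubsub#owner configure
    pubsub = iq.find(f"{{{PUBSUB}}}pubsub")
    if pubsub is None or pubsub.find(f"{{{PUBSUB}}}configure") is None:
        raise ValueError("no <configure/> element found")
    create = pubsub.find(f"{{{PUBSUB}}}create")
    return create.get("node") if create is not None else None
```

This is why a schema marking `node` as required would be wrong for the owner namespace, and unnecessary for the plain one.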