XSF Discussion - 2021-05-08


  1. L29Ah

    is it me, or is it counter-productive to use XEP-0198: Stream Management on flaky connections?
    without XEP-0198: you are offline, so the messages for you are conveniently stored indefinitely and relayed to your client when it goes online, and your frens know that you're offline so they shouldn't expect an immediate reply
    with XEP-0198: you are offline, but for XMPP purposes you are "online" for 10 minutes or so after you disconnected, the messages sent to you in the meantime are moved into /dev/null in case you don't reconnect in time, and your frens are confused

  2. Daniel

    The messages don't end up in /dev/null

  3. L29Ah

    Daniel: where do they end up then? they certainly don't get relayed to the recipient when it goes online like regular "offline" messages

  4. Daniel

    Depending on configuration they get returned to the sender with an error message or they end up in the offline queue

  5. jonas’

    L29Ah, your assumption for the non-198 case is not correct

  6. jonas’

    the common case on a flakey connection is that the connection looks alive to the server

  7. jonas’

    for several minutes, maybe longer

  8. jonas’

    any message during that interval will *actually* end up in /dev/null (barring MAM, Carbons, but those would also apply in your with-198 case), because the server has no way of figuring out whether or not they were delivered to your resource (no acking mechanism)

  9. jonas’

    i.e. thanks to the Two Generals problem, the server cannot know for sure when your connection got interrupted exactly. 198 does not solve two generals (obviously), but the approximation of the real state is better due to the explicit acking of received stanzas

  10. jonas’

    so with-198 is better in that regard because:
    - *if* a flakey connection allows you to reconnect within the 10 minute timeframe, you can resume without any lost state
    - the server gets a better approximation of what got delivered and what not and can make better decisions based on that regarding rerouting/offline store etc. (not that either of the two is really relevant with MAM/Carbons) but also IQ error replies (which are a nicety for other entities)

  11. jonas’

    the only downside is the potentially fake online state while the connection is resumable; given that most clients do not show the online state prominently anymore anyway, I wouldn’t say it’s much of a bother

  12. jonas’

    (sidenote: some implementations will prolong the '198 "hibernation" lifetime if the client has registered for push notifications to the time of the next push notification + some interval)
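
The acking behavior jonas’ describes can be illustrated with a minimal sketch. It is not taken from any server implementation: the class `SmSession`, its method names, and the plain-list `offline_store` are hypothetical, and the wraparound of the `h` counter at 2^32 that XEP-0198 specifies is ignored for brevity. The idea is only that the server keeps every stanza it sent until the client's `h` value confirms it, and hands whatever is left to a fallback (offline storage or an error bounce, depending on configuration) once the resumption window expires.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SmSession:
    """Simplified, hypothetical server-side view of one XEP-0198 session."""
    sent: int = 0    # number of stanzas sent to the client so far
    acked: int = 0   # last 'h' value the client reported
    unacked: deque = field(default_factory=deque)  # sent, but not yet confirmed

    def send(self, stanza: str) -> None:
        # Keep a copy of every outgoing stanza until the client acks it.
        self.sent += 1
        self.unacked.append(stanza)

    def handle_ack(self, h: int) -> None:
        # Client sent <a h='...'/>: everything up to h was handled by the client.
        if not self.acked <= h <= self.sent:
            raise ValueError(f"implausible h={h} (acked={self.acked}, sent={self.sent})")
        for _ in range(h - self.acked):
            self.unacked.popleft()
        self.acked = h

    def expire(self, offline_store: list) -> None:
        # Resumption window elapsed without a resume: hand over what was never acked.
        offline_store.extend(self.unacked)
        self.unacked.clear()


session = SmSession()
for body in ("one", "two", "three"):
    session.send(body)

session.handle_ack(2)    # client confirmed the first two stanzas, then vanished
offline = []
session.expire(offline)  # only the unconfirmed stanza needs a fallback
print(offline)           # ['three']
```

Without the counter, the server in this scenario would have no basis for deciding which of the three stanzas to store or bounce.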

  13. menel

    It's only a problem if the server hides the error and has no mam and offline storage. Hopefully nobody configures it like that.

  14. L29Ah

    > because the server has no way of figuring out whether or not they were delivered to your resource (no acking mechanism)
    no, the server can ask the OS how many bytes had the client ACKed

  15. L29Ah

    https://www.ejabberd.im/faq/tcp/
    indeed i can ask ejabberd to save the messages instead of losing them; the lose-by-default behavior looks insane to me

  16. L29Ah

    thanks

  17. menel

    L29Ah: don't you use mam? That would solve the problem for you

  18. L29Ah

    no

  19. L29Ah

    and i'd prefer it to be solved for everyone, not just me, tbh

  20. L29Ah

    MAM is cumbersome to implement so we won't see it everywhere ever

  21. Holger

    Oh so you cross posted. I responded in the ejabberd room.

  22. Holger

    But yes the proper solution is MAM. You can't really implement reliable delivery without (as I told you the other day already).

  23. Holger

    > https://www.ejabberd.im/faq/tcp/ indeed i can ask ejabberd to save the messages instead of losing them; the lose-by-default behavior looks insane to me
    I explained the reasoning for the default behavior in that very article. (Which was written back in the days without MAM in mind. MAM solves that crap.)

  24. L29Ah

    Holger: the reasoning asserts that losing a message is better than sending it twice

  25. Holger

    Returning a proper error message is different from silent loss.

  26. L29Ah

    it is effectively a loss in case the sender is no longer around

  27. Holger

    But yes bouncing an error is better than potentially large bursts of duplicates.

  28. Holger

    The latter is terrible UX.

  29. Holger

    Whatever. MAM is enabled by default.

  31. Holger

    I'm not so motivated to discuss the relative terribleness of different non-working workarounds.

  32. Holger

    If you, as an ejabberd admin, prefer a different behavior, I gave you the config settings to do that.

  33. Holger

    If there was a behavior that was strictly better (no downside) it wouldn't need those config knobs.

  34. flow

    > no, the server can ask the OS how many bytes had the client ACKed
    a TCP level ack does not automatically imply that the data was processed on the application level

  35. flow

    L29Ah, ↑

  36. Holger

    Right. And it's not available on all platforms. And terrible to implement. That's definitely not the proper solution to this issue.
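
For readers wondering what "asking the OS" could look like, here is a rough, Linux-only sketch. It assumes the `SIOCOUTQ` ioctl, which Python exposes under the name `termios.TIOCOUTQ` (the two constants are the same value on Linux); the helper name `unacked_bytes` is made up, and nothing here is claimed to be how any XMPP server does it. The loopback demo also shows flow's point: the peer's kernel acknowledges the bytes even though the receiving application never reads them.

```python
import fcntl
import socket
import struct
import termios


def unacked_bytes(sock: socket.socket) -> int:
    """Bytes still in the kernel send queue of a TCP socket (Linux only).

    This counts data not yet sent plus data sent but not yet ACKed by the peer.
    """
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


# Loopback demo: the "client" connects but never calls recv().
listener = socket.create_server(("127.0.0.1", 0))
client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

server_side.sendall(b"<message/>")
queued = unacked_bytes(server_side)
# On loopback this usually drops to 0 almost immediately, although the
# client application has not processed a single byte.
print(queued)

for s in (server_side, client, listener):
    s.close()
```

A value of 0 therefore only says that the remote TCP stack received the data, which is the gap that application-level acks are meant to close.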

  37. L29Ah

    flow: sure, but in real world scenarios it's virtually always the case

  38. L29Ah

    and using TCP ACKs is strictly better than silently discarding messages

  39. Daniel

    Just be aware of the implications and do whatever works for you

  40. flow

    L29Ah, maybe, but it's fragile and the assumption is just not correct

  41. Daniel

    I have a deployment where we rely on offline messages only and have fairly low sm timeouts

  42. Daniel

    Like 60s or so

  43. flow

    so I am not even sure if TCP acks are better, but i think we at least agree that application level acks are what you want

  44. Holger

    The relative trade offs of non-MAM solutions strongly depend on whether you support multi device.

  45. flow

    that ↑

  46. Zash

    Did anyone feel inclined to write "XMPP Service Discovery Best Practices" ?

  47. MattJ

    Did anyone feel inclined to write a list of XEPs that need writing?

  48. Zash

    Or a list of people who could write lists of XEPs that need writing?

  49. edhelas

    Zash, you're responsible for that list then, problem solved