XSF Discussion - 2019-06-19


  1. Daniel

    Zash, so wait. if i reboot prosody everyone stays in the room? as long as they don’t send errors which i guess under normal circumstances they shouldn't

  2. Kev

    That's what's happening in M-Link too, in the MUC rework we've got underway.

  3. Daniel

    right. so i guess this would cover a lot of the cases except for when during downtime I try to (re)connect to a muc, get 'server not found' and then don’t know when it is fine to try again

  4. Daniel

    but it would at least solve the ghost room problem

  5. jonas’

    I’m not sure how well it works in practice

  7. jonas’

    but I guess that’s the point of it (that you don’t notice a graceful restart anymore)

  8. Daniel

    i think it might even break in the case where i have a reconnect (with the same resource) during down time. i will get a server not found back from my own server. thus my client thinks i'm not in the room. but after restart the room thinks i'm in it

  9. Daniel

    and then messages sent to me won’t get bounced (by my server) because my resource is the same

  10. jonas’

    but then you’re still confused on the client side (which you can fix up)

  11. jonas’

    with a rejoin probably, because you threw state away

  12. Daniel

    yes. so clients need extra logic: a message (or presence) received from a room that seems to be offline should trigger a ping/join

  13. Daniel

    or something

  14. Daniel

    which is ok i guess

  15. Daniel

    but needs to be handled (and probably documented)
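
The client-side handling Daniel describes could be sketched roughly as follows. This is a minimal illustration, not any real client's code: `RoomTracker`, `RoomState` and their methods are hypothetical names, and the actual ping or join stanza is left out. The only idea shown is that a stanza arriving from a room the client believes to be offline marks that room for a rejoin.

```python
from enum import Enum


class RoomState(Enum):
    JOINED = "joined"
    OFFLINE = "offline"  # last join bounced, e.g. with remote-server-not-found


class RoomTracker:
    """Hypothetical client-side bookkeeping, not a real library API."""

    def __init__(self):
        self.rooms = {}          # room JID -> RoomState
        self.pending_joins = []  # rooms the client should ping or rejoin

    def join_failed(self, room):
        self.rooms[room] = RoomState.OFFLINE

    def join_succeeded(self, room):
        self.rooms[room] = RoomState.JOINED
        if room in self.pending_joins:
            self.pending_joins.remove(room)

    def on_room_stanza(self, room):
        """Call for each groupchat message or presence received from `room`.

        Returns True when the stanza contradicts the local state, meaning
        the room still considers this client an occupant.
        """
        if self.rooms.get(room) is RoomState.OFFLINE:
            if room not in self.pending_joins:
                self.pending_joins.append(room)
            return True
        return False
```

A client would drain `pending_joins` by sending a ping or a fresh join, then call `join_succeeded` once the room answers.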

  16. Kev

    Restarts are usually so quick that you don't notice them.

  17. Kev

    (Unless you're silently dropped from the room)

  18. jonas’

    Kev, depends on what is restarted ;)

  19. jonas’

    if it’s your own server, you’ll definitely notice. if the server is being fully rebooted, it can take minutes

  20. Holger

    Don't we already have enough "works most of the time" cases?

  21. Daniel

    in what cases would this break?

  22. Link Mauve

    Daniel, I’m not sure I understand your last example, in most cases if your client reuses a previous resource, it’ll have the MUC in bookmarks and join it again afterwards, right?

  23. Link Mauve

    In doing so, it sends a MUC join and the service then knows it has to send the full room state again, and can consider the previous full JID as out of the room.

  24. Link Mauve

    And when your new client doesn’t know it should join the MUC, it can send back an error to any groupchat message it receives from that room.

  25. Link Mauve

    Or am I missing something?

  26. jonas’

    Link Mauve, Daniel's scenario was that the muc is currently rebooting while the client reconnects

  27. jonas’

    thus the rejoin gets bounced with remote-server-not-found

  28. jonas’

    and never reaches the MUC service

  29. Link Mauve

    Ah right.

  30. Link Mauve

    Yeah, then you have to try again with exponential back off, like in any current case of remote-server-not-found.
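
The retry schedule Link Mauve mentions could look something like the sketch below. The base delay, cap, attempt count and jitter are illustrative assumptions, not values taken from any specification or client, and `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(base=2.0, cap=300.0, attempts=6, jitter=0.0, rng=random.random):
    """Return the waits (in seconds) before each retry of a bounced join.

    The delay doubles on every attempt and is limited to `cap`. A non-zero
    `jitter` adds up to that fraction of the delay on top, so that many
    clients do not retry at the same instant after a restart.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays
```

With `base=2`, `cap=20` and five attempts and no jitter, the waits are 2, 4, 8, 16 and 20 seconds.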

  31. jonas’

    Link Mauve, but then the MUC service comes back and starts sending you type="groupchat" and presence

  32. MattJ

    Probably the MUC service should ping persisted occupants after a restart

  33. MattJ

    and by ping, probably I mean probe

  34. Zash

    In Prosody, that's already what sorta happens since rooms are usually restored from storage by some event that results in a broadcast.

  35. Zash

    Daniel Yes, everyone stays in the room. Rooms can be saved to disk and removed from memory and then brought back at any time for a few different reasons, of which graceful shutdown is only one.

  36. Zash

    When all goes well, nobody notices.

  37. Zash

    The situation where the room thinks you're still there but the client doesn't think so only happens because of the move to long-term stable resources. If you get a new resource every time you connect, this takes care of itself eventually via kick-causing error bounces.

  38. Daniel

    the move to long term stable resources only happened because otherwise we have no ability to kick the old one

  39. Zash

    I don't think that's true

  40. Daniel

    that this was the reason or that we have no way of kicking the old?

  41. Zash

    Removal of stale sessions can be done, dwd has written stuff about this before.

  42. Zash

    And they should get removed eventually anyways

  43. Zash

    Like, the server could ping existing sessions when a new session connects.

  44. Daniel

    fwiw if that's what it takes i'm fine with moving to random resources

  45. Daniel

    users will hate it

  46. Zash

    As always, there are tradeoffs

  47. Holger

    Ping existing sessions sounds ugly to me.

  48. Holger

    A problem in practice I see is the delay. Stanzas queued for the old session won't be resent before the ping times out.

  49. Daniel

    Zash, is this prosody 0.11 or current development?

  50. Zash

    Everyone being kicked from rooms all the time because "Disconnected: Replaced by new connection" is also ugly

  51. Zash

    Daniel Theoretical

  52. Daniel

    Zash, the storing muc state i mean

  53. Daniel

    not the pinging of resources

  54. Link Mauve

    Daniel, 0.11 this one.

  55. Zash

    Yes, 0.11

  56. Zash

    Rooms can be saved to disk on graceful shutdown, module unload (and reload) or when they are evicted from a LRU cache.
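
As a rough illustration of the behaviour Zash describes, the sketch below keeps a bounded number of rooms in memory, writes a room to storage when it is evicted or when the service shuts down, and restores it on the next access. It is not Prosody's implementation (Prosody is written in Lua); `RoomCache` is a hypothetical name and a plain dict stands in for the on-disk store.

```python
from collections import OrderedDict


class RoomCache:
    """Hypothetical LRU cache of live rooms backed by persistent storage."""

    def __init__(self, capacity, store):
        self.capacity = capacity
        self.store = store         # dict standing in for on-disk storage
        self.live = OrderedDict()  # room JID -> room state, oldest first

    def get(self, jid):
        if jid in self.live:
            self.live.move_to_end(jid)  # mark as most recently used
            return self.live[jid]
        # Restore from storage if saved earlier, else create an empty room.
        room = self.store.pop(jid, None)
        if room is None:
            room = {"occupants": set()}
        self.put(jid, room)
        return room

    def put(self, jid, room):
        self.live[jid] = room
        self.live.move_to_end(jid)
        while len(self.live) > self.capacity:
            old_jid, old_room = self.live.popitem(last=False)
            self.store[old_jid] = old_room  # save evicted room, occupants included

    def shutdown(self):
        # Graceful shutdown: persist every room still in memory.
        while self.live:
            jid, room = self.live.popitem(last=False)
            self.store[jid] = room
```

Because the occupant list is saved along with the room, a room restored after eviction or restart still considers the same users present.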

  57. Daniel

    can or will?

  58. Daniel

    does this need to be configured?

  59. Zash

    Will. Enabled by default. I'd have to check docs or code to remember details of what can be configured.

  60. Holger

    You're not worried about the init system killing the graceful shutdown due to timeout on servers with large/many rooms?

  61. Daniel

    do you have any grasp on how well that works in practice? because i still have countless users telling me about ghost mucs. but of course it might be that they are all on ejabberd

  62. Daniel

    countless ~= 3

  63. Zash

    Holger: Dunno, should we be?

  64. Daniel

    but they are very annoying about it :-)

  65. Zash

    Loud minority?

  66. Holger

    Zash: That's the main reason that made me hesitate to implement the same thing.

  68. Zash

    I suspect that ejabberds closing of idle s2s connections isn't helpful here.

  69. Daniel

    why? saving state doesn’t require sending something over s2s does it?

  70. Zash

    I mean about ghost rooms/users. s2s connection gets closed and then fails to be reestablished for some reason, and then ghosts.

  71. Link Mauve

    Yeah, I’ve often been kicked out from (old) Ejabberd rooms without being notified, this doesn’t happen much lately.

  72. Holger

    If reconnecting fails I'd assume the old connection would've been lost as well.

  73. Zash

    It's more likely that an unavailable presence can be delivered over an established s2s connection than if it has to reestablish it again.

  74. Zash

    Prosody in some configurations doesn't even manage to send anything when shutting down, making this worse.

  75. Holger

    Either way, personally I'd still prefer MUC Push over all these solutions that try to work around all these problems with MUC relying on presence.

  76. Holger

    The only real corner case I see with this is the first participant who'd like to write a groupchat message after MUC service restart.

  77. Daniel

    Holger, the question is: if muc push really becomes the go-to thing and all clients enable 1-2 push targets on every join, wouldn’t the load on the db be the same as persisting presence?

  78. Holger

    Fixing that might require some client-side hack, or waiting for MIX.

  79. Zash

    I did start on an experimental hack that would make MUC joining account based

  80. Holger

    Daniel: Yes it's just a more robust solution, in my book. As you can't do the presence thing for clients without persistent connection anyway.

  81. Holger

    (Except with the super-ugly hack of faking their presence state while they're disconnected.)

  82. Zash

    Daniel, ask MattJ about mod_devices 🙂