XSF Discussion - 2019-03-24

Guus 08:45:45
MattJ / flow : https://github.com/xsf/xeps/pull/771
Guus 08:47:18
Would something go horribly, horribly wrong, if a server simply adds a stable/unique stanza ID to any message that it processes?
MattJ 08:48:18
It makes client life hard/impossible, sadly
MattJ 08:48:52
Clients would no longer know when a message is archived or not
MattJ 08:49:34
So they don't know if the id can/should be stored for later querying (e.g. for catch-up)
Guus 08:51:05
I do not like the fact that we're deducing that something is archived by merely detecting the presence of something that's supposed to be an opaque identifier.
Guus 08:52:15
(also, I don't have a better suggestion)
MattJ 08:53:03
Guus: that's why <archived> existed
Guus 08:57:08
What's the impossibility for clients, exactly?
Holger 09:04:24
Hmm I don't quite see this problem. There's no guarantee archived messages will remain in the archive forever anyway.
Holger 09:05:36
And I don't quite see how the info whether a locally stored message is also in the archive helps the client.
Holger 09:06:36
In my book it's fine to add a stanza ID to all messages. It may actually help with non-MAM use cases.
Guus 09:12:41
It'd make my implementation a lot easier...
Andrew Nenakhov 09:15:37
> Would something go horribly, horribly wrong, if a server simply adds a stable/unique stanza ID to any message that it processes?
In short that's the basis of our XEP that we use to ensure message delivery. Works well.
Guus 09:16:23
Andrew Nenakhov the basis is that something goes wrong if we do (and you found an alternative), or: you do it, and you've seen that nothing goes wrong?
Andrew Nenakhov 09:16:42
Client sends stanza with provisional id, server stamps it with 0359 unique and stable id, sends this id to client as a confirmation.
Andrew Nenakhov 09:16:58
Guus, define wrong )
Guus 09:17:11
Mushroom-clouds on the horizon.
Andrew Nenakhov 09:17:35
We centralize everything to work via server archive. If archive breaks, kaput, yes
Andrew Nenakhov 09:18:11
> Mushroom-clouds on the horizon.
In case of an all scale nuclear weapon attack, IM protocols will hardly matter anymore
Guus 09:24:17
Andrew Nenakhov I do like to make sure though that IM protocol design decisions do not cause an all scale nuclear weapon attack.
MattJ 09:34:11
Holger, you're right (there is no guarantee archived messages will remain in the archive forever)
MattJ 09:34:30
But if it's not in the archive, the client can assume it was purged and it needs to re-fetch the archive
Guus 10:11:58
MattJ: how can it fetch a purged archive?
MattJ 10:12:55
Guus, I mean in the sense that old messages are purged
Guus 10:14:12
Mind you that it is Sunday, I'm an idiot, and did not have enough coffee
Guus 10:14:16
I don't understand.
MattJ 10:14:33
Messages in the archive are not kept forever on most deployments
MattJ 10:14:58
The oldest messages are removed after some expiry time (let's say 30 days)
Guus 10:16:03
with you so far.
Guus 10:16:15
but why does the client need to re-fetch the archive if that happens?
Guus 10:16:28
need/would want to
MattJ 10:16:55
So a client that wants to receive all messages works by continuously remembering the id of the last archived message it received
MattJ 10:17:16
When it goes offline for a couple of days, it will come back online and request all messages since the last id it saw
Guus 10:17:17
if anything, it'd do a massive amount of data transfer only to end up with _less_ local history?
MattJ 10:17:25
With me still?
Guus 10:17:40
yes
MattJ 10:17:51
So now it goes offline for two months
MattJ 10:18:00
The last id it saw is no longer in the server's archive
MattJ 10:18:10
So it performs the query and gets item-not-found
MattJ 10:18:43
so it knows that the message has been expired, and any messages in the archive are messages it has never seen before
MattJ 10:18:50
because they are all newer
Guus 10:20:38
right. And if we'd slap on a stanza-id on every message without archiving, it'd _always_ get an item-not-found, assume its local cache is older than what's on the server, and it'd download all history, every time.
Guus 10:20:44
that's what you're saying, right?
MattJ 10:21:17
Yes
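A minimal sketch of the catch-up flow described above, under stated assumptions: the archive is a hypothetical in-memory list, and `Archive`, `ItemNotFound`, and `catch_up` are illustrative names, not part of any XMPP library or of XEP-0313 itself. It shows why remembering an id that was never archived forces a full fetch every time.

```python
from dataclasses import dataclass, field


class ItemNotFound(Exception):
    """Stands in for the item-not-found error discussed above."""


@dataclass
class Archive:
    # Hypothetical in-memory archive: ordered (stanza_id, body) pairs.
    entries: list = field(default_factory=list)

    def query_after(self, last_id=None):
        """Return entries newer than last_id, or everything if last_id is None."""
        if last_id is None:
            return list(self.entries)
        ids = [sid for sid, _ in self.entries]
        if last_id not in ids:
            raise ItemNotFound(last_id)
        return self.entries[ids.index(last_id) + 1:]


def catch_up(archive, last_seen_id):
    """Fetch what the client missed; the flag reports whether a full fetch was needed."""
    try:
        return archive.query_after(last_seen_id), False
    except ItemNotFound:
        # The remembered id has expired (or was never archived), so the
        # client treats everything still in the archive as unseen.
        return archive.query_after(None), True


archive = Archive([("a3", "hi"), ("a4", "hello"), ("a5", "bye")])

# Short absence: the remembered id is still in the archive.
missed, full_fetch = catch_up(archive, "a4")

# Long absence, or an id stamped on a non-archived message: full fetch.
missed_all, full_fetch_all = catch_up(archive, "n9")
```

In this sketch an expired id and a never-archived id are indistinguishable to the client, which is the ambiguity the discussion is about.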
Guus 10:21:44
I'm guessing that it'd not do this if the server doesn't advertise the MAM feature though.
MattJ 10:22:14
Sure
waqas 10:23:13
I feel like the whole "item-not-found means get full archive" thing is a hack. A server could lose a message for other reasons, e.g., storage failure causing recent stuff to be lost, or deletion of specific message due to gdpr, or some bug, etc.
MattJ 10:23:40
waqas, it's not allowed to
pep. 10:23:41
> and it'd download all history, every time.
It would download up to the date it just requested
Guus 10:23:42
waqas I was trying to formulate a similar remark in my head.
MattJ 10:24:02
waqas, it can replace with placeholders if it needs to
pep. 10:24:14
Which may or may not be the whole history
waqas 10:24:20
MattJ: Storage failure isn't something you can forbid someone from having.
MattJ 10:24:43
waqas, handling storage failure in defined ways is entirely sensible
Guus 10:24:43
sure, but solely depending on 'item-not-found' based on a last-known ID still seems ... hackish...
MattJ 10:25:01
Guus, it's defined by the XEP to be this way, it's absolutely not a hack
MattJ 10:25:07
I mean, what else would you guys propose??
waqas 10:25:24
MattJ: Not really. If you lose a disk and restore from recent'ish backup, you'll have a situation where supposedly every recent message would be item-not-found..
Guus 10:25:58
MattJ it's a lot easier to disagree with stuff without having to suggest better alternatives 😉
MattJ 10:26:12
waqas, you can't just rewind time like that in most systems without consequences
waqas 10:27:02
Yes, and given that laws of physics disallow "messages can't be removed from archive after acked", a protocol shouldn't rely on that.
Guus 10:28:32
what if the client asks for the last-known ID archived by the server?
waqas 10:29:42
MattJ: To be clear, I think a sane recommendation would be if item-not-found, get archive by some timestamp based setup, but trying to get archive from beginning of time is silly in such a case.
pep. 10:30:13
(What I said above?)
Guus 10:30:26
(removed bad idea)
waqas 10:30:26
Yep, listen to pep.
MattJ 10:30:45
Yes, but the server was relocated to a different timezone and the admin forgot to set it to UTC
pep. 10:31:19
Dates don't include TZs? :s
waqas 10:31:35
Almost all popular dbs people use (mysql, postgres) in their default replica settings, when the master node is lost and another takes over (or a restoration from backup happens) will potentially lose recent writes. If the MAM XEP wants to assume that wouldn't happen, I'd consider it pretty silly.
MattJ 10:32:09
waqas, if you want to write your own XEP go ahead
waqas 10:32:53
MattJ: Do you see the problem I'm pointing out?
Guus 10:32:59
Maybe 'silly' isn't the best classifier here.
Guus 10:33:29
> Yes, but the server was relocated to a different timezone and the admin forgot to set it to UTC
do we need the XEP to account for this?
MattJ 10:33:44
Guus, do we need the XEP to account for any of this?
Guus 10:34:54
Well, if we can modify it somehow to be more resilient against data corruption, and allow for easier re-use of stanza-id, I think it'd add considerable value.
MattJ 10:35:06
waqas, I don't think a server that can't provide a durable store should be able to claim it does
MattJ 10:35:33
There's a simple fix for this, the XEP already has a flag to tell the client that the results are not necessarily persisted
waqas 10:36:35
MattJ: I'm asserting that the vast majority of MAM deployments can't guarantee durability in a disk-lost scenario. Recent writes being lost is a fact of life, you can't spec your way around it without mandating things you have no way to mandate.
MattJ 10:37:13
I look forward to your PR
waqas 10:37:16
Note that I don't think the MAM XEP has to change, just the assumption that item-not-found always means MAM storage was deleted up to that item is wrong.
MattJ 10:38:12
So yet another hidden thing for client devs to think about
waqas 10:41:05
In a world where fsync doesn't necessarily mean data was durably stored, and SQL dbs multi-master replication defaults to async mode (and is rarely used anyway), that's reality.
Guus 10:49:53
MattJ where in the XEP is what I called the 'hack' described?
Guus 10:50:18
I was looking to see if the exact wording would give me hints for improvement
MattJ 10:50:23
Guus, it quite possibly isn't
Guus 10:50:38
ah ok.
Guus 10:53:11
I'd love to be able to add stanza-id's everywhere, without implying that this means that MAM is available.
Guus 10:53:52
but doesn't service discovery sufficiently guard against that?
MattJ 10:54:10
Adding stanza-id doesn't imply MAM is available
MattJ 10:54:30
But if MAM is available, it implies you can't put stanza-id on every stanza
Guus 11:11:50
I'd like to be able to. Is a feasible solution one that allows the client to request the id of the most-recent MAM entry, in order to verify if it has that one in its local archive?
Guus 11:14:25
If the XEP doesn't currently define the 'store the id of the last message, assuming that it is the last ID in your server-sided archive', there might be room for a change like that?
MattJ 11:16:34
Guus, one of the main premises of the XEP is history sync, this would break it
MattJ 11:17:15
Forget the message purging issue for the moment
MattJ 11:17:59
If the client records the id of the last message it received, and then later uses this to query an archive, what would you propose it do if the id it happened to remember wasn't an archived one?
Guus 11:18:34
item-not-found
MattJ 11:18:40
and then what?
Guus 11:20:33
Naively (I'm not a client builder): I'd see up until what date I'd have a local archive, and retrieve from there.
MattJ 11:20:43
So fetch by timestamp?
Guus 11:20:56
with some wiggle-room, but yes.
MattJ 11:20:59
That way you'll either get duplicates or miss messages
MattJ 11:21:04
And that's not hackish?
Guus 11:21:17
Duplicates I can de-dupe with the message ID
MattJ 11:21:19
We could have just built the whole XEP on timestamps instead of ids if we're happy with that
Guus 11:21:20
misses would be bad.
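A rough sketch of the timestamp-based fallback being debated here, assuming integer timestamps in seconds, a non-empty local history, and entries stored as (timestamp, message id) pairs. The function name `fetch_by_timestamp` and the `wiggle` parameter are hypothetical. It illustrates both outcomes mentioned above: duplicates that can be dropped by message id, and a message that is silently missed because it falls before the query window.

```python
def fetch_by_timestamp(server_entries, local_entries, wiggle=5):
    """Fetch server entries newer than (last local timestamp - wiggle).

    Returns (new, duplicates); duplicates are recognised by message id.
    Assumes local_entries is non-empty.
    """
    known_ids = {msg_id for _, msg_id in local_entries}
    start = max(ts for ts, _ in local_entries) - wiggle
    fetched = [(ts, msg_id) for ts, msg_id in server_entries if ts >= start]
    new = [entry for entry in fetched if entry[1] not in known_ids]
    duplicates = [entry for entry in fetched if entry[1] in known_ids]
    return new, duplicates


local = [(100, "m1"), (110, "m2")]
# "mx" was archived at t=103 but never reached this client.
server = [(100, "m1"), (103, "mx"), (110, "m2"), (120, "m3")]

new, duplicates = fetch_by_timestamp(server, local)

# Messages on the server that the client still does not have afterwards.
have = {msg_id for _, msg_id in local} | {msg_id for _, msg_id in new}
missed = [msg_id for _, msg_id in server if msg_id not in have]
```

Widening `wiggle` trades more duplicates for fewer misses, but no fixed window rules out a miss entirely.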
MattJ 11:21:51
It's an ugly hack
Guus 11:21:56
well, let's not rewrite everything just yet - I'm fairly certain you've put way more thought into this than I have 🙂
MattJ 11:22:25
This is not something I would accept a rewrite for, for certain
MattJ 11:22:44
The correct fix is to re-introduce a way for the client to know whether the message is in the server's archive or not
Holger 11:22:58
> But if MAM is available, it implies you can't put stanza-id on every stanza
Depends on server implementation, no? The server just must be able to respond to the before/after requests.
Guus 11:23:05
so, why can't it ask for the last-recorded message id in the archive?
MattJ 11:23:19
Guus, how does that help?
Guus 11:23:23
what's my last message? do I have this? no: resync everything.
MattJ 11:23:46
Guus, that's broken
MattJ 11:24:04
Just because you don't have the last message in the archive doesn't mean you don't have the first
Holger 11:24:12
E.g. ejabberd uses timestamps as IDs, so it doesn't matter whether the queried ID is archived, before/after still does the right thing.
Guus 11:24:21
resync everything from the last one that you have, I mean.
MattJ 11:24:35
Guus, you don't know what the last one you have is
MattJ 11:24:52
Holger, multiple stanzas with the same timestamp?
Holger 11:25:17
Microsecond accuracy, if you hit that in practice then yes it breaks.
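A small sketch of the idea that timestamp-derived ids let before/after queries work even for an id that is not in the archive. This is an illustration under assumptions, not ejabberd's actual implementation: ids are taken to be integer microsecond timestamps kept in sorted order, and `query_after` is a hypothetical helper.

```python
from bisect import bisect_right


def query_after(archived_ids, after_id):
    """Return archived ids strictly greater than after_id (input must be sorted)."""
    return archived_ids[bisect_right(archived_ids, after_id):]


archived = [1553423000000001, 1553423000000500, 1553423000002000]

# 1553423000000700 was stamped on a message that was never archived,
# yet the query still has a well-defined answer.
newer = query_after(archived, 1553423000000700)
```

Two stanzas sharing one timestamp would be indistinguishable in this scheme, which is the collision case raised above.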
MattJ 11:25:28
Holger, what about clock drift then?
Guus 11:25:41
MattJ how don't you know what your last message is? You can order your local archive chronologically, use the last one?
MattJ 11:25:54
I'm not against using timestamps *in* the id, but it's wrong to use them as the id with no extra logic
MattJ 11:26:19
Guus, the last what? I don't know which ones the server archived
Holger 11:26:23
MattJ: Clock drift across cluster nodes? That would break as well yes.
Guus 11:26:25
Hmm, my parents just walked in. Wife is preparing for 'the stare' again.
Holger 11:27:45
(Or do you mean clock jumping backwards? That can't happen with our clock except server restarts.)
Guus 11:28:00
MattJ, but if archiving is enabled, you can assume that the messages that you have in ... aah, I don't have the time to further discuss this now, sorry.
Guus 11:28:03
('stare')
Guus 11:28:08
I'd love to pick this up later.
MattJ 11:28:09
Holger, using the system's monotonic clock? or something custom?
Guus 11:28:10
got to go now
Holger 11:28:46
Erlang has a thing that doesn't jump back, not sure how it's implemented.
Holger 11:30:41
Anyway yes this is not the most robust solution against such pathological cases of course (it just has other nice properties). Whatever, I just wanted to say that MAM doesn't imply only archived messages have an ID per se.
Holger 11:31:19
(ejabberd doesn't actually add IDs to non-archived messages, though I keep pondering it.)
MattJ 11:33:16
Holger, as discussed, things will break (read: get hard/impossible) for clients if you add stanza-id to non-archived stanzas
MattJ 11:33:29
which is not a good situation, and should be fixed
Holger 11:35:44
Maybe I misunderstood the breakage vector. I would've thought things will be fine as long as the server is aware how the non-archived IDs are ordered compared to the archived messages.
Holger 11:44:42
MattJ, just in case you're interested, this sounds like a custom clock that (attempts to) adjusts towards the OS clock by changing frequency (up to 1%) while avoiding jumps: http://erlang.org/doc/apps/erts/time_correction.html#No_Time_Warp_Mode
MattJ 11:45:13
Fun
Holger 11:45:31
(At the cost of risking incorrect offsets of course, so they warn against doing this.)
MattJ 11:45:55
Holger, the server knowing how to interpret the ids is not really relevant... unless you're saying it should not return item-not-found but quietly accept ids that don't actually exist in the archive
MattJ 11:46:25
That would cause weirdness with clients that try to fill holes
MattJ 11:46:31
and probably other stuff
Holger 11:47:08
> unless you're saying it should not return item-not-found but quietly accept ids that don't actually exist in the archive
Ah yes that's what I'm saying. IIRC 0059 suggests doing just that (wasn't it even a SHOULD?).
Holger 11:48:16
But I'm on my phone and the sun is shining. Gonna shut up now 🙂
MattJ 11:48:52
Is it too late to start over with MAM?
MattJ 11:49:02
Not using RSM for a start
MattJ 11:49:16
Trying to use existing building blocks has just caused confusion and unintended consequences
pep. 11:49:45
Well MAM is still experimental :-°
pep. 11:49:51
What about another bump?
MattJ 11:50:11
Everyone would love that
pep. 11:51:23
That's a thing I don't like in general. The XEP is still experimental but in reality it's just as if it was almost Final. If you change anything everybody is going to grump
MattJ 11:52:23
It certainly still has open issues, as a spec
pep. 11:52:43
Sure. I'm not just talking about MAM, that's how I feel about our specs in general
MattJ 11:53:25
Can't have it both ways
MattJ 11:53:50
Just this morning it was mentioned that XEP-0313 being Experimental is a reason Pidgin doesn't have support
pep. 11:54:15
I'd say that's an issue with developer expectations. If you implement it as experimental, know that it's likely going to change
pep. 11:55:21
(And even more, really, draft, even a final spec can be amended with another spec, so..)
pep. 11:56:34
MattJ, isn't that just an excuse from pidgin devs? :p
MattJ 11:56:48
It's not a viewpoint I share, but I'm biased
waqas 11:57:00
Are devs expected to implement experimental xeps?
MattJ 11:57:21
If a standard explicitly has a big red warning at the top, and warning or no warning is subject to radical change... if I had a limited amount of free time, would I want to implement it?
waqas 11:57:55
"While implementation of an Experimental protocol is encouraged in order to determine the feasibility of the proposed solution, it is not recommended for such implementations to be included in the primary release for a software product (as opposed to an experimental branch)." — https://xmpp.org/extensions/xep-0001.html#states-Experimental
pep. 11:58:19
waqas, in the meantime, it's a needed feature
pep. 11:58:37
And it's even in the compliance suite..
MattJ 11:58:59
That's the real problem (that experimental or not, it's a needed feature)
pep. 12:00:49
I'd say both these criteria (needed feature / compliance suite) put even more pressure on the XEP to go to draft/final. I'm not saying I like it
pep. 12:01:06
And as you say there are still areas that need to be improved
pep. 12:03:22
Maybe there should be a rule that compliance suites can't recommend draft specs. In the hope that people focus/provide feedback on XEPs that are needed
Zash 12:03:46
I thought there was
pep. 12:04:03
Well if there was, MAM shouldn't be in there
pep. 12:04:31
nor carbons? (last call ended but it's still proposed)
waqas 12:04:53
We should stop calling them "compliance" suites
flow 17:00:41
MattJ, I am not sure if using existing building blocks caused confusion. It appears to me that not clarifying how they are intended to be used and are allowed to be used (think for example if <before/> and <after/> can be used in the same query) is causing confusion.
MattJ 17:10:36
flow: they can't, the end :)
flow 17:12:19
That is what I would also say, but it is at least underspecified in XEP-RSM