XSF Discussion - 2018-02-28

edhelas 06:57:04
We do indeed
goffi 07:07:36
Hi, SàT support it too
SaltyBones 07:15:52
SaT?
goffi 07:17:48
SaltyBones: https://salut-a-toi.org
vanitasvitae 07:19:34
Just read the XMPP Newsletter. Good work :)
daniel 07:19:49
Oh there is one already?
daniel 07:20:11
Is it available on a website? Or is it literally just a newsletter?
vanitasvitae 07:20:27
https://xmpp.org/2018/02/the-xmpp-newsletter-28-february-2018
goffi 07:21:18
neat :)
vanitasvitae 07:21:20
Basically it is a link list, but there were one or two articles that had slipped past me
Ge0rG 09:09:42
I love it how xmpp.is say they won't support the spam fighting manifesto, and then describe how they essentially implement each of the stated requirements <https://xmpp.is/2018/02/21/the-jabber-spam-fighting-manifesto/>
Maranda 09:20:17
Ge0rG, in fact xmpp.is is one of the servers I get spim hits from atm, *the hilarity ™️®️*
Maranda 09:20:26
I wonder why
Ge0rG 09:21:43
Maranda: Contact addresses for xmpp.is are https://xmpp.is/contact/ (support, admin, feedback, abuse)
Ge0rG 09:21:48
please feel free to bother them.
Maranda 09:22:26
🤣
Ge0rG 09:25:09
When looking for xmpp.is in my log I found this one instead: https://isopres.de/impressum/
mathieui 09:28:35
Thanks for the newsletter, by the way
Ge0rG 09:28:48
Yeah, it's awesome!
Ge0rG 09:29:14
I'd love to have the big ones (like EVE and Epic) mentioned on our twitter as well
jonasw 09:39:35
this ID mess is a mess
jonasw 09:40:26
Kev, there’s no reasonable way we can actually force clients to generate globally unique IDs though, is there?
jonasw 09:40:39
so I don’t think that there’s an actual solution to the "appears to be rewriting history" issue.
Kev 09:42:23
No, I think there's not (I keep saying that, I think) - I wasn't arguing we can solve it, I was arguing that "they've got lots of power anyway" isn't the reason to not care.
jonasw 09:43:21
oh, I must’ve misunderstood that
jonasw 09:43:29
how would we be caring then?
Kev 09:44:12
I felt that "they've got lots of power anyway" was a "we shouldn't care". We should care, we just probably can't avoid it, so we carefully document it in security considerations etc.
Kev 09:44:57
It sounded like Simon was saying that malicious clients/servers are a problem not worth thinking about. I might have misinterpreted his words.
SaltyBones 10:02:15
Yeah, that's not what I was trying to say...
SaltyBones 10:10:57
Hm...how to sum this up...1. A client and server can claim that a message-ID was A when it was B; that is almost unsolvable but the other recipients just won't believe it anyway. 2. There is usually no authentication in XMPP so the point seems a bit moot.
jonasw 10:11:33
the main issue is that an ID can be re-used
jonasw 10:11:38
as far as I understand it
jonasw 10:11:55
and since we use IDs as identifiers in various protocols (LMC, but also References and stuff), that’s a problem
SaltyBones 10:12:34
Yeah, but there is a difference between assuming that it happens by accident and can be fixed and assuming that it is adversarial.
jonasw 10:13:19
I’m confused
Ge0rG 10:14:32
The only way to properly solve this is to limit the validity domain of IDs to a single session.
Kev 10:14:44
SaltyBones: "no authentication"?
Kev 10:14:50
I think one can reasonably argue with that statement.
jonasw 10:15:27
Kev, I think they refer to "cryptographic authentication". A server can easily forge a stanza for anyone to or from its domain.
SaltyBones 10:15:38
Yeah, sorry.
SaltyBones 10:16:12
Ge0rG, but then you could force the server to have some sort of UUID and a session counter and if you throw the three things together you get reasonable global IDs.
Kev 10:16:18
There's cryptographic authentication, even. It's just that it's applied to the domain, not beyond.
goffi 10:17:02
is there a licence for the Newsletter? Would be nice so it could be translated
SaltyBones 10:18:25
goffi, has been discussed in the comm team channel
Tobias 10:18:29
isn't all website content on xmpp.org under a single license
Kev 10:20:23
Should be, but I think that notice was lost at some point.
SaltyBones 10:20:25
Kev, yes, but there is no message authentication so if you only store messages you cannot prove to anybody later that serverA sent you messageB with ID-C...
Kev 10:20:42
That's somewhat different.
SaltyBones 10:21:24
Yes, just pointing it out because of your "somebody claims you liked a post about KKK" example
jonasw 10:22:01
SaltyBones, the issue is that that scenario can be caused by a client alone.
SaltyBones 10:22:10
jonasw, the reason I ended up with this proposal is that it makes the situation better and it is very easy to implement.
jonasw 10:22:20
a malicious server is really powerful, indeed, and we generally assume that each user can trust their own server, and that they have to, to some extent, trust a MUC service if they’re using one
Kev 10:22:40
SaltyBones: I think my issue is that the core of your proposal (the hashing thing) doesn't make anything better :)
Kev 10:23:00
Or, I don't see how it does.
SaltyBones 10:23:36
Kev, that's a reasonable way to look at it. You could just as well not do the hashing and just use per-connectionserver-salt + connection-counter
SaltyBones 10:24:10
That was just a fix for the "oh but the connection counter leaks stuff" problem...
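A minimal sketch of the idea discussed above: the client derives each message ID from a per-connection salt and a counter, hashed so that the counter itself is not visible, and the server recomputes the value to verify it. The names `make_id` and `ServerSession`, the HMAC-SHA-256 construction, and the 32-character truncation are assumptions for illustration, not part of any XEP or of the actual proposal text.

```python
import hashlib
import hmac
import secrets


def make_id(salt: bytes, counter: int) -> str:
    """Derive an opaque message ID from a per-connection salt and a counter."""
    digest = hmac.new(salt, str(counter).encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:32]


class ServerSession:
    """Server-side view of one connection (hypothetical helper)."""

    def __init__(self) -> None:
        self.salt = secrets.token_bytes(16)  # assumed per-connection salt
        self.expected = 0                    # next counter value the client must use

    def accept(self, message_id: str) -> bool:
        """Accept the ID only if it matches the next expected counter value."""
        if hmac.compare_digest(message_id, make_id(self.salt, self.expected)):
            self.expected += 1
            return True
        return False
```

Because the server only accepts the ID for the next counter value, a replayed or invented ID is rejected within that connection. It says nothing about IDs arriving from other domains.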
Kev 10:24:16
Oh, no, you couldn't do counters, because of the leaks.
Kev 10:24:41
But as telling the server what ID to use for MAM isn't sensible, AFAICS, I don't think the server being able to predict the client ID buys much at all.
SaltyBones 10:25:09
If the server can assure that the client ID is unique it can simply use that for MAM.
SaltyBones 10:25:18
That's the main point... :)
jonasw 10:25:32
SaltyBones, but that only solves the "client knows the eventual ID of the message"
jonasw 10:25:46
it doesn’t solve any of the malleability things cross-domain.
jonasw 10:26:00
that is only solved by your rewriting proposal, and I’m not confident that will work properly
jonasw 10:26:10
except with a lot of complexity on the server
SaltyBones 10:27:00
jonasw, not sure what you mean with malleability cross-domain
Kev 10:27:04
SaltyBones: You're assuming that a server is happy to have arbitrary client-provided IDs as the primary query into the archive. I'm suggesting I don't think that's valid.
Kev 10:27:19
I think you have to let the server decide how it indexes its database.
SaltyBones 10:27:54
Kev, they are not arbitrary at all because the server can verify that the client generated them correctly. Essentially they are just forced to use the same generation algo....
SaltyBones 10:28:22
Of course if there are servers that don't like this algorithm for some reason then we should look at what that reason is and how to fix it. :)
jonasw 10:28:50
SaltyBones, the reason for prosody/Zash is clear: their ID contains the date because that allows quick access to a bucket of MAM data
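To illustrate why a server might want the date inside its archive IDs, here is a small sketch in which the ID prefix selects a per-day bucket. The `YYYY-MM-DD-<random>` layout and the `Archive` class are hypothetical. They are not the actual Prosody ID format, which is not given in this discussion.

```python
import secrets
from collections import defaultdict
from datetime import datetime, timezone


def new_archive_id(now: datetime) -> str:
    """Build an ID whose first ten characters are the storage date (assumed layout)."""
    return f"{now:%Y-%m-%d}-{secrets.token_hex(6)}"


def bucket_of(archive_id: str) -> str:
    """Recover the bucket key straight from the ID, without any index lookup."""
    return archive_id[:10]


class Archive:
    """Toy archive that keeps one dictionary per day."""

    def __init__(self) -> None:
        self.buckets = defaultdict(dict)

    def store(self, body: str, now: datetime) -> str:
        archive_id = new_archive_id(now)
        self.buckets[bucket_of(archive_id)][archive_id] = body
        return archive_id

    def fetch(self, archive_id: str):
        return self.buckets[bucket_of(archive_id)].get(archive_id)
```

A client-chosen ID would not carry this prefix, so the server would need a second index to find the right bucket.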
goffi 10:28:52
SaltyBones: and what was the conclusion of the discussion? Can we re-use? Tobias: I don't see any licence mention on the website, what is it?
Tobias 10:29:43
I'm sure you can translate it when referencing back and mention the original author
SaltyBones 10:30:56
goffi, translations would be appreciated and then linked to from the main newsletter. Not a real legal discussion.
goffi 10:34:52
Tobias: SaltyBones: I plan to translate it into French for a popular website, but it requires specifying a licence, so I want to be sure I can set CC By-SA there. And yes I'll mention the original post of course.
Tobias 10:35:17
what's the SA?
goffi 10:35:22
Share Alike
SaltyBones 10:35:46
goffi, join commteam@ and ask there.
goffi 10:37:01
SaltyBones: Tobias: asking there now, thanks
SaltyBones 10:37:06
jonasw, they could just include a timestamp for timestamp queries and use an increasing salt to have an increasing index...
SaltyBones 10:37:45
We can probably suss out all of these problems but I am not sure anybody but me is actually interested in doing that. xD
Ge0rG 10:38:12
> but I am not sure anybody but me is actually interested in doing that.
I know that feeling. Too well.
Kev 10:39:46
I'm glad to have this discussion. I'm not convinced that the proposal, and the complexity that goes with it, is solving any problems that a simpler solution doesn't.
jonasw 10:39:50
SaltyBones, but timestamps aren’t monotonic
Kev 10:40:32
That is, it seems to me that the problems solved by this solution are the same solved by just saying to clients 'be unique in origin-id', and updating other XEPs to say 'reply with the origin-id' (LMC, Receipts, etc.).
jonasw 10:40:42
yeah
Ge0rG 10:40:50
SaltyBones: sorry, I didn't even read through the message-IDs thread due to lack of time.
SaltyBones 10:41:20
Kev, and maybe adding a reply "this is your mam-ID" to client messages
SaltyBones 10:41:40
jonasw, they aren't? :)
Kev 10:42:17
Or ignoring origin-id, using the stanza ID the way we always have, and killing MUC with fire :)
jonasw 10:42:58
SaltyBones, clock corrections make it non-monotonic. and of course there’s the issue of clock sync between nodes
Ge0rG 10:43:01
Kev: or just finally mandating that MUC must keep message IDs.
Kev 10:43:13
Or that.
Ge0rG 10:43:34
And mandating that clients which want to employ LMC and other references must use sufficiently unique IDs
SaltyBones 10:43:49
jonasw, ah I was thinking about the lamport kind of timestamp
jonasw 10:43:58
that sounds complex
Ge0rG 10:44:24
Maybe somebody wants to resurrect the Jul 2014 thread on MUC message IDs.
SaltyBones 10:44:25
a little
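For reference, the "Lamport kind of timestamp" mentioned above is a logical counter rather than a wall clock, so clock corrections cannot make it go backwards. A minimal sketch follows. The class name and method names are illustrative, and how such a counter would be shared between cluster nodes is left open here.

```python
class LamportClock:
    """Logical clock: a counter that only ever moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event, such as storing a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge a timestamp seen from another node, then advance past it."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

Each node's values are strictly increasing, and an event that follows the receipt of a remote timestamp is always numbered after it. Two unrelated events on different nodes can still get the same number, which is why a counter alone does not give a total order.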
SaltyBones 10:46:25
jonasw, anyway if you have server generated timestamps and a monotonic counter the mapping should be pretty easy although not a NOP :)
Kev 10:46:56
You need more than monotonic, don't you?
Ge0rG 10:46:58
SaltyBones: but clustering!
Ge0rG 10:47:11
no wait, that was Dave's text.
Ge0rG 10:47:15
but race conditions!1!
SaltyBones 10:48:41
If you want clustering and query by timestamp I suppose you need the opposite of monotonic.
SaltyBones 10:48:54
You want to merge archives by timestamp. Although that sounds dubious.
SaltyBones 10:52:23
So, let's suppose that servers really want to generate their own ID for internal use. That seems fair but then why does the client need to know this ID for MAM? That's a bit fishy imho...
Kev 10:53:20
Because it's that ID that is the index into the archive.
Ge0rG 10:53:40
I wonder if we can do XMPP over that link: http://www.vodafone.com/content/index/media/vodafone-group-releases/2018/vodafone-and-nokia-to-create-first-4g-network-on-moon.html
SaltyBones 10:53:43
but why does the client need to know that
Kev 10:53:58
Because it queries the archive.
SaltyBones 10:54:12
To do what?
SaltyBones 10:55:29
I mean, there are obvious solutions here as well: 1. The client queries the server by its own ID and the server can try to use that as an additional index or 2. The server could just reply to client messages with the new ID.
SaltyBones 10:56:01
But the whole situation is weird...what are the clients trying to achieve by querying the MAM?
Kev 10:56:19
There is some massive logical disconnect here.
Kev 10:56:41
You're asking why a user would want to retrieve messages from their message archive.
Kev 10:56:52
What's the point of having the archive if you *can't* query it?
SaltyBones 10:58:30
But why would you need to know what's in the archive to query it
Kev 10:58:54
Have you read the XEP? :)
SaltyBones 10:59:15
Well, I've tried... :)
Kev 10:59:53
You say things like "Give me everything since message X", where X is the MAM ID.
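As a rough illustration of "give me everything since message X", the sketch below builds a XEP-0313 query that pages forward from a known archive ID using a Result Set Management `<after/>` element. The helper name `build_mam_query`, the page size, and the example IDs are assumptions. The namespaces are the ones used by MAM version 2 and RSM.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:2"
RSM_NS = "http://jabber.org/protocol/rsm"


def build_mam_query(iq_id: str, after_id: str, page_size: int = 50) -> str:
    """Serialize an <iq/> asking the archive for entries after `after_id`."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query")
    rsm = ET.SubElement(query, f"{{{RSM_NS}}}set")
    ET.SubElement(rsm, f"{{{RSM_NS}}}max").text = str(page_size)
    # The value here must be an ID the archive itself assigned.
    ET.SubElement(rsm, f"{{{RSM_NS}}}after").text = after_id
    return ET.tostring(iq, encoding="unicode")
```

The `after_id` has to be something the server can locate in its own index, which is why the client needs to learn the archive ID of its outgoing messages as well as its incoming ones.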
SaltyBones 11:01:26
Yes, but now we are asking the client to query by an ID which the server generated and didn't tell it about...
SaltyBones 11:01:59
Why not just query by the clients ID or timestamp?
Ge0rG 11:02:20
timestamps are unreliable
jonasw 11:02:33
SaltyBones, because it is exact
Kev 11:02:34
Syncing on timestamp doesn't work, indeed.
jonasw 11:02:39
timestamps are not exact
Kev 11:03:01
And the client ID means you're storing the primary index provided by the client, which enforces implementation details on the server that I'm not convinced we want to.
Kev 11:03:21
And maybe it's something we can live with, but I don't currently see what it buys us.
jonasw 11:03:51
I know at least one implementation which can’t live with that :)
Holger 11:04:19
The message in question might be an incoming message, an outgoing message sent by that client, or an outgoing message sent by another client. You'd use the client ID in some or all these cases?
Kev 11:04:30
jonasw: If you mean Prosody's timestamp one, I think additional stuff's going to end up needed there anyway, for all the other things we were discussing at the summit.
Kev 11:04:32
But yes.
Kev 11:04:57
Holger: Well, you obviously can't in all, for id clashing reasons. At least, not with this suggested scheme.
Holger 11:05:08
That's why I'm asking.
Holger 11:05:26
So basically the client ID is not an option.
Holger 11:05:34
If you don't want to introduce another great mess.
jonasw 11:08:46
Holger, that’s a very good point, I like it :)
Ge0rG 11:09:26
another great mess! \o/
Neustradamus 11:16:32
It is beautiful to see the first newsletter after XMPP Roundup and Jabber journal :)
Neustradamus 11:17:09
I see a new redirection problem: http://wiki.jabber.org/index.php/....
Holger 11:18:35
I'm confused. Say the client re-logs in after losing the connection and knows the MAM ID not just of the last incoming but also of the last outgoing message because we solved that somehow. He then queries MAM with after=$ID. How does he decide whether to specify the $ID of the last outgoing or the last incoming message? The ordering of incoming vs. outgoing messages on the client side might be different from the server side, no? (Maybe *this* can only be solved properly by having the server reflect IDs?)
jonasw 11:19:17
yes, that can be solved by having the server reflect IDs
Ge0rG 11:19:27
Holger: that's an awesome point
Ge0rG writes it on the back of his "race conditions" card 11:19:46
jonasw 11:19:49
hah
jonasw 11:19:59
that’s why I think we just want self-carbons by now.
Kev 11:19:59
It's the same point I made on the list earlier this morning.
jonasw 11:20:03
yeah
Kev 11:20:06
But with more words ;)
Holger 11:20:16
Kev: Oh sorry, I didn't catch up yet.
Kev 11:20:38
The chat in here was triggered by me replying to the mailing list thread, I think.
jonasw 11:20:43
yeah
Ge0rG 11:20:52
Kev: except almost nobody read your mail, it seems
SaltyBones 11:21:03
The ordering of messages in MAM can be different from the client?
Holger 11:21:10
Can we maybe somehow merge reflection of IDs with 0198 ACKs?
Ge0rG 11:21:15
SaltyBones: yes
Ge0rG 11:21:32
Holger: this is something I proposed when MAM first appeared.
Kev 11:21:32
Holger: I'd rather not, that's somewhat breaking layering.
Ge0rG 11:21:48
Holger: 0198 and Carbons and MAM in a single unholy union.
SaltyBones 11:21:58
Why can they be different and who is right? :)
Holger 11:22:01
Kev: Then the layers are wrong IMO.
Kev 11:22:37
Really?
Holger 11:22:38
Kev: I find it a bit embarrassing to define a protocol that has the server generate two responses to a single message.
Kev 11:22:47
198 is about the network layer and stuff getting through.
Kev 11:22:51
MAM is about the protocol layer.
jonasw 11:22:57
Holger, two responses?
Zash 11:23:11
198 isn't per message
jonasw 11:23:18
not even per stanza, indeed
Holger 11:23:29
jonasw: (1) ACK I got message with ID $count, (2) ACK I got the message with MAM $ID.
jonasw 11:23:48
Holger, but the (1) ACK is explicitly requested by the client with an <{sm}r/>
Holger 11:23:57
Yes I'd usually request it per-stanza.
jonasw 11:24:05
hm, I don’
jonasw 11:24:07
*I don’t
Kev 11:24:08
So 198 could be bunching a load of stuff, for different stanzas (which may or may not be messages), and is generally going to happen once the server receives the stanzas, whereas 313 happens later, once it goes in the archive.
Holger 11:24:19
jonasw: If the client doesn't deem this necessary, why does it deem it necessary for MAM?
Holger 11:24:35
Ge0rG: Yay I'm good at re-inventing wheels.
Ge0rG 11:24:36
Kev: some servers will only emit the 0198 ack after fully processing the stanza
jonasw 11:24:41
Holger, because MAM contains things from other entities I suppose
Ge0rG 11:25:00
yaxim will emit an <r/> after each message because mobile is unreliable
jonasw 11:25:09
on the XEP-0198 stream, I know the order of the stanzas. In MAM, I don’t unless I get an in-order reflection of my own message.
SaltyBones 11:26:05
If a client queries the MAM by last ID but the order in the MAM might be different I don't understand how it works. :)
Holger 11:26:22
Ge0rG: And if it was reliable you wouldn't need 0198. I never got the idea of requesting an ACK only every now and then.
Holger 11:27:05
Except for requesting it only once per bunch of stanzas you sent in one go, or so. In which case a single MAM ID response is fine as well.
jonasw 11:27:07
Holger, when sending a bunch of messages at once, it makes sense to request an ack only after the last message
jonasw 11:27:11
saves overhead
jonasw 11:27:12
yeah
jonasw 11:27:28
(or rather, sending a bunch of stanzas in general)
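The batching point can be made concrete with a sketch of the sending side of XEP-0198: stanzas wait in a queue until an `<a h='N'/>` arrives, and a single answer can confirm several of them at once. `OutboundQueue` is a hypothetical helper, and the wrap-around of `h` at 2^32 is deliberately ignored to keep the sketch short.

```python
from collections import deque
from typing import List


class OutboundQueue:
    """Tracks stanzas sent but not yet acknowledged by the server."""

    def __init__(self) -> None:
        self.unacked = deque()
        self.acked_count = 0  # last 'h' value received from the server

    def send(self, stanza: str) -> None:
        # A real client would also write the stanza to the stream here.
        self.unacked.append(stanza)

    def handle_ack(self, h: int) -> List[str]:
        """Process <a h='...'/> and return the stanzas it newly confirms."""
        newly_acked = h - self.acked_count
        if newly_acked < 0 or newly_acked > len(self.unacked):
            raise ValueError("server acknowledged an impossible stanza count")
        confirmed = [self.unacked.popleft() for _ in range(newly_acked)]
        self.acked_count = h
        return confirmed
```

Note that the ack carries only a count. It does not say which stanzas were messages, nor what ID the archive gave them.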
Holger 11:27:41
Yes what I don't get is how the requirements differ from those for MAM ID reflections.
jonasw 11:28:28
Holger, hm, maybe that would work.
jonasw 11:28:41
still requires some kind of knowledge about the relative ordering of messages you sent vs. messages you received and other resources sent
jonasw 11:28:49
I don’t think we can get that without something on the stanza layer?
Holger 11:28:50
Why?
jonasw 11:28:56
to be able to query correctly
Holger 11:29:12
The ordering is now defined by the order or stanza IDs you got on your incoming stream.
Holger 11:29:18
*the order of stanza IDs
jonasw 11:29:19
how would I know when I send and receive a stanza at the same time and then my connection drops without stream management.
jonasw 11:29:30
now I need to query the archive
jonasw 11:29:42
which of the two IDs do I use to get a complete, dupfree picture?
Holger 11:30:27
You ditch unacknowledged messages locally and query MAM with after=$ID, where ID is the last ID you got from the server, no?
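A toy model of the question being asked here: the archive has one server-side order, and the result of `after=$ID` depends on which ID the client picks. The archive contents and the `query_after` helper are made up for illustration.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (archive ID, description)


def query_after(archive: List[Entry], after_id: str) -> List[Entry]:
    """Return every archive entry stored after the entry with `after_id`."""
    ids = [entry_id for entry_id, _ in archive]
    return archive[ids.index(after_id) + 1:]


# Server-side order of one user's archive (illustrative data).
SERVER_ARCHIVE: List[Entry] = [
    ("a1", "incoming: hi"),
    ("a2", "outgoing: hello"),
    ("a3", "incoming: how are you?"),
    ("a4", "incoming: still there?"),
]
```

Suppose the client saw `a3` arrive and believes its own message, archived as `a2`, was sent afterwards. Querying after `a2` returns `a3` a second time along with `a4`, while querying after `a3` returns only `a4`. The client can only pick correctly if it learns the IDs in server order.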
jonasw 11:30:30
(I’m currently too hungry to think of a more sophisticated case where using the wrong ID would actually lead to *missed* messages, but there might be some)
jonasw 11:31:02
okay, I need some food first
jonasw 11:31:08
food for thought, if you will.
Holger 11:31:20
I'll try coffee.
Holger 11:31:46
And I'll read Kev's email :-)
flow 11:47:25
MAM IDs in SM acks seems to be worth exploring
MattJ 11:50:44
Depends whether you want to communicate to the client "this was the last entry in MAM at this point in the stream", or whether you want the client to know the ID of every message in the archive
Ge0rG 11:51:01
Holger: except we need SM for IQs as well.
flow 11:51:39
MattJ, last entry in archive should be sufficient for most cases
Ge0rG 11:51:44
MattJ: what about giving back a list of message id / message ID pairs.
Holger 11:52:27
Ge0rG: Those will obviously not carry a MAM ID?
MattJ 11:52:56
I'm guessing Ge0rG means a map of @id -> MAM-ID
Ge0rG 11:53:25
I just love our nomenclature
MattJ 11:54:08
What nomenclature?
MattJ 11:54:15
Nobody can agree on what to call anything :)
Ge0rG 11:54:38
Can't we just map jabber IDs to nonza IDs and be done?
Holger 11:54:45
:-)
Ge0rG 11:55:02
This protocol is not for zimpies™
Holger 11:55:17
While at it, maybe just fix 0198 to return an ID for every stanza and ditch both <r/> and the counting which nobody gets right anyway :-P
Ge0rG 11:55:35
Holger: but layers!
Holger 11:56:09
Ge0rG: The TCP layer is responsible for reliable message delivery.
Ge0rG 11:56:25
Holger: no, TCP is a byte stream, not a message stream.
MattJ 11:57:34
I think 198 is fine as-is, and I'm not keen on extending it
MattJ 11:57:54
But yes, we do need to solve the MAM-ID-for-outgoing-messages problem
Ge0rG 11:58:00
MattJ: from a smart-server dumb-client point of view, having four different mechanisms to track messages sucks.
Holger 11:58:13
So then we need a separate acknowledgment for MAM.
Ge0rG 11:58:17
I'm talking of 0184, 0198, 0280 and 0313
MattJ 11:58:28
Typically consensus has been about reflecting outgoing messages (in part, or in full), because this also has other benefits and we do it in Carbons anyway (just not for the originating resource)
Ge0rG 11:58:39
MattJ: yes.
Ge0rG 11:58:53
I wonder how that will play out with self-messages.
MattJ 11:59:19
Heh
Holger 11:59:26
I understand where you guys are coming from, I just think this adds a bit of embarrassment when you show the protocol to a newcomer for the first time.
MattJ 12:00:02
Holger, and your preferred solution is?
MattJ 12:00:20
Oh sorry, you already said
Holger 12:00:22
MattJ: Merging 0198 ACKs with 0313 message reflections.
Ge0rG 12:00:22
Holger: XMPP is already mocked by HTTP developers. It can't get any worse.
Holger 12:03:34
But I see how just adding a stanza-id attribute and otherwise keeping 0198 as-is has downsides. So yes just adding a 0313 mechanism is probably an easier way forward.
jonasw 12:04:10
FWIW, I think keeping 198 and MAM IDs separate is sane separation of concerns
MattJ 12:04:12
For the most part I don't think we should be introducing newcomers to the protocol
Holger 12:04:12
It's just like other things where the end result is more convoluted than it would be if we addressed all this sync foo in one go.
MattJ 12:04:29
It's a library problem, and the problem is most libraries just leave you with stanza building
Ge0rG 12:04:37
with 0313 reflections we probably don't need to <r/> each message any more
Holger 12:04:39
We need new libraries!
Ge0rG 12:04:47
We need more libraries!
Ge0rG 12:04:56
There are only three(?) for python!
jonasw 12:05:10
i need to port poezio to aioxmpp, then there’ll be only one *evil laughter*
Ge0rG 12:05:43
jonasw: python-nbxmpp!
jonasw 12:05:58
that does barely count as library
Ge0rG 12:06:26
and sleekxmpp used to be a parent of slixmpp? There is also xmpppy
Holger 12:06:32
Ge0rG: Well what we really need is new library authors I guess, and at least those will have to be introduced to the protocol. In practice you'll have to understand most of that stuff as a serious client author as well, of course, even if your library is sane.
Ge0rG 12:06:33
So we have five.
Ge0rG 12:06:56
Holger: as it happens, the most active libraries are maintained by client devs.
Holger 12:07:03
See.
Ge0rG 12:07:05
We have much NIH here.
jonasw 12:07:06
I wonder why ;-)
Holger 12:07:57
So I'm not fully convinced that "but libraries!" is a good excuse for adding insanity to the protocol.
jonasw 12:08:01
I can at least argue that I didn’t NIH, there simply wasn’t anything for python3-asyncio when I started. And I even considered porting sleekxmpp to asyncio, but I thought this to be not reasonably possible.
jonasw 12:08:12
Holger, as both library and client author, I agree.
jonasw 12:08:21
but I think keeping SM and MAM separate is sane.
Ge0rG 12:08:21
jonasw: you are biased.
jonasw 12:08:30
Ge0rG, why?
Ge0rG 12:08:32
I also think that keeping SM and MAM separate is good.
Ge0rG 12:08:46
And I argue in favor of a new type of session, which is MAM-Sub
jonasw 12:08:57
yeah
jonasw 12:09:02
that’d be great
Ge0rG 12:09:40
> Still, I like the idea of MAM subscriptions as a replacement or augmentation for carbons
Saying that for three years now.
jonasw 12:09:54
Ge0rG, implement it pls
jonasw 12:10:08
although bind2 will probably do pretty much that?
Ge0rG 12:10:34
jonasw: bind2 is just a mechanism to carry things.
jonasw 12:10:43
bind2 would have the effect of MAM-Sub though?
Ge0rG 12:10:47
jonasw: nope
jonasw 12:10:55
by doing MAM sync and carbon enablement in a single atomic step?
Ge0rG 12:12:09
Let me make a strawman proposal of MAM-sub:
- you initiate a bind2 session, supplying the last-known MAM-ID
- the server doesn't deliver offline messages
- the server delivers your pending MAM messages
- the server auto-enables carbons and mam-reflections to you, starting to deliver everything after the MAM sync as "live"
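The strawman steps can be sketched as a toy server model in which binding with a last-known archive ID returns the backlog and subscribes the session to everything archived afterwards in one step. `MamSubServer` and its methods are hypothetical. MAM-sub is only a proposal in this discussion and has no defined wire format here.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (MAM ID, stanza description)


class MamSubServer:
    """Toy model of one user's archive plus its subscribed sessions."""

    def __init__(self) -> None:
        self.archive: List[Entry] = []
        self.sessions: List[List[Entry]] = []  # one inbox per bound session

    def archive_message(self, mam_id: str, stanza: str) -> None:
        """Store a message (incoming or outgoing) and reflect it to live sessions."""
        entry = (mam_id, stanza)
        self.archive.append(entry)
        for inbox in self.sessions:
            inbox.append(entry)

    def bind(self, last_known_id: Optional[str]) -> List[Entry]:
        """Return an inbox preloaded with the backlog, already subscribed to live traffic."""
        ids = [mam_id for mam_id, _ in self.archive]
        start = ids.index(last_known_id) + 1 if last_known_id in ids else 0
        inbox = list(self.archive[start:])
        self.sessions.append(inbox)  # same step as the sync, so nothing can slip between
        return inbox
```

Since the backlog and the live subscription come from the same ordered list, the client sees its own reflected messages in archive order, between the incoming ones.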
jonasw 12:12:30
yeah
Ge0rG 12:12:32
so MAM-Sub is like Carbons but with mam-reflections
jonasw 12:12:49
alternatively, the server could just give you the current last MAM ID so that you can do the sync asynchronously
jonasw 12:12:54
while already receiving live messages
Ge0rG 12:13:07
I'm sure I proposed that and a bunch of other nifty optimizations (0198 auto-resume/start in bind2) on the ML some time last year
jonasw 12:13:18
yo
jonasw 12:13:25
we just need implementations.
Ge0rG 12:13:28
jonasw: processing MAM after live will be a pita, but okay.
jonasw 12:13:34
depends on your client, I guess
jonasw 12:13:37
I’d be fine with that.
jonasw 12:13:58
has a considerable latency advantage, especially if you’ve been out for more than just a few hours
flow 12:14:11
Ge0rG, do mam-reflections solve the issue Holger described between incoming and outgoing messages?
Ge0rG 12:14:39
flow: yes.
Ge0rG 12:14:59
flow: MAM reflections will be part of your MAM archive, right between incoming messages, properly ordered.
jonasw 12:15:01
we just need to make sure that MAM-Reflections don’t rewrite IDs. this time for real *scnr*
jonasw 12:15:09
Ge0rG, what, why?
jonasw 12:15:15
wouldn’t you just have your outgoing messages in the MAM archive?
flow 12:15:31
Ge0rG, so mam-reflections are done after the message has been added to your archive, both incoming and outgoing messages
Ge0rG 12:15:39
flow: yes
Ge0rG 12:15:45
jonasw: I'm not following
flow 12:15:45
sounds good
flow 12:16:06
you should write a XEP
flow 12:16:18
or otherwise the idea will possibly be buried in the standards@ archive
jonasw 12:16:34
Ge0rG, why would you have the MAM reflection thing (presumably <message from="mam" to="your client"><forwarded><inner message from="your client"/></forwarded><stanza-id…/></message>) in the archive instead of just <inner message/>?
Ge0rG 12:17:09
jonasw: wait, what?
jonasw 12:17:14
what is a MAM reflection?
flow 12:17:19
jonasw, I don't think that is what Ge0rg wanted to say with "will be part of your archive"
Ge0rG 12:17:21
jonasw: whatever we make it to be
jonasw 12:17:41
Ge0rG, I am super confused now
Ge0rG 12:17:42
jonasw: could be a sent carbon of your outgoing message, or the outgoing message wrapped in MAM
jonasw 12:17:46
yeah
jonasw 12:17:47
okay
jonasw 12:17:54
but WHY would you put that wrapped message into MAM again?
Ge0rG 12:17:56
jonasw: or maybe just a small-ish ack with the MAM ID and your original @id
Ge0rG 12:18:01
jonasw: I wouldn't
jonasw 12:18:08
I don’t understand:
> 12:14:59 Ge0rG> flow: MAM reflections will be part of your MAM archive, right between incoming messages, properly ordered.
this then
Ge0rG 12:18:11
jonasw: I'd put the original message, obviously
Ge0rG 12:18:35
jonasw: ignore it please
jonasw 12:18:37
okay
jonasw 12:18:51
then I didn’t say a thing since 12:15:01Z
Ge0rG 12:18:56
jonasw: of course your *sent message* will be part of your MAM archive, plus the MAM-ID
jonasw 12:19:04
yeah, that’s a good thing :)
flow 12:21:16
Ge0rG, once MAMSub is active, clients will only receive messages not stored into mam via the usual way, all other archived messages will be mam-reflected, correct?
Ge0rG 12:22:27
flow: wait, what?
flow 12:24:43
Ge0rG, specific mam-reflections please
Ge0rG 12:25:18
flow: no, you will receive all messages as usual, with MAM-IDs injected
SaltyBones 12:25:20
I find our reasoning so far somewhat questionable. Because servers might want to use different IDs for messages these IDs should be reflected to the client so that it can make queries with that ID. Shouldn't a server simply be able to respond to a client's query if the client uses its original ID? Maybe this is not really practically possible anymore now but it seems somehow more logical. :)
flow 12:26:02
Ge0rG, and carbons still using forwarded?
Ge0rG 12:27:16
flow: let me sort this out: you will receive all(*) incoming messages as regular messages, sent carbons from your other clients as sent carbons and MAM reflections of your outgoing messages as whatever works (e.g. forwarded)
Ge0rG 12:28:08
all(*) = remember what I proposed at the summit / XMPP2 / routing2 brainstorming
Ge0rG 12:29:18
But maybe routing2 is still too controversial
flow 12:29:46
What gives you the impression that it is too controversial?
Ge0rG 12:30:03
flow: it breaks existing XMPP routing
flow 12:31:04
But it's opt-in. Do we have an example of a legacy protocol which breaks when another involved entity activated routing2?
Ge0rG 12:31:51
flow: not that I am aware of. But the problem is that it changes semantics of bare/full JID routing, which is guaranteed to leak outside the XMPP2 domain
flow 12:34:21
I guess that is just a fancy expression for "something could rely on the semantics and would break if someone else is using routing2"
MattJ 12:34:29
The root problem is that an XMPP1 entity will happily send to the full JID of an XMPP2 entity and expect it to be treated in the XMPP1 way
Kev 12:35:04
Full-JID 'I'm xmpp2' annotation seems like it works though.
Ge0rG 12:35:15
Kev: maybe, yeah.
Kev 12:35:47
Or heuristically 'fixing' xmpp1 full JID based on DPI, but that seems unappealing and probably fragile.
flow 12:36:10
Kev, DPI?
Kev 12:36:18
Deep Packet Inspection.
flow 12:36:20
deep packet inspection?
Kev 12:36:35
Which I'm abusing as a term to mean 'look inside the payloads'.
MattJ 12:41:52
The problem with the annotation is that it feels like it's undermining the point of routing2 in the first place
MattJ 12:42:58
If we're going to annotate, let's just annotate and we don't need to make any other changes
MattJ 12:43:17
and that's basically hints, but with a stronger definition of how to process them
Kev 12:44:30
MattJ: Maybe, perhaps.
Kev 12:45:07
What else would you annotate, though?
MattJ 12:46:13
Exactly - nothing
MattJ 12:46:46
So XMPP2 is XMPP1 with annotations on stuff you want to treat as ephemeral
Kev 12:47:41
And changed routing rules for anything that's not annotated?
Kev 12:48:12
And clients need to know to ignore anything in an annotated message, because it shouldn't be getting them.
Kev 12:48:38
I don't know, I'm kinda concerned that people are going to go down the "Oh, let's annotate such-and-such special casing" like we have with no-copy and no-store at the moment.
MattJ 12:48:39
Indeed
Kev 12:48:48
In principle, it's technically equivalent.
Kev 12:49:24
I still feel we might want to start sending messages from the bare JID instead of the full JID.
MattJ 12:49:39
from or to?
Kev 12:49:43
From.
MattJ 12:49:47
I hadn't considered changing from
Kev 12:49:48
You certainly want to be sending to the bare JID.
Kev 12:50:22
Although if we're saying "treat all messages to a full JID as to a bare JID unless they have an ephemeral annotation", that may be reduced, I guess.
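The rule floated in the last message, "treat all messages to a full JID as to a bare JID unless they have an ephemeral annotation", can be written as a small routing decision. The function name and the boolean `ephemeral` flag are assumptions, since no concrete annotation element is defined in this discussion.

```python
def route_target(to_jid: str, ephemeral: bool) -> str:
    """Pick the address a message is actually routed to under the sketched rule."""
    # The resource is everything after the first slash. It may contain slashes itself.
    bare, _, resource = to_jid.partition("/")
    if resource and ephemeral:
        return to_jid  # annotated as ephemeral: keep full-JID delivery
    return bare        # otherwise handle it like a bare-JID message
```

Under this rule a legacy entity that sends to a full JID without the annotation still gets bare-JID handling, which is the compatibility concern raised above.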
Maranda 12:51:28
Hmm conversations lost the message I sent here this morning from the backlog hm hm
Kev 12:51:31
I need to find time to write some specific words here, so we can bash them, I think.
Ge0rG 12:51:45
https://marc.info/?l=openbsd-misc&m=151974573718360&w=2 - Alright, I'm not complaining about XMPP protocol design any more
jonasw 12:55:45
Ge0rG, lol
Ge0rG 12:58:41
Yup. OpenSSL. Still written by monkeys.
moparisthebest 13:07:26
Random question without much thought, why can't the one true message id be implicit as the hash of the whole stanza?
SaltyBones 13:08:12
It's hard to define what should go into the hash and then some "things" change those things anyway so the ID would change...
moparisthebest 13:08:45
But isn't just hash the entire thing good enough?
SaltyBones 13:09:35
moparisthebest, yesterday I would have said yes but now it is clear that people don't just want unique IDs they want very specific IDs and pick them themselves.
moparisthebest 13:09:50
They can still do that
SaltyBones 13:10:13
moparisthebest, just read the backlog from today :)
moparisthebest 13:10:25
The public one is the hash, they can use whatever as the private one
moparisthebest 13:10:57
Encoding implementation decisions into the protocol seems wrong
moparisthebest 13:11:09
Especially when it has loads of downsides
MattJ 13:11:24
moparisthebest, hashing XML is problematic, and in any case the same stanza may change en-route
MattJ 13:11:54
Unless you're saying it's just hashed after the first hop
goffi 13:12:45
I was thinking about hashes too, but the issue with a hash is that you have to find one without collisions. If you change it, you'll break the whole existing ecosystem.
SaltyBones 13:13:30
moparisthebest, the problem is that we leak the private one because it is required for MAM queries.
SaltyBones 13:13:38
see my 13:25 comment :)
Kev 13:17:47
moparisthebest: Stanzas might change at every hop. So you can't just hash the whole thing.
Ge0rG 13:18:24
moparisthebest [14:11]: > Encoding implementation decisions into the protocol seems wrong
Hashing parts of a message into its id is just that
Kev 13:18:26
goffi: Hashes changing is a solved problem, at least, you just specify the hash used.
Ge0rG 13:19:28
We could just replace messages with their cryptographic ids and become the next peer to peer distributed content storage network
goffi 13:19:48
Kev: yes, but what about already emitted IDs?
Kev 13:20:02
goffi: They don't have an embedded scheme.
moparisthebest 13:26:37
Ge0rG, I'm not saying hashing parts, that gets complicated, I'm saying hash the entire thing
moparisthebest 13:27:01
as to changing at server hops, doesn't the id only matter between client and their server?
moparisthebest 13:28:55
so my client sends a message which I know has id AAAA because that's the hash
moparisthebest 13:29:15
my server can change it, if it does, it sends me back a message telling me AAAA is now BBBB
moparisthebest 13:29:41
does that not solve everything?
Kev 13:30:01
No, the id matters between endpoint entities.
moparisthebest 13:30:39
if any server can change the message, the id can only matter between a client and their server?
Kev 13:31:19
No, the id matters end to end.
Kev 13:31:25
But the content of a stanza might be changed at any hop.
Kev 13:31:35
So hashing to generate an id doesn't work.
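A minimal sketch of the point being made here, assuming SHA-256 over the serialized stanza as the hypothetical id scheme (the addresses and the timestamp are made up, and real XML would also need canonicalization before hashing). It shows that an id derived from the full stanza stops matching as soon as a hop adds something like a XEP-0203 `<delay/>` element:

```python
import hashlib
import xml.etree.ElementTree as ET


def stanza_hash(stanza: ET.Element) -> str:
    # Hash the serialized bytes. A real scheme would need XML
    # canonicalization first; this sketch skips that on purpose.
    return hashlib.sha256(ET.tostring(stanza)).hexdigest()


# What the sending client emits (illustrative addresses).
msg = ET.Element("message", {"to": "b@b.example", "type": "chat"})
ET.SubElement(msg, "body").text = "hello"
id_at_sender = stanza_hash(msg)

# A server on the path stamps a XEP-0203 <delay/> element,
# for example when delivering from offline storage.
ET.SubElement(msg, "{urn:xmpp:delay}delay", {"stamp": "2018-02-28T13:32:02Z"})
id_at_recipient = stanza_hash(msg)

# The two ends no longer agree on the "id" of the same message.
print(id_at_sender == id_at_recipient)
```

The recipient therefore cannot refer back to the message by an id the sender would recognise, which is what receipts and corrections rely on.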
moparisthebest 13:31:46
so now you might have an id that is the same and different contents?
moparisthebest 13:31:50
what's the point exactly?
Kev 13:32:02
See <delay/> for an obvious application.
moparisthebest 13:32:43
what's the point in a@a.com having the same id as b@b.com ?
moparisthebest 13:32:59
surely it only matters between a@a.com and a.com, and b@b.com and b.com ?
Kev 13:35:37
Because otherwise you can't reply to previous messages, which you obviously need to be able to do.
moparisthebest 13:36:29
wait where is this concept of replying to a specific message?
moparisthebest 13:36:39
as far as I'm aware, there is just a guaranteed order and that's it
Kev 13:37:29
In assorted XEPs.
Kev 13:37:41
LMC, Receipts, ...
Ge0rG 13:38:37
You'd have to track the message id associations for all eternity.
moparisthebest 13:40:10
it sounds flawed at a basic level, in a federated system, where a message can change at any server hop, how can you expect an id to refer to remotely the same thing on opposite servers?
moparisthebest 13:40:47
but really hashing would still work right? the server knows the incoming hash and the outgoing hash
moparisthebest 13:41:02
when it gets a read receipt etc etc, it just reverse maps it on the way out?
SaltyBones 13:41:45
Ge0rG, you need to track that anyway because read receipts, right?
SaltyBones 13:41:53
I mean the client has to do it not the server but still..
Ge0rG 13:42:47
SaltyBones: the client won't know the effective id between server a and server b
moparisthebest 13:43:01
and doesn't need to Ge0rG ?
Ge0rG 13:43:17
moparisthebest: yes, you need to reverse map, for the lifetime of the message. Which might be months.
moparisthebest 13:43:20
a@a.com can't talk directly to b.com
moparisthebest 13:43:32
sure
Holger 13:43:48
moparisthebest: Message contents aren't unique so you can't use a hash as an ID.
jonasw 13:44:36
inb4 nonce
moparisthebest 13:45:01
Holger, I don't understand what you mean, I mean hash an entire stanza for an id
jonasw 13:45:02
moparisthebest, you can’t reasonably expect a re-writing server to map between IDs for all eternity, *plus* to know all protocols where message IDs may be referenced
moparisthebest 13:45:22
if you rewrite, you map, easy
moparisthebest 13:45:35
shouldn't need to know any specific protocols?
Holger 13:45:37
moparisthebest: Entire stanza contents aren't unique either.
jonasw 13:46:08
moparisthebest, a server would have to re-write a client's reference to another ID, e.g. for XEP-0184 (receipts) or Last Message Correction.
moparisthebest 13:46:19
Holger, I don't understand, if 2 stanzas hash to the same id they are the same stanza
Holger 13:46:31
moparisthebest: But they are still two stanzas.
jonasw 13:46:31
moparisthebest, but maybe sent at a different time.
moparisthebest 13:46:40
no they are just 1 stanza
jonasw 13:46:41
like "that’s a good idea", I send that maybe ten times a week in here
jonasw 13:46:59
a stanza is always its context
Holger 13:47:01
moparisthebest: Hah what?!
jonasw 13:47:03
(aaand here we are at matrix’ DAG thing)
Maranda 13:47:06
Do I have to mention what kind of hard fail hashes are? Does DKIM ring a bell?
moparisthebest 13:47:14
storage-wise you store them once etc?
jonasw 13:47:17
Maranda, yeah, good point
jonasw 13:47:21
oh my god
Holger 13:48:38
moparisthebest: "etc" :-)
Holger 13:48:55
moparisthebest: What jonasw said.
moparisthebest 13:49:00
ah you are saying if you send the exact same stanza every day or something, got it
moparisthebest 13:49:41
but maybe the rest would work? if id's are only valid per-hop
moparisthebest 13:50:15
and anything rewriting the id keeps a map for as long as needed?
jonasw 13:51:31
moparisthebest, how long is "as needed"?
jonasw 13:51:32
eternity?
moparisthebest 13:51:50
well for a server it'd be as long as it had the message
jonasw 13:51:53
especially with non-IM use cases I can imagine "as long as needed" can be quite a while
Holger 13:51:54
moparisthebest: So you're hashing contents and keeping a map because the contents of a given stanza may change and ignoring the fact that different stanzas might have identical stanzas. But apart from that hashing contents sounds like a perfect solution yes.
moparisthebest 13:51:58
in mam/smacks/offline storage
jonasw 13:52:02
moparisthebest, what about references to that message?
Holger 13:52:07
*identical contents
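The "identical contents" objection can be sketched as follows, again assuming a hypothetical SHA-256 content id. The `nonce` attribute is invented purely for illustration and is not part of any XMPP specification; it shows that making the hash unique requires adding a random value, at which point that value already is a unique id:

```python
import hashlib
import uuid


def content_id(stanza_xml: str) -> str:
    # Hypothetical scheme: the id is the hash of the whole stanza text.
    return hashlib.sha256(stanza_xml.encode("utf-8")).hexdigest()


def with_nonce(body: str) -> str:
    # Made-up 'nonce' attribute, only to make otherwise equal stanzas differ.
    nonce = uuid.uuid4().hex
    return f"<message nonce='{nonce}'><body>{body}</body></message>"


# The same sentence sent on two different days.
monday = "<message><body>that's a good idea</body></message>"
friday = "<message><body>that's a good idea</body></message>"

# Two distinct messages collapse into one id.
collides = content_id(monday) == content_id(friday)

# With a random nonce the ids differ, but the nonce alone would have sufficed.
distinct = content_id(with_nonce("that's a good idea")) != content_id(
    with_nonce("that's a good idea")
)
print(collides, distinct)
```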
moparisthebest 13:52:24
Holger, no I've scrapped hashing :P
Holger 13:52:30
Ah.
moparisthebest 13:53:02
jonasw, surely those are only good while there is something to reference?
jonasw 13:53:33
moparisthebest, another server might have a MAM for longer than you do
Maranda 13:53:33
moparisthebest, good boy that's for the best 🤗
Maranda 13:54:08
(scrapping hashes)
moparisthebest 13:54:46
Maranda, to be fair that was my first question (if you want to uniquely identify a message why wouldn't hashing work?)
moparisthebest 13:55:08
to which the answer was, we need to uniquely identify a message as well as its position in the stream
Kev 13:55:38
There were many answers, and I think that's one of the few things that wasn't an answer.
moparisthebest 13:55:39
jonasw, in which case it has a map?
Maranda 13:55:47
I guess the "why" was already amply answered by multiple sources
moparisthebest 13:56:11
the answers I saw about hashing were 'well you have to decide what to hash' which is different
Maranda 13:57:33
Besides, attempting to reinvent "a DKIM version" for stanzas/xmpp is a horrifying thought 😘
jonasw 13:58:01
Maranda, but SPIM!!kk
moparisthebest 13:58:30
dkim is an entirely different solution for an entirely different problem
moparisthebest 13:58:38
that xmpp already has solved from day 1
moparisthebest 13:58:55
(that problem is, is server X allowed to send messages for domain Y)
Ge0rG 13:59:01
Can't we just store the message content on the blockchain and only exchange message IDs?
moparisthebest 14:00:51
so what's the problem with a simple solution like, client sets id, if any server changes it it keeps a map, the end?
moparisthebest 14:01:26
servers don't *have* to change it, but if they do, they keep a map
Maranda 14:07:48
moparisthebest, sorry to contradict but that's not what DKIM ultimately *does*, not that it's important though.
moparisthebest 14:08:17
that's the goal isn't it Maranda ?
Maranda 14:09:16
Nope, DKIM is more about message authentication, and counterfeiting and tampering prevention.
moparisthebest 14:10:02
I think it has that side-effect because of the hashing, but the goal was server-that-sent-this-was-authorized-by-domain
Maranda 14:10:35
You're confusing it with SPF, methinks... And both are failing, that's why they had to invent a third, DMARC, that somehow fails too.
moparisthebest 14:10:52
and there are 2 methods, SPF doesn't hash but is only useable at first hop, and DKIM hashes and is therefore useable through multiple hops (as long as no servers change the hashed part :))
Holger 14:11:10
> DKIM provides a method for validating a domain name identity that is associated with a message through cryptographic authentication. http://dkim.org/
moparisthebest 14:11:13
DMARC is not a 3rd, it's just enforcing/reporting on SPF and DKIM ?
moparisthebest 14:12:05
still unsure why we are discussing this, XMPP already guarantees the sending server is authorized by the domain
Holger 14:13:19
Someone mentioned DKIM out of the blue, someone else responded :-)
Holger 14:13:24
I'd like to get back to FTP again!
Maranda 14:13:30
DMARC, nope, not just reporting, and I'd avoid reading just introductions. Saves headaches later.
moparisthebest 14:15:21
I said it was enforcing and reporting
moparisthebest 14:15:24
and that's all it is?
Maranda 14:16:46
You did? Oh you did I'm blind apologies blame the cold exposure 😆
moparisthebest 14:17:09
that's fully understandable, unfortunately :)
Maranda 14:19:22
That was a grotesque example to explain how inadequate I think hashes are in a stanza context, no big deal anyways. Brb.
Kev 14:20:46
Holger: Servers change IDs per-hop based on a hash of the contents, store the mapping between IDs, entities fetch the mapping over FTP and do the lookups there. Address of the FTP server is stored in a blockchain.
Kev 14:21:11
You're welcome.
moparisthebest 14:22:10
I feel like you're missing an opportunity to use GOPHER in there someplace
Holger 14:22:11
Kev: Perfect!
Dave Cridland 14:23:33
Can we guarantee forward secrecy of ids?
Kev 14:23:48
Yes, but not perfectly.
Dave Cridland 14:25:03
Also, no need to use a hash. We could base64 the entire stanza into the id attribute.
Dave Cridland 14:25:19
No collisions, and no need for hash agility then.
jonasw 14:26:10
Dave Cridland, ELOOP
Holger 14:26:13
We base64 the stanza including the original id attribute and then replace it with the result?
Dave Cridland 14:26:48
Without the id attribute, obviously. It'll solve the c14n problem.
Kev 14:26:54
Holger: No, you base64 it including the *new* id.
Holger 14:27:07
Ah. Now it makes sense to me.
Kev 14:27:21
Else the stanza's changing and you'd need a new id.
Dave Cridland 14:27:27
Kev, I don't know why you're being silly. My suggestion was practical, and just as sensible as all the others.
jonasw 14:27:38
Dave Cridland, it’s possible to do
Kev 14:27:44
One of those two statements is true.
jonasw 14:27:48
since there’s no length field, you can put the resul… nevermind
jonasw 14:27:51
oh my god
jonasw 14:27:53
I really should get to sleep
MattJ 14:27:54
I just switched to this tab, I can't tell what's a joke and what's not any more
jonasw 14:28:02
MattJ, assume everything is
Kev 14:28:07
MattJ: That's the joke.
Tobias 14:28:25
and don't forget, today is opposite day :)
Dave Cridland 14:28:45
Tobias, OR IS IT!!?!??!!
SaltyBones 15:14:00
So, to bring it back a bit. It seems this view is currently popular: The server needs to pick its own IDs, the client needs to use these IDs for MAM queries, and we will add reflection so that, using reflection and carbons, a client gets copies of all its messages to have those IDs.
moparisthebest 15:14:56
why is it more important for the server to pick its own IDs than the client?
Kev 15:15:52
Concrete proposal: Set origin-ID uniquely on the client, add another id any time something archives it and wants it available, reflect that id back to the first client on first hop. Set the stanza id to the same as the origin-id, but generally ignore it and use the origin-id where available in things like LMC.
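A rough sketch of how this proposal could look on the wire, assuming the XEP-0359 elements (`origin-id` and `stanza-id` in the `urn:xmpp:sid:0` namespace). The function names, the use of UUIDs and the example JIDs are assumptions for illustration, not something stated in the discussion:

```python
import uuid
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


def client_build(to: str, body: str) -> ET.Element:
    # The client picks one unique value and uses it both as the
    # stanza's id attribute and as the origin-id.
    oid = str(uuid.uuid4())
    msg = ET.Element("message", {"to": to, "type": "chat", "id": oid})
    ET.SubElement(msg, "body").text = body
    ET.SubElement(msg, f"{{{SID_NS}}}origin-id", {"id": oid})
    return msg


def archive_stamp(msg: ET.Element, archive_jid: str) -> str:
    # Anything that archives the message adds its own id next to the
    # client's, instead of replacing it.
    sid = str(uuid.uuid4())
    ET.SubElement(msg, f"{{{SID_NS}}}stanza-id", {"id": sid, "by": archive_jid})
    return sid


msg = client_build("b@b.example", "hello")
archive_id = archive_stamp(msg, "a@a.example")
print(ET.tostring(msg, encoding="unicode"))
```

In this sketch the reflected copy sent back to the first client would carry the `stanza-id`, so the client learns the archive id without any id being rewritten.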
SaltyBones 15:15:59
I don't think it is an issue of what is more important it's just that the server writers want to do that. I am not one so I cannot tell you why. :)
moparisthebest 15:16:46
so I propose they can do whatever they want, they just need to keep a map?
Ge0rG 15:17:16
moparisthebest: keeping a map is a hard task.
Ge0rG 15:17:37
moparisthebest: also actually replacing all references according to the map, because you don't know what's a reference and what's just random data.
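The difficulty can be illustrated with a small sketch of a hop that rewrites ids and keeps a map. The class and method names are hypothetical. It assumes the hop only knows about XEP-0184 receipts (`<received/>` in `urn:xmpp:receipts`), so a reference made through a protocol it has never heard of, here XEP-0308 (`<replace/>` in `urn:xmpp:message-correct:0`), passes through unmapped:

```python
class RewritingHop:
    # Reference-carrying elements this hop happens to know about.
    KNOWN_REFERENCES = {("urn:xmpp:receipts", "received")}

    def __init__(self):
        self.outer_to_inner = {}  # rewritten id -> id the client chose

    def rewrite_outgoing(self, client_id: str, new_id: str) -> str:
        # The map has to live as long as anyone may still refer to new_id.
        self.outer_to_inner[new_id] = client_id
        return new_id

    def map_incoming_reference(self, ns: str, name: str, ref_id: str) -> str:
        # Only references in known protocols can be mapped back.
        if (ns, name) in self.KNOWN_REFERENCES:
            return self.outer_to_inner.get(ref_id, ref_id)
        return ref_id


hop = RewritingHop()
hop.rewrite_outgoing("AAAA", "BBBB")

# A receipt for BBBB is translated back to the client's AAAA.
receipt_ref = hop.map_incoming_reference("urn:xmpp:receipts", "received", "BBBB")

# A correction referencing BBBB is not, because this hop does not know
# that <replace/> carries a message id.
correction_ref = hop.map_incoming_reference(
    "urn:xmpp:message-correct:0", "replace", "BBBB"
)
print(receipt_ref, correction_ref)
```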
moparisthebest 15:17:38
I'm pretty sure it's one of the simplest tasks in computer science
SaltyBones 15:17:52
Ge0rG, true but now we need the client to keep the map or to change the ID of old messages or something...it also seems hard :)
moparisthebest 15:17:59
it's not naming, cache invalidation, or off-by-one errors :P
Kev 15:18:19
No, it's two of those things in one :)
Ge0rG 15:18:21
moparisthebest: actually it is a part of cache invalidation.
Zash 15:18:23
Off by one naming of invalid caches
SaltyBones 15:19:00
But the point stands, doesn't the work still have to be done but now it must be done by the client?
Kev 15:19:25
No.
SaltyBones 15:19:58
It needs the origin-ID for references and the stanza-ID for MAM queries....
Kev 15:20:23
It doesn't need any mapping.
moparisthebest 15:20:30
I thought the point was to hose all those and go to 1 id ?
Ge0rG 15:20:32
and the message ID in case the origin-ID gets stripped
Kev 15:20:58
Ge0rG: I suggest (and it's a novel suggestion, because I know no-one's suggested it before) setting the id to the same as the origin id.
Ge0rG 15:21:28
Kev: you'll never get the author convinced to do that
SaltyBones 15:23:35
Kev, I'm not sure I agree that keeping two IDs for messages is very different from a mapping... :)
Ge0rG 15:24:29
SaltyBones: the client needs to keep an index on the origin-ID, but probably not on the MAM ID
Ge0rG 15:24:53
I need to look up messages by origin-ID for LMC and ACKs, but not when fetching an archive
SaltyBones 15:25:16
Don't you have to merge the local and MAM archive somehow?
moparisthebest 15:25:16
in fact I think keeping 2 id's and an index on one is the very definition of a map
jonasw 15:25:53
Dave Cridland, I sent you an email with https://github.com/xsf/xeps/pull/593 for the council agenda :/
Ge0rG 15:25:58
moparisthebest: the difference is that on the client, you store that as part of the message DB, and you can clean it up together with the messages
Kev 15:27:36
SaltyBones: These are logically two different things. You need (temporarily) a way of correlating messages and their replies, which is origin-id. For MAM sync you only need a single ID, which is the latest thing to go in the archive that you've seen (and, depending on your model, even that is optional).
Kev 15:27:52
I don't see a situation in which you ever need to map between the two.
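One way to picture this on the client side, as a hypothetical sketch (the class, its fields and the query shape are invented for illustration): messages are indexed by origin-id for receipts and corrections, while MAM sync only tracks a single cursor, so nothing ever translates between the two kinds of id.

```python
class MessageStore:
    def __init__(self):
        self.by_origin_id = {}       # origin-id -> message record
        self.last_archive_id = None  # newest archive id seen so far

    def add_sent(self, origin_id: str, body: str) -> None:
        self.by_origin_id[origin_id] = {"body": body, "acked": False}

    def on_receipt(self, origin_id: str) -> None:
        # Receipts reference the id the sending client chose.
        record = self.by_origin_id.get(origin_id)
        if record is not None:
            record["acked"] = True

    def on_correction(self, origin_id: str, new_body: str) -> None:
        # Corrections reference that same client-chosen id.
        record = self.by_origin_id.get(origin_id)
        if record is not None:
            record["body"] = new_body

    def on_archived(self, archive_id: str) -> None:
        # Only the latest archive id matters for catching up.
        self.last_archive_id = archive_id

    def next_mam_query(self) -> dict:
        # "after" mirrors the result-set cursor used when paging an archive.
        return {"after": self.last_archive_id}


store = MessageStore()
store.add_sent("o-1", "helo")
store.on_archived("mam-17")
store.on_receipt("o-1")
store.on_correction("o-1", "hello")
print(store.by_origin_id["o-1"], store.next_mam_query())
```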
moparisthebest 15:28:13
so what if the client sets an id, and if the server wants to change it, it just sends that info back to the client and doesn't keep a map?
moparisthebest 15:28:26
what's the downside of a single id ?
Ge0rG 15:28:39
moparisthebest: it won't work.
moparisthebest 15:29:07
why not?
Ge0rG 15:29:31
I'm sure this has been outlined in here multiple times today
moparisthebest 15:29:37
client sends id=A, server sends back A is now B, client changes id=A to id=B, done?
Ge0rG 15:30:38
moparisthebest: what if the client disconnects after sending A?
moparisthebest 15:30:59
smacks/mam handles all that
Ge0rG 15:31:28
moparisthebest: except when you don't know that A is B now and the smacks session expires
moparisthebest 15:32:00
B still gets it with mam though
moparisthebest 15:32:09
and C who knows nothing about id=A just ignores it
SaltyBones 15:32:13
Kev, it seemed to be annoying for client devs to handle the different kinds of IDs and merge them and use the appropriate one everywhere but maybe I just misunderstood...
moparisthebest 15:32:26
it seems to be annoying for both client and server devs
Kev 15:32:57
SaltyBones: I don't think it's a significant problem, just that sometimes people aren't sure which id to use.
Kev 15:33:17
(Because everything that talks about the id was written back when there was only the stanza id, and so didn't need to be explicit which one)
Ge0rG 15:33:22
e) none of the above.
moparisthebest 15:33:22
which turns out to be the significant problem
SaltyBones 15:33:24
Yeah, maybe. From this far away I really can't tell how I would handle syncing local history with mam :)
MattJ 15:34:06
There is only one id that MAM is concerned about, ignore everything else
SaltyBones 15:36:34
Sure, so clients will store everything with origin-ID first, then add the stanza-ID when they receive the reflection and then just merge in other messages from MAM based on that, right?
moparisthebest 15:39:58
I mean when everything gets this wrong you can pretend it's not a problem, just like everyone did for years with XHTML-IM, that doesn't make it not-a-problem though
MattJ 15:41:05
SaltyBones, personally I don't really understand why origin-id exists
moparisthebest 15:41:11
and to be clear I don't mean in the future things might get this wrong, I mean essentially any newly written thing gets this wrong now
Ge0rG 15:41:32
MattJ: because of MUC reflections.
SaltyBones 15:41:37
MattJ, I think most people agree that it is superfluous and should be the same as message-ID.
MattJ 15:41:52
Ge0rG, and I think servers breaking MUC reflections are broken
Ge0rG 15:41:53
and because back in 2014, somebody refused to mandate that MUCs shall retain the ID on reflection.
MattJ 15:42:08
so lets do that now, because 99% of servers get this right
Ge0rG 15:42:18
MattJ: OK. I'll revive the thread.
MattJ 15:42:25
and stop making this id discussion so much more confusing
MattJ 15:43:14
As I recall, the most vocal opponent to mandating id preservation has since changed their mind
Ge0rG 15:43:28
Okay, the other reason was that on origin-id you can assume client-enforced uniqueness, which you can't on @id
SaltyBones 15:43:54
what?
MattJ 15:44:03
So then we're back to stanza-id for tracking between you and your server, and @id for end-to-end tracking
SaltyBones 15:44:06
why would a client be able to generate unique origin-ID but not unique message-ID?
Ge0rG 15:44:24
SaltyBones: a client isn't *guaranteed* to generate a unique @id
MattJ 15:44:37
Who cares?
Ge0rG 15:44:43
I don't know.
Ge0rG 15:44:54
maybe people doing references.
MattJ 15:45:06
The only entity concerned with the uniqueness of @id are clients that generate them
Zash 15:45:23
Reference them*
Ge0rG 15:45:25
MattJ: clients that process incoming @id's too, for references and ACKs
MattJ 15:45:28
so if you're doing receipts/LMC/etc. then just be sure you generate sensible ids
SaltyBones 15:45:31
But if it *can* generate a unique origin-ID it can generate a unique message-ID, can't it? :)
Ge0rG 15:45:45
SaltyBones: yes. But with @id you don't know from the outside, with origin-id you do
SaltyBones 15:46:04
Ge0rG, you mean it's not mandated in the spec?
Ge0rG 15:46:18
SaltyBones: no. @id is optional
MattJ 15:46:26
SaltyBones, that's the whole root of this discussion
Ge0rG 15:46:37
SaltyBones: you could send all your messages with id="badgerbadger"
SaltyBones 15:46:55
MattJ, maybe, I haven't been around long enough to have seen the root of this discussion. ;)
Ge0rG 15:47:24
SaltyBones: https://mail.jabber.org/pipermail/standards/2014-July/028988.html
MattJ 15:47:49
@id is optional and controlled entirely by the sending client, nobody except that client can use it for anything. If you generate an error reply, you reflect the id attribute, that's all
SaltyBones 15:47:49
Ge0rG, ...I can't even imagine that "it seemed like a good idea at the time"
Ge0rG 15:48:19
SaltyBones: MUC is full of such ideas.
MattJ 15:48:23
Second problem is that some MUC server(s?) did not preserve the id attribute in MUC rooms, which broke some clients when they received their own messages back with a different id attribute
flow 15:48:42
> Ge0rG> SaltyBones: you could send all your messages with id="badgerbadger"
flow 15:48:47
I don't think this is true
flow 15:48:59
at least for messages part of the same session
Ge0rG 15:50:00
flow: > It is up to the originating entity whether the value of the 'id' attribute is unique only within its current stream or unique globally.
Ge0rG 15:50:22
flow: now I could join this MUC with a multi-session nick, reset my session after each MUC message and you'd end up full of badgers.
Ge0rG 15:50:40
or do the same to you via type=chat.
flow 15:51:01
Right, but not within the same session/stream. I just wanted to clarify that
Ge0rG 15:51:02
@id is an end-to-end property, so binding it to the session lifetime doesn't make any sense.
Ge0rG 15:51:16
flow: technically you are correct.
flow 15:51:35
just a matter of the definition of "end"
Ge0rG 15:51:46
Winter's coming!
Zash 15:52:41
Nah, now it ends
SaltyBones 15:53:11
So, to sum up: Add reflection of messages to make it easier to figure out how to query MAM otherwise everything has to be the way it is...?
SaltyBones 15:54:52
Maybe document better what every kind of ID is used for to make it easier for devs?
MattJ 15:56:14
As far as I'm concerned there is one id set by the client and one id set by the server
flow 15:57:45
<origin-id/> was invented mostly as a workaround for the MUC reflection issue
SaltyBones 15:57:58
MattJ, ...and the client set one is origin-ID the server set one is stanza-ID.
MattJ 15:58:16
I strongly feel that origin-id should go
MattJ 15:58:27
and that any broken MUC services should be fixed
Ge0rG 15:58:53
flow: except it doesn't actually solve it, as seen with biboumi.
SaltyBones 15:59:01
Okay, so move origin-ID to message-ID and forbid rewriting.
flow 15:59:10
Ge0rG, m0ar pls
Ge0rG 15:59:34
I'll bring it up on council.
MattJ 15:59:43
flow, summary is that transports can't always preserve it
flow 15:59:46
SaltyBones, why move? just use rfc6120 IDs like before
SaltyBones 16:00:05
I mean conceptually, move the responsibility...
MattJ 16:00:07
which puts them in the "broken MUC server" category as far as I'm concerned
Kev 16:00:52
FWIW, transports don't have to be written in such a way as they lose them.
SaltyBones 16:01:27
Kev, what? parse_error
Kev 16:11:19
The ids
Maranda 16:26:12
Biboumi... 🤔
SaltyBones 19:56:32
Kev, ah, you're saying that transports can always be written in such a way that they do not lose IDs?
moparisthebest 19:56:57
not if it's impossible to keep a map of IDs >:)
Kev 20:01:05
SaltyBones: At the cost of complexity.
SaltyBones 20:01:39
Sure, sure
SaltyBones 20:02:47
Yeah forbidding ID mangling seems like it would be a very same thing to do
SaltyBones 20:03:14
Should stanza-IDs be transmitted btw? Or is that between client and server only?
SaltyBones 20:03:34
s/same/sane
MattJ 20:43:37
SaltyBones, the latter. It's stamped on incoming stanzas (to the user) by the server, to let the client know what id the server has assigned it (e.g. in the MAM archive)
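A small sketch of how a client might read that stamp, assuming the XEP-0359 `stanza-id` element in the `urn:xmpp:sid:0` namespace. The function name and the example JIDs are made up, and matching the `by` attribute against the user's own bare JID is an assumption about how the client decides which stamp belongs to its own archive:

```python
import xml.etree.ElementTree as ET

STANZA_ID = "{urn:xmpp:sid:0}stanza-id"


def own_archive_id(stanza_xml: str, own_bare_jid: str):
    # A stanza may carry several stamps (e.g. one from a MUC and one from
    # the user's own server); only the one stamped for our own archive
    # is useful for MAM queries against it.
    stanza = ET.fromstring(stanza_xml)
    for stamp in stanza.findall(STANZA_ID):
        if stamp.get("by") == own_bare_jid:
            return stamp.get("id")
    return None


incoming = (
    "<message xmlns='jabber:client' from='b@b.example/phone' to='a@a.example'>"
    "<body>hi</body>"
    "<stanza-id xmlns='urn:xmpp:sid:0' by='room@muc.example' id='r1'/>"
    "<stanza-id xmlns='urn:xmpp:sid:0' by='a@a.example' id='m1'/>"
    "</message>"
)
print(own_archive_id(incoming, "a@a.example"))
```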
MattJ 20:44:00
and in some version of the future, it will also be stamped on reflected outgoing messages
SaltyBones 21:09:24
And are stanza-IDs used anywhere else? They seem to be presented as this kind of general concept that can be used to add stable IDs for some sort of domain...
Kev 21:11:17
They started off as just a part of MAM. They were extracted out (probably wrongly).
Kev 21:11:30
Well, no, not quite wrongly.
Kev 21:11:50
Because they're used beyond MAM, they're also going to be used for Unread sync, and injected into Carbons and stuff.
MattJ 21:13:11
SaltyBones, originally MAM used to stamp <archived by="archive-jid" id="unique-id-for-mam-queries" />
MattJ 21:13:47
But the general feeling was that this unique id shouldn't be MAM-specific, but reusable
SaltyBones 21:14:50
Yeah, I'm just wondering if it has actually ever been used for anything and if that anything is also between client and server or not.
Kev 21:15:22
Yes, only between client and server (or client and client)
Kev 21:15:31
Unread sync is the obvious one where it's needed.