-
lovetox
jcbrand, wanted to ping you about https://github.com/xsf/xeps/pull/1271. I added some notes after implementing it; I'd be interested in a discussion and in bringing this XEP to the finish line.
-
lovetox
Also, maybe the community can chime in on https://github.com/xsf/xeps/pull/1270, where the change would result in using origin-id and stanza-id in a XEP where neither is necessary at all.
-
flow
I fear the only way for me to chime in is to state that I really believe we should always state the id *and* the id-assigning entity when referencing other stanzas. But judging from past experience, that is (sadly) probably controversial
-
Zash
Namespaced IDs? I like it.
-
flow
we should really try to figure out why we have such a hard time solving the "stanza reference" issue
-
Zash
Some sort of (jid, id) tuple would be good yes
-
flow
Zash, not namespaced, qualified (by an entity)
-
Zash
flow, those concepts are very close in my brain :)
-
flow
Zash, how about: <referenced-stanza xmlns='urn:xmpp:sid:0' id='xep359-stanza-id' by='muc.example.org'/>
-
flow
Zash, granted, they are similar, and I couldn't define and state the exact difference right now
-
MSavoritias fae.ve
I agree completely about the reference. For messages too, that would be nice.
-
Zash
awkward, given that many of the existing uses are attributes: fitting a whole tag in there would be weird, but changing the attribute would be awkward too
-
Zash
Doesn't <stanza-id> have a @by already tho?
-
flow
Zash, it does have a by attribute
-
flow
basically <referenced-stanza> is mimicking <stanza-id>
-
Zash
ah, right, sensible
-
flow
but I wondered if we should design it for multi-references
-
flow
<referenced-stanzas xmlns=…><stanza-id …/><stanza-id …/></referenced-stanzas>
-
flow
that would allow you to re-use your stanza-id parser
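A minimal sketch of the parser-reuse argument, assuming the wrapper element <referenced-stanzas/> (hypothetical, as proposed here) lives in the XEP-0359 namespace urn:xmpp:sid:0 so that every child is an ordinary <stanza-id id='…' by='…'/>. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace

def parse_stanza_id(elem):
    """Turn one <stanza-id/> element into an (id, by) tuple."""
    return elem.attrib["id"], elem.attrib["by"]

def parse_references(xml_text):
    """Parse a hypothetical <referenced-stanzas/> wrapper.

    Each child is a plain <stanza-id/>, so the existing parser is reused as-is.
    """
    root = ET.fromstring(xml_text.strip())
    if root.tag != f"{{{SID_NS}}}referenced-stanzas":
        raise ValueError("not a <referenced-stanzas/> element")
    return [parse_stanza_id(child) for child in root.findall(f"{{{SID_NS}}}stanza-id")]

example = """
<referenced-stanzas xmlns='urn:xmpp:sid:0'>
  <stanza-id id='xep359-stanza-id' by='muc.example.org'/>
  <stanza-id id='another-id' by='room@muc.example.org'/>
</referenced-stanzas>
"""
print(parse_references(example))
```

Because the children inherit the wrapper's default namespace, no new element type is needed for a multi-reference.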
-
Zash
the wrapper could plausibly live in each XEP that uses such a reference then?
-
flow
Zash, and the advantage of doing so would be?
-
lovetox
omg, solving problems nobody has
-
flow
Zash, fwiw, the <referenced-stanzas/> xmlns would be the XEP-0359 one
-
lovetox
actually ignoring problems that are pressing and lead to hard implementation and less adoption
-
flow
lovetox, let's say I agreed: what would be wrong with the approach, for the sake of consistency?
-
Zash
problem statement?
-
singpolyma
The lack of a by on replies, reactions, etc is basically the reason we have to have these rules and use <stanza-id> in muc etc
-
singpolyma
But otoh people seem fine with <stanza-id> so whatever. It's not a big deal I expect
-
lovetox
nothing, flow, absolutely nothing, only that it solves a problem nobody has
-
lovetox
or do you really think when developers implement this XEP they care from which tag they parse the id attribute?
-
flow
Zash, I assume that lovetox assumes that there is only one message ID in the MUC case, which cannot be spoofed
-
singpolyma
flow: no, we always use <stanza-id> in muc specifically because of this lack of by attribute / spoofing problem for message@id
-
larma
flow: the reason why we have hard times to do the referencing thing is that the referencing thing alone doesn't buy us anything.
-
flow
singpolyma, ok, and when referencing a message in MUC you want to simply use the stanza-id@id but without the 'by' attribute?
-
larma
And then the referencing thing adds complexity to cases where it might not be needed, or lacks features in other cases
-
singpolyma
flow: that is what is already done
-
lovetox
the pressing problem for me is currently, that we have an influx of new XEPs that reference something, and people have no guideline which ID they should actually reference and *why*.
-
flow
singpolyma, ok, and I see how it works in MUC cases (if you do what I assume you do). But I believe that we should simply always include the 'by' attribute, for consistency and symmetry
-
singpolyma
Copy what an existing xep does?
-
larma
lovetox: we actually have a pretty good rule for that, I don't see the problem
-
lovetox
who do you mean by "we"?
-
lovetox
and where is this rule written down
-
larma
All the XEPs have it in a footnote
-
lovetox
care to elaborate?
-
flow
which does not seem to be the best place for a rule that appears highly relevant?
-
singpolyma
flow: if we used a by attribute we could get rid of <stanza-id> completely and just use message@id and get rid of the dependency on mam. But I think only I want that anyway so it's probably not worth it
-
flow
singpolyma, if you get rid of mam, then what's your id provider?
-
lovetox
flow, for many use cases you don't need an id provider; I guess you mean a server
-
flow
by id provider, I mean the id assigning entity
-
flow
is it the MUC service or the sending client?
-
flow
or something else?
-
larma
singpolyma: we do have clients out there that have shitty message@id generation, so no, we can't rely on it for features that need to work on messages generated by those clients
-
singpolyma
flow: sending client. As with correction xep
-
singpolyma
And as with 1:1 for other xeps
-
flow
singpolyma, then which entity ensures that the IDs are not spoofable?
-
larma
It's fine to say "you can't reply to messages of those shitty clients", but it's definitely not OK if we can't moderate them
-
singpolyma
larma: we already do everywhere but MUC.
-
larma
singpolyma: we don't moderate outside MUC
-
lovetox
flow, not sure which spoofing you are talking about; in many cases it's only you, your server and a contact
-
singpolyma
Oh sure, I agree moderation is different maybe because you're communicating with the MUC about its own archive. It's basically mam dependent even in concept
-
flow
lovetox, MUC case: Alice sends message A1, Bob sends +1 reaction referencing A1
-
flow
lovetox, MUC case: Alice sends message A1, Mallory sends message A1, Bob sends +1 reaction referencing A1, but now it is unclear which message is +1'ed
-
singpolyma
flow: that's what adding a by attribute to the reaction solves
-
flow
singpolyma, and in the anonymous MUC case?
-
larma
singpolyma: let's say we have a XEP for requesting a message with all its edits at once. How would the server know which edit to attach to if we relied on message@id and the by attribute, and the same message@id appears twice?
-
lovetox
flow, occupant-id
-
flow
singpolyma, but you are right, the by attribute helps here when referencing messages
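To make flow's Alice/Mallory scenario concrete, here is a toy index contrasting a bare-id reference with a (by, id) reference. It assumes `by` is some sender identifier that the sender cannot forge (for example an occupant-id in anonymous rooms, as lovetox suggests); the data and names are hypothetical.

```python
from collections import defaultdict

# (sender, message_id, body); sender is assumed to be a stable, unforgeable
# identifier such as an occupant-id.
messages = [
    ("alice-occupant-id", "A1", "Let's meet at 10"),
    ("mallory-occupant-id", "A1", "Let's meet at 3"),
]

by_bare_id = defaultdict(list)   # id -> every message that used it
by_qualified_id = {}             # (by, id) -> the single matching message
for sender, msg_id, body in messages:
    by_bare_id[msg_id].append(body)
    by_qualified_id[(sender, msg_id)] = body

def resolve_bare(msg_id):
    """Reference without 'by': may match several messages."""
    return by_bare_id.get(msg_id, [])

def resolve_qualified(by, msg_id):
    """Reference with 'by': matches at most one message (or None)."""
    return by_qualified_id.get((by, msg_id))

print(resolve_bare("A1"))
print(resolve_qualified("alice-occupant-id", "A1"))
```

This only helps while `by` itself cannot be spoofed; larma's nick-reuse objection is exactly the case where a nick-based `by` fails.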
-
larma
Mallory could just use Alice's nick while she is offline and thus have the same by
-
lovetox
seems you're searching for one thing that rules them all
-
lovetox
but you simply need to categorize use cases
-
singpolyma
larma: again, that sounds like specifically a mam extension so probably it would use mam IDs
-
lovetox
user retract use case: there is no need for stanza IDs, it's unspoofable
-
larma
singpolyma: wait, edits are a MAM extension now?
-
lovetox
message correction use case, unspoofable
-
lovetox
reactions use case, unspoofable
-
singpolyma
larma: fetching a list of things you said
-
larma
singpolyma: sure, but that needs to be possible with the information the messages contain, we don't want to have a new XEP that extends the edit xep to be compatible with the new mam feature
-
flow
larma, yes, Mallory could also send while Alice is offline, but isn't that an argument to use MUC room MAM IDs when referencing messages? By using the MUC service as central arbiter for message IDs, all of those problems seem to go away. And modern MUC rooms want to provide MAM anyway, so you get the MAM IDs for free
-
Zash
Heh, you could even do <stanza-id> and pretend to have zero retention...
-
flow
right
-
larma
flow: I'm all in favor of using the MUC stanza-id ID in MUC, and this is the status quo as far as I can tell
-
larma
Or at least should be
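A minimal sketch of "use the MUC's stanza-id in MUC": only a <stanza-id/> whose `by` equals the room's bare JID is taken as the reference, and any other one (for example one a sender injected) is ignored. The function name is hypothetical, and a real client would likely also want to confirm that the room advertises urn:xmpp:sid:0 before trusting the value.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace

def muc_reference_id(message_xml, room_jid):
    """Return the room-assigned stanza-id of a groupchat message, or None."""
    msg = ET.fromstring(message_xml)
    for sid in msg.findall(f"{{{SID_NS}}}stanza-id"):
        # Trust only the id assigned by the room itself.
        if sid.get("by") == room_jid:
            return sid.get("id")
    return None

example = (
    "<message xmlns='jabber:client' type='groupchat' "
    "from='room@muc.example.org/alice' id='A1'>"
    "<body>hi</body>"
    "<stanza-id xmlns='urn:xmpp:sid:0' id='fake' by='alice@example.org'/>"
    "<stanza-id xmlns='urn:xmpp:sid:0' id='room-42' by='room@muc.example.org'/>"
    "</message>"
)
print(muc_reference_id(example, "room@muc.example.org"))
```

Since the room JID is implied by the conversation, the `by` never needs to travel in the reference itself, which is larma's point about MUCs.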
-
flow
ok, because it appeared to me that we were talking about a MAM-less MUC future for a while. but I probably just misunderstood
-
larma
What we don't need to do is to state the by attribute because that's always given in MUCs (because only the MUC's stanza-id is ever referenced)
-
lovetox
the by attribute is always unnecessary if we are talking about XEPs where users send stuff to other users
-
lovetox
servers tell us that anyway
-
larma
For direct messages, by attribute could be relevant if we consider that a reference might point to messages of any of the two (not true for edits, but for example for reactions)
-
lovetox
don't see how it is necessary in that case?
-
lovetox
you reference an id, I attach the reaction there
-
larma
What if you and me use the same id?
-
lovetox
then i attach it to both
-
lovetox
why would I care, it's a reaction from another user; if he wants to sabotage himself
-
lovetox
so be it
-
lovetox
it's like the question: what if the client uses 1 as the id for everything and then makes a correction for message id 1
-
lovetox
yeah... then I correct 100 messages; not really a problem of the receiving client
-
larma
I don't want to react with a thumbs up to your message "people are stupid", I want to thumbs-up my previous message "don't be stupid"
-
lovetox
the user will be annoyed to talk to another user who has a broken client, and will tell him
-
lovetox
larma, I see your point if I think about an environment where all kinds of clients use non-stable ids
-
lovetox
you are right, in that case namespacing it with by is necessary
-
lovetox
but we are talking about 1:1 case
-
larma
I agree from that standpoint that intentionally misbehaving clients are not that much of an issue because it only breaks the conversation of those two users and that conversation is already broken if one side breaks rules in any other way, so that's something acceptable to me
-
lovetox
it's evident that the client is broken; users will sort it out themselves
-
lovetox
in the MUC case, stanza-id is used, and it's definitely unique, so nothing of that sort can happen
-
larma
lovetox: what is non-stable id?
-
lovetox
ah sorry, meant non-unique
-
larma
Ah, yeah, that's why Dino, for example, won't allow you to reply to 1:1 messages without origin-id, because origin-id explicitly requires uniqueness and suggests full randomness
-
lovetox
yeah, and I feel that's unnecessary
-
lovetox
but i respect your motivation to support older clients
-
lovetox
I'd actually like to have stats on which clients in the ecosystem really do this
-
lovetox
the main problem for me is currently origin-id/message-id
-
lovetox
it actually stops me from implementing more stuff
-
lovetox
because it's uncertain whether some people will change it in the future, and I'd have to deal with this later
-
lovetox
and larma, i bet you allow message corrections without origin-id
-
lovetox
so quite inconsistent for me
-
lovetox
message correction is a rather old XEP
-
lovetox
and it only really works if the client has unique ids
-
larma
Why? The ruleset is pretty easy: if origin-id is present on 1:1 messages, use it for references. If not, use message@id for references. Then maybe decide to not allow message@id values in references that seem not unique/random
-
lovetox
it's not about the rule
-
lovetox
its about if people decide to change the rule in 2 years
-
lovetox
I want a stable environment where I can depend on design choices by XEP authors / the board being thought through and not changed suddenly
-
larma
What possible change could break that rule?
-
lovetox
as I said, there are other people who don't share your opinion that origin-id is something we need
-
lovetox
and it can go away
-
larma
I totally agree we can get rid of origin-id eventually (except for MUC reflection usecase)
-
lovetox
so you can think of something that breaks that rule
-
lovetox
:D
-
larma
I also don't want it, but shitty clients are out there and we have to handle them somehow
-
lovetox
look, I don't care which camp wins
-
larma
lovetox: no, if we drop origin-id you use message@id as per the rule
-
lovetox
that's a bad rule, sorry; there is no XEP in stable that mandates origin-id
-
lovetox
is it too much to ask to force a decision now
-
larma
The only things origin-id is effectively being used for are:
- letting others know your message@id is unique
- being able to identify if a message@id was modified
-
flow
I hope we see ourselves not as a community divided into camps that need to win, but as a community striving for consensus
-
lovetox
instead of doing nothing and let authors further publish XEPs until we are in a position where it needs to be supported for all eternity
-
lovetox
you know, for my implementation it's not trivial to simply switch to another id
-
larma
You know that origin-id and message@id are the same on literally every implementation that has origin-id at all?
-
lovetox
yes, that's why I'm thinking about just ignoring it and simply always referencing the message id
-
lovetox
but this is a risky implementation
-
flow
*every implementation you are aware of
-
lovetox
as it's not clear which way things will go with ids
-
larma
Sure, let's change the origin-id XEP to require that
-
larma
So that we can be sure any future implementation also has this feature that origin-id = message@id
-
lovetox
would be a good start
-
flow
wouldn't that make origin-id obsolete?
-
lovetox
flow, some people want to know whether an id is unique
-
larma
flow: we still need it for the two things I mentioned above
-
flow
(which wouldn't be a bad thing, I suppose)
-
larma
> The only things origin-id is effectively being used for are:
> - letting others know your message@id is unique
> - being able to identify if a message@id was modified
-
Zash
What happens if I write a server-side plugin that strips origin-id if == message@id ?
-
lovetox
larma, the second case is also non-existent
-
larma
lovetox: MUC reflection
-
lovetox
if you're thinking about the MUC case there
-
lovetox
and the XEP was amended
-
lovetox
larma, are you aware of an implementation that is not 20 years old that actively does this
-
lovetox
?
-
lovetox
then lets fix it
-
lovetox
every new implementation will be compliant with the XEP
-
larma
What do you mean has been amended?
-
larma
MUC #stable-id is fully optional
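Since #stable-id is optional, a room may rewrite message@id when it reflects a message back, which is the "identify if a message@id was modified" use of origin-id larma lists. A minimal sketch of matching a reflection to a pending outgoing message, assuming the sender put the same value in message@id and origin-id; the `pending` dict and function name are hypothetical.

```python
def match_reflection(pending, reflected_id, reflected_origin_id):
    """Find which pending outgoing message a MUC reflection belongs to.

    pending maps the id we sent -> our local message record. origin-id is
    tried first because a room without stable ids may have rewritten
    message@id; message@id is the fallback.
    """
    for candidate in (reflected_origin_id, reflected_id):
        if candidate is not None and candidate in pending:
            return pending.pop(candidate)
    return None

pending = {"abc": "hello room"}
# The room rewrote message@id to "rewritten-1" but kept the origin-id.
print(match_reflection(pending, "rewritten-1", "abc"))
```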
-
lovetox
of course, because MUC is an old stable XEP
-
larma
Biboumi still does not support it afaik
-
lovetox
because it splits messages, right?
-
larma
Yes
-
lovetox
damn
-
lovetox
ok ignore what i said
-
lovetox
and what does it do then?
-
Zash
Tangent to that, XEP wishlist: advertising certain message limits
-
lovetox
does it invent new ids?
-
lovetox
or uses the same
-
lovetox
i wonder does biboumi need to do this?
-
lovetox
only because it splits the outgoing messages, why does it need to split them for the reflection to xmpp?
-
lovetox
but on the other hand, none of these XEPs work for IRC anyway
-
lovetox
retraction, reaction, moderation, whatever
-
lovetox
and stop, you can simply use the MUC feature as an indication for that
-
lovetox
don't need to have this info on every message
-
lovetox
so it's actually only the first case
-
lovetox
and there I would argue it's not worth it: in MUC we use stanza-id anyway, so origin-id is not necessary
-
lovetox
and in 1:1 case its simply not important, people just migrate away from broken clients
-
lovetox
also, your message reply use case, I don't see it
-
lovetox
you only allow a message reply if the other party uses origin-id
-
lovetox
but if it does not use origin-id, it will also not have support for message replies
-
lovetox
so it will not see it anyway
-
lovetox
nothing happens here
-
lovetox
your case would only apply to clients that don't have unique ids but support the newest XEPs, like message replies
-
Zash
in 1:1 it should be more feasible to rely on disco#info too, but it may still be weird given multiple devices
-
lovetox
no, it's simply not important for 1:1
-
larma
There might be two messages with the same ID in a 1:1 chat and still multiple Clients that understand replies
-
larma
It's not like 1:1 chats only have two clients
-
lovetox
yes there can be cases constructed where something breaks, but its not worth it to add something new for that
-
larma
It's not something new
-
lovetox
larma, we have had message corrections for years
-
lovetox
I've heard no complaints that it's totally broken because there are clients with non-unique ids
-
larma
Message corrections only work on the last message, so the id doesn't really matter anyway
-
lovetox
no, that's an implementation decision
-
lovetox
and we will definitely support this for more than just the last message
-
larma
What most clients do is to just edit the last message with that id if there are duplicates
-
larma
Which is reasonable in the case where the XEP explicitly says it's only to be used for the last message
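The duplicate-id heuristic larma describes for corrections can be sketched as below: only messages from the same sender are considered, and the most recent one with the matching id wins. The history layout and function name are hypothetical; this is not any particular client's code.

```python
def apply_correction(history, sender, replace_id, new_body):
    """Apply a message correction to the latest matching message.

    history is a list of dicts in arrival order. When several messages from
    the same sender share replace_id, only the most recent one is edited.
    Returns True if a message was corrected.
    """
    for msg in reversed(history):
        if msg["from"] == sender and msg["id"] == replace_id:
            msg["body"] = new_body
            return True
    return False

history = [
    {"from": "a", "id": "1", "body": "first"},
    {"from": "a", "id": "1", "body": "second"},   # duplicate id
    {"from": "b", "id": "1", "body": "other user"},
]
apply_correction(history, "a", "1", "second, fixed")
```

This is also why the id barely matters for "last message only" corrections, and why it does matter for replies to older messages.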
-
larma
But I might want to reply to the older of two messages with the same id
-
lovetox
ok, so someone uses an old client and a very new client at the same time
-
larma
Like many users
-
lovetox
and you need to reply to a message from the old client
-
lovetox
its very far fetched
-
larma
That have Pidgin on Desktop and Conversations on mobile
-
lovetox
and for that you mandate that all clients forever add an origin-id tag
-
larma
That's the two most popular clients
-
larma
How is that far fetched?
-
lovetox
cost/benefit is not there for me sorry
-
larma
For the most popular desktop client
-
lovetox
Pidgin uses maybe a non-unique id
-
lovetox
but that you hit the same id multiple times in a single chat with a contact
-
lovetox
is probably still very unlikely
-
lovetox
at what point do you consider the cost of the measure?
-
lovetox
if somebody can construct a single case where something would break, you add a mitigation not looking at the costs?
-
lovetox
even if I agreed that one can construct a case, with a client actually in use, that happens to 1 in 1000 messages, I would still not mandate that every client in existence add an origin-id tag forever
-
larma
No, we surely should come up with rules as to when message@id is fine even if it's not an origin-id. See my previous message.
-
larma
> The ruleset is pretty easy: if origin-id is present on 1:1 messages, use it for references. If not, use message@id for references. Then maybe decide to not allow message@id values in references that seem not unique/random
-
larma
I would propose to accept any uuidv4 message@id the same way as origin-id
-
MattJ
Do any of these debates have bearing on what servers will need to index for fastening-like functionality?
-
larma
So the only thing we need to do is to mandate that origin-id must match message@id if present. Then it's fine for a client to only rely on message@id but require it to be uuidv4 to do replies/reactions/...
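A minimal sketch of the 1:1 reference rule discussed here: prefer origin-id, otherwise fall back to message@id, but only when it looks random enough, taken here to mean a UUIDv4 per larma's proposal. The function names are hypothetical.

```python
import uuid

def is_uuid_v4(value):
    """True if value parses as an RFC 4122 / RFC 9562 version-4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return False
    # .version is None unless the variant is the RFC one.
    return parsed.version == 4

def reference_id_1to1(message_id, origin_id=None):
    """Pick the id to use when referencing a 1:1 message, or None if unsafe."""
    if origin_id is not None:
        return origin_id
    if message_id is not None and is_uuid_v4(message_id):
        return message_id
    return None  # e.g. ids like "1": don't offer replies/reactions

print(reference_id_1to1("1"))
```

If origin-id were mandated to equal message@id, the first branch becomes redundant and a client could look at message@id alone, as proposed.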
-
larma
MattJ: for MUCs it will always be the MUC assigned stanza-id, I think everyone agrees on that
-
larma
For 1:1 it's more complicated because your index must be on (local-user's-jid+remote-user's-jid+(origin-id/message@id))
-
larma
But that's basically status quo
-
MattJ
😔
-
lovetox
I think it's wrong to add all this complexity to support an unmaintained client like Pidgin
-
larma
(Because uniqueness can't be guaranteed on 1:1 IDs and thus must be restricted to the chat)
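Because 1:1 ids are only unique within a conversation, the lookup key has to include both participants, as larma says. A minimal sketch, assuming already-normalized bare JIDs (JID normalization is out of scope here); the index and helper names are hypothetical. In SQL this would correspond to a composite index over the same three columns.

```python
def chat_key(local_jid, remote_jid, ref_id):
    """Lookup key for a 1:1 message: the id is scoped to one conversation."""
    return (local_jid, remote_jid, ref_id)

index = {}
index[chat_key("me@example.org", "matt@example.org", "abc")] = "message in chat with Matt"

def lookup(local_jid, remote_jid, ref_id):
    """A reference only resolves inside the chat it was sent in."""
    return index.get(chat_key(local_jid, remote_jid, ref_id))

print(lookup("me@example.org", "matt@example.org", "abc"))
```

A reaction arriving from a third party with the same id misses the index, so knowing an id from someone else's chat is not enough to react to it.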
-
larma
What complexity? Requiring that origin-id must match message@id? Or requiring that message@id must be UUIDv4 for references?
-
MattJ
I'm in favour
-
lovetox
the complexity in code, in my database design, when I join tables on IDs and must consider this
-
Zash
UUIDv7 would also be nice
-
larma
For me those two rules are to reduce complexity from what we theoretically have right now.
-
lovetox
no, what we could have is easy: always use the message id in 1:1 chats, always the stanza id in MUC
-
larma
> the complexity in code, in my database design, when I join tables on IDs and must consider this
Don't implement origin-id. If we have those two rules, you're good to go without it.
-
lovetox
yes, thanks, that's why I want a decision
-
Zash
Database design is its own field for a reason.
-
lovetox
something that shows me: this is the future, I can implement that
-
lovetox
not just words :)
-
larma
You will still need to take the participants into consideration when doing database joins
-
MattJ
Sorry, typing on an annoying phone keyboard. I'm in favour of anything that moves us back away from origin-id and towards a sensible place.
-
larma
Because I shouldn't be able to react to messages in your chat with Matt just because I discovered the ID of a message in that chat
-
MattJ
The @id is the origin id, and it's regrettable that the semantics weren't more watertight in the RFCs
-
lovetox
of course, larma :) it's just that you don't want an OR in your join clause, like message_id = id OR origin_id = id
-
lovetox
from what I've read, it's very inefficient and results in a full table scan
-
MattJ
and you're assuming an SQL database anyway
-
larma
> UUIDv7 would also be nice
Agreed ;)
-
Zash
lovetox, what if you have your own internal ID, along with a lookup table for message-id → gajim-id ?
-
Zash
That's basically what we do in Prosody
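Zash's suggestion can be sketched with SQLite purely for illustration (MattJ's point stands that not every client uses SQL). Every externally visible id of a message (message@id, origin-id, stanza-id, ...) gets its own row mapping to one internal key, so a lookup is a single indexed equality instead of an OR across columns. Table, column and function names are hypothetical; this is not Prosody's or Gajim's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (
    pk INTEGER PRIMARY KEY,   -- internal id, assigned locally
    chat TEXT NOT NULL,
    body TEXT
);
CREATE TABLE message_ids (    -- external id -> internal id
    chat TEXT NOT NULL,
    ext_id TEXT NOT NULL,
    pk INTEGER NOT NULL REFERENCES messages(pk),
    PRIMARY KEY (chat, ext_id, pk)
);
""")

def store(chat, body, *ext_ids):
    """Store a message and register each distinct, non-empty external id."""
    pk = db.execute("INSERT INTO messages (chat, body) VALUES (?, ?)",
                    (chat, body)).lastrowid
    for ext_id in {e for e in ext_ids if e is not None}:
        db.execute("INSERT INTO message_ids VALUES (?, ?, ?)", (chat, ext_id, pk))
    return pk

def find(chat, ext_id):
    """One equality lookup on (chat, ext_id); may return several on collisions."""
    rows = db.execute(
        "SELECT m.body FROM message_ids i JOIN messages m ON m.pk = i.pk "
        "WHERE i.chat = ? AND i.ext_id = ?", (chat, ext_id)).fetchall()
    return [r[0] for r in rows]

store("alice@example.org", "hi", "msg-1", "origin-1")
print(find("alice@example.org", "origin-1"))
```

Deduplicating the ids means a message whose origin-id equals its message@id costs only one mapping row.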
-
lovetox
of course, a third column :D yes, in code everything is possible; I'm not saying this is an unsolvable problem
-
lovetox
but having a proper decision, and not needing to consider all possible futures makes it easier to choose the best way
-
larma
So, do we all agree that we have to make flow adjust XEP-0359 to:
- Ask specifically for UUID v4 or v7 (instead of just any UUID, as it does now)
- Mandate that origin-id, if present, must match message@id if the origin is the sender of the message
-
Zash
I just wish there was a more compact UUID representation
-
lovetox
yes that would be a good start larma
-
larma
Zash: isn't that up to the underlying transport/storage engine to handle?
-
lovetox
although I have no opinion about the specific version of UUID, wouldn't it be enough to say that implementors must choose a fitting UUID themselves?
-
Zash
larma, I'm thinking for wire protocol
-
MattJ
larma, I'd be in favour of that move
-
larma
lovetox: UUID v1 doesn't entail any randomness
-
Zash
lovetox, since you often speak of using timestamps, surely you want UUIDv7 for everything, being basically a timestamp + some random noise
-
lovetox
hmm, no, I want a sequential number in an archive
-
lovetox
and the timestamp in a UUIDv7 certainly depends on the sender's computer clock
-
lovetox
I currently have no use for this
-
MattJ
Sequential counters, the easiest thing in computer science
-
lovetox
larma, but would it not be better to say, choose a uuid which has some randomness
-
lovetox
instead of mandating a specific version
-
lovetox
Zash, or do you mean that the archive uses uuid7 as stanza id
-
lovetox
could be interesting yes
-
lovetox
the question is, is uuid7 sortable?
-
lovetox
Zash are you sure it includes randomness?
-
lovetox
i doubt it
-
lovetox
> The UUIDv7 format is designed to encode a Unix timestamp with arbitrary sub-second precision.
-
lovetox
ok, seems they all guarantee global uniqueness
-
lovetox
so they need to have somekind of randomness in it, how this still can be sortable is beyond me
-
lovetox
ah i get it, they put the timestamp in front
-
lovetox
and hope that no entry has the same timestamp
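lovetox's reading matches the RFC 9562 layout: a 48-bit Unix millisecond timestamp comes first, then version, variant and random bits, so ids created in different milliseconds sort by time. A minimal standard-library sketch (the name `uuid7` is ours; newer Python releases may ship their own). Ordering within the same millisecond is random here; RFC 9562 describes optional counter methods if monotonicity is needed.

```python
import os
import time
import uuid

def uuid7(unix_ms=None):
    """Build a UUIDv7: 48-bit ms timestamp, 4-bit version, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

earlier, later = uuid7(1_000), uuid7(2_000)
print(str(earlier) < str(later))
```

The canonical string form is fixed-width hex, so string order and numeric order agree, which is what makes it usable as a sortable archive id.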
-
lovetox
yeah so that would work if a archive simply uses that as stanza-id
-
lovetox
it's a bit dependent on the correct time on the machine, but we have that anyway, as the timestamp of the message is recorded by the server
-
lovetox
so if it is wrong, it will be totally broken anyway
-
Zash
Even with unsynced clocks, you get things that are near in time closer in the index, which might be good. Or maybe it doesn't matter, I'm not a DBA 🤷️
-
Zash
Unordered things like UUIDv4 seem unfriendly to indexes
-
Zash
Tho I assume that proper databases can handle that too
-
lovetox
but this UUIDv7 thing is mainly useful for some kind of distributed database, where I would otherwise need some global assignment of ids
-
lovetox
for single-machine storage, simply use an auto-incrementing field, which all databases have
-
Zash
That's what all UUIDs are designed for
-
lovetox
no need for complicated uuid7
-
Zash
*Universally* Unique IDs
-
Zash
Unique forever and throughout the entire universe!
-
lovetox
yes, but archives don't need globally unique ids
-
Zash
Pretty sure this is actually insane overkill
-
lovetox
ejabberd does this btw
-
Zash
Especially if IDs are scoped per JID
-
lovetox
it simply uses a timestamp as ID
-
Zash
( JID, 64-bit number ) or so would probably be fine
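A minimal sketch of Zash's "( JID, 64-bit number )" idea: ids only need to be unique within one archive, so a per-JID counter is enough on a single machine. This is in-memory only and the class name is hypothetical; a real server would persist the counter (e.g. via the database's auto-increment, as lovetox notes).

```python
import itertools
from collections import defaultdict

class ArchiveIds:
    """Hand out sequential archive ids scoped by the archive owner's JID."""

    MAX_ID = 2**63 - 1  # fits a signed 64-bit integer column

    def __init__(self):
        # One independent counter per archive, starting at 1.
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_id(self, jid):
        n = next(self._counters[jid])
        if n > self.MAX_ID:
            raise OverflowError("archive id space exhausted")
        return (jid, n)

ids = ArchiveIds()
print(ids.next_id("alice@example.org"), ids.next_id("room@muc.example.org"))
```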
-
theTedd
Relevant to the above discussion: https://github.com/ulid/spec And https://en.wikipedia.org/wiki/Snowflake_ID is worth a look.