-
debacle
Ad break: XMPP Sprint Berlin 2024-07-12..14 in the wonderful venue of Wikimedia Germany. Please add yourself, if the probability of your attendance is at least 1 %: https://wiki.xmpp.org/web/Sprints/2024-07_Berlin#Attendees Thank you!
-
lovetox
to developers who implemented reactions, how do you store the reaction values in the database? In the case where a user attaches multiple reactions. I thought about simply join them to a string with a seperation character like ";" or something✎ -
lovetox
to developers who implemented reactions, how do you store the reaction values in the database? In the case where a user attaches multiple reactions. I thought about simply join them to a string with a separation character like ";" or something ✏
-
Zash
that sounds like a violation of one of those database design principles
-
moparisthebest
The ones you learn in database theory classes that no one uses in practice? :)
-
singpolyma
lovetox: i store exactly the same as a reply. Honestly i consider them to be the same thing with different ui
-
lovetox
singpolyma, not sure what you mean, completely different data structure, one is a simple id, the other is a list of string values
-
singpolyma
Oh I see, so you're saying how to format the row if you don't store the whole stanza. If you're in sqlite probably as a jsonb array?
-
larma
lovetox, we do a comma seperated list, it's probably not perfect db design, but makes things much easier (just have to replace that single row when there is an update, not do a diff to figure out which rows to delete and which ones to add)
- flow starts to write code to produce <reaction>,</reaction>…
-
lovetox
larma, yes that was also my thinking, of course i could do a table with reaction values, but makes updating it cumbersome, for no gain
-
larma
flow, probably will end up in empty string reactions
- Zash reacts with zero-width-non-breaking-space
-
flow
larma, I would hope that implementations reject reactions where the reaction is the chosen character to delimit reactions
-
lovetox
flow not sure what you expect to happen, you load a string from the database, call split() on it and get an array of strings
-
lovetox
if there is a empty one in there i doubt the application goes down
-
lovetox
but yeah if the developer didnt think of this possibility and it creats some exception, you can annoy someone
-
lovetox
as emojis is an expanding standard, developers who implement reactions already need to consider that they receive emojis which they dont know, or cant display because whatever gui component does not yet support it.
-
larma
flow, well, technically <reaction>,</reaction> is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine✎ -
larma
flow, well, technically <reaction>,</reaction> is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO ✏
-
larma
flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO ✏
-
larma
flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set of valid chars will change all the time, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO ✏
-
larma
flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set of valid chars will change all the time, clients likely won't have a allow/denylist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO ✏
-
larma
Yes, we could probably have a few strings on a denylist which we seem unreasonable to be used, and that would probably include `,` and `` and all non-printable ASCII chars, but then probably wouldn't include the zwnbsp or any other weird whitespace chars, so it will be incomplete nonetheless
-
lovetox
.. but if you dont have a complete blacklist, you can probably spare the whole blacklist
-
lovetox
the other way around is also problematic, if you check if something is an emoji, this will always depend on your libraries and how current they are
-
singpolyma
Why not Json/jsonb instead of invented encoding?
-
pulkomandy
Or ascii record separators? These are certainly not allowed as reactions?
-
lovetox
no ascii char is allowed ..
-
lovetox
its not a problem of finding a character, there are thousands you can choose from
-
lovetox
of course i can dump it also as json, but this does not prevent the problem of people sending stuff thats not an emoji✎ -
lovetox
of course i can dump it also as json, but this does not prevent the "problem" of people sending stuff thats not an emoji ✏
-
flow
fwiw, what pulkomandy suggests would work
-
flow
IIRC ascii control chars are not available in XML 1.0, so using them is possible (jxmpp-strings-testframework does exactly that)
-
moparisthebest
\0 is not allowed and makes C devs angry as a bonus
-
moparisthebest
Though... Is each reaction a single utf8 character? Because then you don't need a delimiter at all
-
Zash
It's a single grapheme cluster or whatever the term is
-
Zash
Emojis tend to be more than one code point
-
pulkomandy
Yes, and the definition of what a grapheme cluster is depends on the unicode version as more things may become allowed. So you can't really rely on it, unless the xep is updated to specify which version of unicode to use. And even then, I'm not sure there is a strict definition of "this is an emoji" in unicode?
-
lovetox
there is
-
lovetox
but as said, it does not matter, nothing bad happens if you show a user reacted with the char "A"
-
lovetox
some users will think, hm weird thats not an emoji, anyway ...
-
Zash
`<reaction>lol</reaction>`
-
lovetox
others will probably build a whole workflow on that, and then demand other clients to show also non-emojis
-
lovetox
> `<reaction>lol</reaction>` definitly can see this as a feature request, "Let us react with small abbreviations" ↺
-
singpolyma
> of course i can dump it also as json, but this does not prevent the "problem" of people sending stuff thats not an emoji No, but it prevents the problem of them reacting with something that breaks your encoding. other benefit of json or jsonb in sqlite is you can use it in queries easily
-
singpolyma
`<reaction>https://cdn.blarghl.com/gifs/sofunny.gif</reaction>` 😉
-
lovetox
singpolyma, they cannot break the encoding, they can send reaction values which lead to strings that are not emojis.
-
lovetox
but this only affects their own reaction, so other users will not see or understand the users reaction
-
lovetox
it hurts only the user itself who did this
-
rom1dep
> to developers who implemented reactions lovetox: how about a "reactions" table with 2 columns, (messageID, reaction_content)? I imagine it makes inserting/deleting/querying easy
-
lovetox
rom1dep, yes thats obvious, the question was how to store reaction_content, because its a list of strings
-
singpolyma
I'd use the built in facilities to store lists, but probably anything will work as you say
-
rom1dep
lovetox: why would you limit yourself to having a unique constraint on MessageID? Then reaction_content isn't a list any longer
-
lovetox
because users can add and remove reactions, if you are not having one row per user reaction, updating the state is more complicated
-
lovetox
and needs multiple queries
-
rom1dep
As long as they don't add/remove more than one reaction at a time, I don't see how it's not easier
-
singpolyma
They can though
-
lovetox
but they can add / remove as much as they want
-
singpolyma
Remove 6 and add 50 with one message
-
Zash
delete all for that sender/message-id and add the ones in the new stanza?
-
lovetox
and the removing is implicit through absence
-
lovetox
so its not like you can issue a delete statement for specific reactions
-
lovetox
you need to find out the diff
-
rom1dep
Even then, it's a "delete where message_id = X and reaction in (to_del1, to_del2)"
-
singpolyma
yes. two queries instead of one if you do it that way
-
lovetox
the update is one thing, but your approach would be also much worse on inserting
-
lovetox
its one insert, vs as much as there are reactions
-
lovetox
and of course on querying, its find one record, vs as much as there are reactions
-
lovetox
storage size wise its also worse, i cannot find a real benefit of doing it like that
-
moparisthebest
No reason to not just put it on the message row itself imho
-
singpolyma
moparisthebest: well, you might get the reaction before you get the message
-
rom1dep
> yes. two queries instead of one if you do it that way depends on how you count … do 2 queries in 1 transaction amount to 1 or to 2? https://sqlite.org/np1queryprob.html ↺
-
lovetox
moparisthebest, i think you are thinking about single chat
-
lovetox
in MUC you can have 1000 of reactions from different users
-
rom1dep
> its one insert, vs as much as there are reactions worse in which way, exactly? ↺
-
singpolyma
i did it that way at first (only on the message) but then it causes problems when you get a reaction and there is no message yet
-
singpolyma
> in MUC you can have 1000 of reactions from different users I don't think this changes anything though?
-
lovetox
of course, you need to store from which user is which reaction
-
lovetox
how do you do this in a sensible way in "one message row"
-
lovetox
put everything in a big json array, and store it in the column "reactions" :D
-
rom1dep
it depends on if/how you relate message_id to message_emitter_id
-
Zash
replace sqlite with a big json file!
-
singpolyma
yes. I mean, a map not an array, but yes
-
moparisthebest
> replace sqlite with a big json file! Mongodb ??? ↺
-
singpolyma
I'd use jsonb these days, but if you need to support old sqlite then json
-
lovetox
singpolyma, ok, im not saying this does not "work", its just i like to use a relational database and actually use relations
-
singpolyma
says the person considering a comma seperated list ;)
-
singpolyma
I like to use relations where they add value. There's no reason to refuse to use features the database system has just because it also has other features
-
lovetox
yeah ... thats the thing, somewhere even i draw a line :D
-
singpolyma
sqlite supports storing compound values in a column, so when it makes sense I'll happily do it
-
moparisthebest
So you are saying table reactions (user, message_id, list_o_reactions) ? Seems sensible to me /shrug
-
moparisthebest
unique index over (user, message_id)
-
rom1dep
that is, if you don't already have a table (discussion_id, message_id) somewhere ; then you don't need user_id in your triple?
-
singpolyma
you need sender_id somewhere so you know who sent which reactions
-
moparisthebest
user is sender_id however you represent that
-
rom1dep
in the table messages = (message_id, message_sender) probably✎ -
rom1dep
in the table messages = (message_id, message_sender_id) probably ✏
-
lovetox
as it seems you all want to thing this through, here my current table ``` Table Reaction: pk fk_account_pk fk_remote_pk fk_occupant_pk id direction emojis timestamp ```✎ -
lovetox
as it seems you all want to think this through, here my current table ``` Table Reaction: pk fk_account_pk fk_remote_pk fk_occupant_pk id direction emojis timestamp ``` ✏
-
moparisthebest
what's direction ?
-
lovetox
incoming or outgoing, basically the field that tells me if its a reaction sent by me
-
rom1dep
what do you lose with having `emoji` and possibly several entries instead of `emojis` and having to store an array?
-
lovetox
i think i listed all the things above, more inserts, more data on select, more storage size
-
rom1dep
`group_concat(emoji)`
-
rom1dep
insertions: are more atomic (more, but less cycles each) select: identical; your select returns the same string with group_concat storage: I would say, less in practice
-
lovetox
!? it cant be less, you save for each value 7 columns which are not the value
-
lovetox
insertions are more atomic? is this supposed to be something good? it means more I/O, means your application is slower
-
rom1dep
it depends on the data type of `emojis`
-
lovetox
the data you select could be identical, but it needs the database more time to do
-
rom1dep
more atomic = close to "append only", you just insert, it's fast. The alternative is query, compare, update
-
lovetox
as we previsouly said, you cannot "only append", users can also remove values
-
singpolyma
rom1dep: there's no reason to compare
-
lovetox
and before you say it, they dont tell you what they remove, you have to find out yourself
-
singpolyma
it's either delete+update or upsert with the compound emojis column
-
lovetox
which means either you insert 50 emojis, then on the next update you delete and reinsert 49 emojis
-
lovetox
or you query everything, build a diff and do that then
-
singpolyma
I can't imagine a case where buiding a diff would be better than just replacing
-
rom1dep
> and before you say it, they dont tell you what they remove, you have to find out yourself I guess that's the part where it falls apart, why can't we know which one got removed? ↺
-
lovetox
because it would make the protocol way more error prone, and complicated for clients
-
lovetox
and i guess the goal was, make this easy, so the idea is, the last value you receive is the one that counts, and everything else is thrown away
-
lovetox
which .. leads us to one row which we update :D
-
Zash
so the complexity got moved to the client instead of the protocol? :)
-
rom1dep
can't the protocol be "I, client X, remove reaction Y for message Z" , and so it'd be a matter of doing a delete `where emoji=Y and messageid=Z` ?
-
lovetox
rom1dep, of course it could, are you saying transfer protocols should be build on basis of how a specific database works?
-
rom1dep
> so the complexity got moved to the client instead of the protocol? :) you mean, we can't just make the complexity disappear? :) ↺
-
Zash
`DELETE WHERE "emoji" NOT IN (new reactions)` + `INSERT` each ignoring failures
-
Zash
or delete all + insert new ones
-
rom1dep
> rom1dep, of course it could, are you saying transfer protocols should be build on basis of how a specific database works? I'm not saying that? ↺
-
rom1dep
I don't know the protocol, it could be/have been designed either way
-
lovetox
could the protocol not be "xxx" so i can do "databasestuff" sure sounds like you are suggesting people should think about SQL when designing the protocol
-
rom1dep
Again, I'm not making this argument. I don't think it matters very much
-
rom1dep
All I'm saying is that you are trying to store a 1:M (message_id:reactions) relationship, that generally goes with repeating the message_id for all the relations
-
lovetox
but in practice you cannot just store a message id, message id is a non unique id, you need store a lot more per reaction, see my table
-
lovetox
but anyway, thanks for all the discussion
-
rom1dep
lovetox: couldn't account, remote, occupant, direction(, timestamp?) be stored under a single message_id in a messages table? (not saying you should, that's my impression of what DB/normalized form zealots could say)
-
lovetox
many clients do it that way, but it has one serious drawback, it assumes you have a message before you have a reaction
-
singpolyma
Yes. This is always the big problem
-
singpolyma
Getting reactions before messages they react to
-
lovetox
this is good aslong as you request history forward, but it falls apart if you want to request it backwards
-
lovetox
which is one feature that many Gajim users ask for, so i designed the database layout to make it work
-
lovetox
also your approach only works for single chat, you forget that its a 1:N relation in groupchat, so stuff like occupant and timestamp cannot be part of the 1 side
-
lovetox
i think when all the XEPs where introduced that reference other messages, this was a turning point. you can simply not make an application work that was designed without that in mind. its a whole different game, then it was before, where you simply received messages, forwards, backwards, it did not matter as long as it had a timestamp everything was fine✎ -
lovetox
i think when all the XEPs where introduced that reference other messages, this was a turning point. you can simply not make an application work that was designed without that in mind. its a whole different game, than it was before, where you simply received messages, forwards, backwards, it did not matter as long as it had a timestamp everything was fine ✏
-
lovetox
now you can receive things that reference other things, you dont have.
-
flow
it's possible to receive a reaction before the actual message?
-
lovetox
of course, join a groupchat, and query 1 day of history
-
lovetox
then you will get reactions, that reference messages 10 days agi✎ -
lovetox
then you will get reactions, that reference messages 10 days ao ✏
-
lovetox
then you will get reactions, that reference messages 10 days ago ✏
-
flow
ahh if you go back in history
-
lovetox
even without requesting history, join a groupchat - dont request history, first message you receive can be a reaction to a message you dont have
-
lovetox
i guess its all ok as long as you dont offer your users the feature to request past history of a MUC
-
flow
sure, I was more thinking about situations where you recieve (live) the reaction first and then the message