jdev - 2024-05-09


  1. debacle

    Ad break: XMPP Sprint Berlin 2024-07-12..14 in the wonderful venue of Wikimedia Germany. Please add yourself, if the probability of your attendance is at least 1 %: https://wiki.xmpp.org/web/Sprints/2024-07_Berlin#Attendees Thank you!

  2. lovetox

    to developers who implemented reactions, how do you store the reaction values in the database? In the case where a user attaches multiple reactions. I thought about simply join them to a string with a seperation character like ";" or something

  3. lovetox

    to developers who implemented reactions, how do you store the reaction values in the database? In the case where a user attaches multiple reactions. I thought about simply join them to a string with a separation character like ";" or something

  4. Zash

    that sounds like a violation of one of those database design principles

  5. moparisthebest

    The ones you learn in database theory classes that no one uses in practice? :)

  6. singpolyma

    lovetox: i store exactly the same as a reply. Honestly i consider them to be the same thing with different ui

  7. lovetox

    singpolyma, not sure what you mean, completely different data structure, one is a simple id, the other is a list of string values

  8. singpolyma

    Oh I see, so you're saying how to format the row if you don't store the whole stanza. If you're in sqlite probably as a jsonb array?

  9. larma

    lovetox, we do a comma seperated list, it's probably not perfect db design, but makes things much easier (just have to replace that single row when there is an update, not do a diff to figure out which rows to delete and which ones to add)

  10. flow starts to write code to produce <reaction>,</reaction>…

  11. lovetox

    larma, yes that was also my thinking, of course i could do a table with reaction values, but makes updating it cumbersome, for no gain

  12. larma

    flow, probably will end up in empty string reactions

  13. Zash reacts with zero-width-non-breaking-space

  14. flow

    larma, I would hope that implementations reject reactions where the reaction is the chosen character to delimit reactions

  15. lovetox

    flow not sure what you expect to happen, you load a string from the database, call split() on it and get an array of strings

  16. lovetox

    if there is a empty one in there i doubt the application goes down

  17. lovetox

    but yeah if the developer didnt think of this possibility and it creats some exception, you can annoy someone

  18. lovetox

    as emojis is an expanding standard, developers who implement reactions already need to consider that they receive emojis which they dont know, or cant display because whatever gui component does not yet support it.

  19. larma

    flow, well, technically <reaction>,</reaction> is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine

  20. larma

    flow, well, technically <reaction>,</reaction> is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO

  21. larma

    flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set will be different, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO

  22. larma

    flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set of valid chars will change all the time, clients likely won't have a blacklist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO

  23. larma

    flow, well, technically `<reaction>,</reaction>` is invalid, but this is true for a lot of chars, because XEP says it must be a valid emoji according to Unicode. But because the set of valid chars will change all the time, clients likely won't have a allow/denylist and just behave weirdly if you use non-emojis, understanding a `,` the same as `` is fine IMO

  24. larma

    Yes, we could probably have a few strings on a denylist which we seem unreasonable to be used, and that would probably include `,` and `` and all non-printable ASCII chars, but then probably wouldn't include the zwnbsp or any other weird whitespace chars, so it will be incomplete nonetheless

  25. lovetox

    .. but if you dont have a complete blacklist, you can probably spare the whole blacklist

  26. lovetox

    the other way around is also problematic, if you check if something is an emoji, this will always depend on your libraries and how current they are

  27. singpolyma

    Why not Json/jsonb instead of invented encoding?

  28. pulkomandy

    Or ascii record separators? These are certainly not allowed as reactions?

  29. lovetox

    no ascii char is allowed ..

  30. lovetox

    its not a problem of finding a character, there are thousands you can choose from

  31. lovetox

    of course i can dump it also as json, but this does not prevent the problem of people sending stuff thats not an emoji

  32. lovetox

    of course i can dump it also as json, but this does not prevent the "problem" of people sending stuff thats not an emoji

  33. flow

    fwiw, what pulkomandy suggests would work

  34. flow

    IIRC ascii control chars are not available in XML 1.0, so using them is possible (jxmpp-strings-testframework does exactly that)

  35. moparisthebest

    \0 is not allowed and makes C devs angry as a bonus

  36. moparisthebest

    Though... Is each reaction a single utf8 character? Because then you don't need a delimiter at all

  37. Zash

    It's a single grapheme cluster or whatever the term is

  38. Zash

    Emojis tend to be more than one code point

  39. pulkomandy

    Yes, and the definition of what a grapheme cluster is depends on the unicode version as more things may become allowed. So you can't really rely on it, unless the xep is updated to specify which version of unicode to use. And even then, I'm not sure there is a strict definition of "this is an emoji" in unicode?

  40. lovetox

    there is

  41. lovetox

    but as said, it does not matter, nothing bad happens if you show a user reacted with the char "A"

  42. lovetox

    some users will think, hm weird thats not an emoji, anyway ...

  43. Zash

    `<reaction>lol</reaction>`

  44. lovetox

    others will probably build a whole workflow on that, and then demand other clients to show also non-emojis

  45. lovetox

    > `<reaction>lol</reaction>` definitly can see this as a feature request, "Let us react with small abbreviations"

  46. singpolyma

    > of course i can dump it also as json, but this does not prevent the "problem" of people sending stuff thats not an emoji No, but it prevents the problem of them reacting with something that breaks your encoding. other benefit of json or jsonb in sqlite is you can use it in queries easily

  47. singpolyma

    `<reaction>https://cdn.blarghl.com/gifs/sofunny.gif</reaction>` 😉

  48. lovetox

    singpolyma, they cannot break the encoding, they can send reaction values which lead to strings that are not emojis.

  49. lovetox

    but this only affects their own reaction, so other users will not see or understand the users reaction

  50. lovetox

    it hurts only the user itself who did this

  51. rom1dep

    > to developers who implemented reactions lovetox: how about a "reactions" table with 2 columns, (messageID, reaction_content)? I imagine it makes inserting/deleting/querying easy

  52. lovetox

    rom1dep, yes thats obvious, the question was how to store reaction_content, because its a list of strings

  53. singpolyma

    I'd use the built in facilities to store lists, but probably anything will work as you say

  54. rom1dep

    lovetox: why would you limit yourself to having a unique constraint on MessageID? Then reaction_content isn't a list any longer

  55. lovetox

    because users can add and remove reactions, if you are not having one row per user reaction, updating the state is more complicated

  56. lovetox

    and needs multiple queries

  57. rom1dep

    As long as they don't add/remove more than one reaction at a time, I don't see how it's not easier

  58. singpolyma

    They can though

  59. lovetox

    but they can add / remove as much as they want

  60. singpolyma

    Remove 6 and add 50 with one message

  61. Zash

    delete all for that sender/message-id and add the ones in the new stanza?

  62. lovetox

    and the removing is implicit through absence

  63. lovetox

    so its not like you can issue a delete statement for specific reactions

  64. lovetox

    you need to find out the diff

  65. rom1dep

    Even then, it's a "delete where message_id = X and reaction in (to_del1, to_del2)"

  66. singpolyma

    yes. two queries instead of one if you do it that way

  67. lovetox

    the update is one thing, but your approach would be also much worse on inserting

  68. lovetox

    its one insert, vs as much as there are reactions

  69. lovetox

    and of course on querying, its find one record, vs as much as there are reactions

  70. lovetox

    storage size wise its also worse, i cannot find a real benefit of doing it like that

  71. moparisthebest

    No reason to not just put it on the message row itself imho

  72. singpolyma

    moparisthebest: well, you might get the reaction before you get the message

  73. rom1dep

    > yes. two queries instead of one if you do it that way depends on how you count … do 2 queries in 1 transaction amount to 1 or to 2? https://sqlite.org/np1queryprob.html

  74. lovetox

    moparisthebest, i think you are thinking about single chat

  75. lovetox

    in MUC you can have 1000 of reactions from different users

  76. rom1dep

    > its one insert, vs as much as there are reactions worse in which way, exactly?

  77. singpolyma

    i did it that way at first (only on the message) but then it causes problems when you get a reaction and there is no message yet

  78. singpolyma

    > in MUC you can have 1000 of reactions from different users I don't think this changes anything though?

  79. lovetox

    of course, you need to store from which user is which reaction

  80. lovetox

    how do you do this in a sensible way in "one message row"

  81. lovetox

    put everything in a big json array, and store it in the column "reactions" :D

  82. rom1dep

    it depends on if/how you relate message_id to message_emitter_id

  83. Zash

    replace sqlite with a big json file!

  84. singpolyma

    yes. I mean, a map not an array, but yes

  85. moparisthebest

    > replace sqlite with a big json file! Mongodb ???

  86. singpolyma

    I'd use jsonb these days, but if you need to support old sqlite then json

  87. lovetox

    singpolyma, ok, im not saying this does not "work", its just i like to use a relational database and actually use relations

  88. singpolyma

    says the person considering a comma seperated list ;)

  89. singpolyma

    I like to use relations where they add value. There's no reason to refuse to use features the database system has just because it also has other features

  90. lovetox

    yeah ... thats the thing, somewhere even i draw a line :D

  91. singpolyma

    sqlite supports storing compound values in a column, so when it makes sense I'll happily do it

  92. moparisthebest

    So you are saying table reactions (user, message_id, list_o_reactions) ? Seems sensible to me /shrug

  93. moparisthebest

    unique index over (user, message_id)

  94. rom1dep

    that is, if you don't already have a table (discussion_id, message_id) somewhere ; then you don't need user_id in your triple?

  95. singpolyma

    you need sender_id somewhere so you know who sent which reactions

  96. moparisthebest

    user is sender_id however you represent that

  97. rom1dep

    in the table messages = (message_id, message_sender) probably

  98. rom1dep

    in the table messages = (message_id, message_sender_id) probably

  99. lovetox

    as it seems you all want to thing this through, here my current table ``` Table Reaction: pk fk_account_pk fk_remote_pk fk_occupant_pk id direction emojis timestamp ```

  100. lovetox

    as it seems you all want to think this through, here my current table ``` Table Reaction: pk fk_account_pk fk_remote_pk fk_occupant_pk id direction emojis timestamp ```

  101. moparisthebest

    what's direction ?

  102. lovetox

    incoming or outgoing, basically the field that tells me if its a reaction sent by me

  103. rom1dep

    what do you lose with having `emoji` and possibly several entries instead of `emojis` and having to store an array?

  104. lovetox

    i think i listed all the things above, more inserts, more data on select, more storage size

  105. rom1dep

    `group_concat(emoji)`

  106. rom1dep

    insertions: are more atomic (more, but less cycles each) select: identical; your select returns the same string with group_concat storage: I would say, less in practice

  107. lovetox

    !? it cant be less, you save for each value 7 columns which are not the value

  108. lovetox

    insertions are more atomic? is this supposed to be something good? it means more I/O, means your application is slower

  109. rom1dep

    it depends on the data type of `emojis`

  110. lovetox

    the data you select could be identical, but it needs the database more time to do

  111. rom1dep

    more atomic = close to "append only", you just insert, it's fast. The alternative is query, compare, update

  112. lovetox

    as we previsouly said, you cannot "only append", users can also remove values

  113. singpolyma

    rom1dep: there's no reason to compare

  114. lovetox

    and before you say it, they dont tell you what they remove, you have to find out yourself

  115. singpolyma

    it's either delete+update or upsert with the compound emojis column

  116. lovetox

    which means either you insert 50 emojis, then on the next update you delete and reinsert 49 emojis

  117. lovetox

    or you query everything, build a diff and do that then

  118. singpolyma

    I can't imagine a case where buiding a diff would be better than just replacing

  119. rom1dep

    > and before you say it, they dont tell you what they remove, you have to find out yourself I guess that's the part where it falls apart, why can't we know which one got removed?

  120. lovetox

    because it would make the protocol way more error prone, and complicated for clients

  121. lovetox

    and i guess the goal was, make this easy, so the idea is, the last value you receive is the one that counts, and everything else is thrown away

  122. lovetox

    which .. leads us to one row which we update :D

  123. Zash

    so the complexity got moved to the client instead of the protocol? :)

  124. rom1dep

    can't the protocol be "I, client X, remove reaction Y for message Z" , and so it'd be a matter of doing a delete `where emoji=Y and messageid=Z` ?

  125. lovetox

    rom1dep, of course it could, are you saying transfer protocols should be build on basis of how a specific database works?

  126. rom1dep

    > so the complexity got moved to the client instead of the protocol? :) you mean, we can't just make the complexity disappear? :)

  127. Zash

    `DELETE WHERE "emoji" NOT IN (new reactions)` + `INSERT` each ignoring failures

  128. Zash

    or delete all + insert new ones

  129. rom1dep

    > rom1dep, of course it could, are you saying transfer protocols should be build on basis of how a specific database works? I'm not saying that?

  130. rom1dep

    I don't know the protocol, it could be/have been designed either way

  131. lovetox

    could the protocol not be "xxx" so i can do "databasestuff" sure sounds like you are suggesting people should think about SQL when designing the protocol

  132. rom1dep

    Again, I'm not making this argument. I don't think it matters very much

  133. rom1dep

    All I'm saying is that you are trying to store a 1:M (message_id:reactions) relationship, that generally goes with repeating the message_id for all the relations

  134. lovetox

    but in practice you cannot just store a message id, message id is a non unique id, you need store a lot more per reaction, see my table

  135. lovetox

    but anyway, thanks for all the discussion

  136. rom1dep

    lovetox: couldn't account, remote, occupant, direction(, timestamp?) be stored under a single message_id in a messages table? (not saying you should, that's my impression of what DB/normalized form zealots could say)

  137. lovetox

    many clients do it that way, but it has one serious drawback, it assumes you have a message before you have a reaction

  138. singpolyma

    Yes. This is always the big problem

  139. singpolyma

    Getting reactions before messages they react to

  140. lovetox

    this is good aslong as you request history forward, but it falls apart if you want to request it backwards

  141. lovetox

    which is one feature that many Gajim users ask for, so i designed the database layout to make it work

  142. lovetox

    also your approach only works for single chat, you forget that its a 1:N relation in groupchat, so stuff like occupant and timestamp cannot be part of the 1 side

  143. lovetox

    i think when all the XEPs where introduced that reference other messages, this was a turning point. you can simply not make an application work that was designed without that in mind. its a whole different game, then it was before, where you simply received messages, forwards, backwards, it did not matter as long as it had a timestamp everything was fine

  144. lovetox

    i think when all the XEPs where introduced that reference other messages, this was a turning point. you can simply not make an application work that was designed without that in mind. its a whole different game, than it was before, where you simply received messages, forwards, backwards, it did not matter as long as it had a timestamp everything was fine

  145. lovetox

    now you can receive things that reference other things, you dont have.

  146. flow

    it's possible to receive a reaction before the actual message?

  147. lovetox

    of course, join a groupchat, and query 1 day of history

  148. lovetox

    then you will get reactions, that reference messages 10 days agi

  149. lovetox

    then you will get reactions, that reference messages 10 days ao

  150. lovetox

    then you will get reactions, that reference messages 10 days ago

  151. flow

    ahh if you go back in history

  152. lovetox

    even without requesting history, join a groupchat - dont request history, first message you receive can be a reaction to a message you dont have

  153. lovetox

    i guess its all ok as long as you dont offer your users the feature to request past history of a MUC

  154. flow

    sure, I was more thinking about situations where you recieve (live) the reaction first and then the message