XSF Discussion - 2020-01-22

  1. pep.

    Is there a way with pubsub (or something else?) to have many publishers and many subscribers, where only subscribers see everything and publishers see only their own items?

  2. dwd

    pep., Defining fulltext search fully would mean servers would have to implement a full-text search engine entirely - it wouldn't handle, for example, stemming in a homogeneous manner, so we'd presumably have to ban that, which feels undesirable. AIUI, MattJ's suggestion is a strict substring field as well as a "magic" field. I think the threat of beer-buying is sufficient to prevent outright silliness (it also prevents anyone being silly and still claiming full conformance, BTW).

  3. MattJ

    dwd, I don't think we have to rule out stemming

  4. MattJ

    nor mandate it

  5. MattJ

    (for the "plain" search)

  6. MattJ

    But most FTS engines provide an advanced query language, and that's mainly what I want to avoid exposing

  7. MattJ

    But e.g. it sounded like that's what Guus was doing, it's usually the (slightly) easier option

  8. dwd

    Right, indeed. Your suggestion is a dumb substring search, plus magic. I'd aim for magic first, given that any query language is close to nothing. I'm thinking in terms of tsvector in pgsql, for example.

  9. dwd

    Unless I misunderstand your suggestion here.

  10. MattJ

    I'm just saying there should be two fields, plain and implementation-specific

  11. MattJ

    running with the postgres example, the plain one would use plainto_tsquery() for example

  12. dwd

    But then your plain one would do stemming for example. Surely?

  13. MattJ


  14. MattJ

    I don't see that as a problem

  15. MattJ

    It defines the semantics of the user input, not what the implementation does with that info

  16. MattJ

    "This is a query from the user with no special operators or syntax"

  17. MattJ

    Now find some messages

  18. MattJ

    Which is different to: > <Guus>  simple keywords will work, but more elaborate lucene queries too (although you'd need to know the index fields)

  19. MattJ

    I'm saying we should have a way to expose the elaborate queries (if that's what deployments/implementations really want), but we should also have a safe option

  20. MattJ

    Safe in the sense that you can just throw some text in there and have a reasonable expectation it will return something useful

  21. MattJ

    This is from last year, but I just remembered it: https://opensourceconnections.com/blog/2019/05/29/falsehoods-programmers-believe-about-search/

  22. MattJ

    > When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon

  23. MattJ

    Though I think my favourite from there is: > A customer using the same query twice expects the same results for both searches

  24. dwd

    Ah, I see.

  25. ralphm

    Yeah, search is not trivial.

  26. dwd

    So if I understand correctly, MattJ is arguing that the second sentence of 3.2 of my protoxep should be in effect reversed, and servers MUST interpret any words or characters as search terms, and not treat them as directives or operators.
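
    A minimal sketch of the "plain" field semantics described above: every word or character the user types is passed on as a search term, never as an operator. It assumes a backend whose query syntax treats double-quoted strings as literal text and whitespace-separated strings as an implicit AND (SQLite's FTS5 behaves this way, as far as I know); the function name is hypothetical and not part of any protoXEP.

```python
def to_plain_query(user_input: str) -> str:
    """Quote every whitespace-separated term so the search backend reads it
    as literal text rather than as an operator such as OR or NOT."""
    terms = user_input.split()
    # Double any embedded double quote, then wrap the term in double quotes.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    # Assumption: space-separated quoted strings mean "all of these terms".
    return " ".join(quoted)


print(to_plain_query("beer OR juice"))  # -> "beer" "OR" "juice"
```

    With this kind of escaping, "OR" is searched for as a word (Oregon or otherwise) instead of changing the meaning of the query.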

  27. dwd

    I can certainly go that route.

  28. MattJ

    Depends. You only specified one field, and it depends whether you specified the plain or the non-plain one :)

  29. MattJ

    Guus wants the non-plain one, and this draft was primarily for him at this point, right? :)

  30. Kev

    We already implement MAM search, FWIW, and have for years.

  31. MattJ

    But I wanted to add in a way for the server to convey some help text for the non-plain one

  32. MattJ

    Also we need to deal with localization in the various parts of this

  33. dwd


  34. MattJ

    and that's not easy - it's extremely likely that the server is only going to have FTS for a single locale

  35. MattJ

    But multiple is of course possible with the right setup, I just don't see many people crazy enough to dedicate the resources to that

  36. dwd

    Yeah, fine with extending my extension to introduce extensions. I was mostly in my XEP project for inbox and figured I'd knock it this once so as to have something.

  37. MattJ

    I can contribute the missing parts, I've had this in my head for quite a while, but I'm way too busy this side of FOSDEM

  38. dwd

    I'm not wed to anything here except the beers.

  39. MattJ

    "OR an orange juice"?

  40. dwd

    You don't have to drink the beer. They just have to buy it.

  41. Ge0rG

    this reminds me of how the formulas in Excel/LibreCalc are language-dependent, so if you work with multiple locales, you always get it wrong

  42. dwd

    It makes it impossible to claim they meet the standard by syntax alone.

  43. Guus

    Kev what fields do you use, and what functionality is behind it? My primary motivation was to re-use existing field names if possible, to have overlap.

  44. Kev

    You expect me to remember things? :o

  45. Kev

    { "search", sizeof("search")-1, XEP313_FILTER_TYPE_STRING, "If specified, only return messages that contain each of these words, in any order" }, Seems to be what we're using.

  46. MattJ

    I think that's a sensible implementation of the "plain" variant

  47. Daniel

    that's what Conversations' local search does as well; and what I would expect a server side search to do as well

  48. jonas’

    define "word"

  49. jonas’

    if I search for arc, will it return messages which contain "search"?

  50. jonas’

    will it return messages which contain "The word 'arc' may be contained or not"?
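
    The two readings of "contains each of these words" behind the questions above can be compared in a small sketch. The tokenisation rule (lower-cased runs of word characters) is only one possible definition of "word" and is an assumption here, as are the function names.

```python
import re


def words(text: str) -> set:
    """Lower-cased runs of word characters; one possible definition of "word"."""
    return set(re.findall(r"\w+", text.lower()))


def matches_all_words(message: str, query: str) -> bool:
    """True if every query word occurs in the message as a whole word, in any order."""
    return words(query) <= words(message)


def matches_substrings(message: str, query: str) -> bool:
    """True if every query term occurs anywhere in the message, even inside another word."""
    lowered = message.lower()
    return all(term in lowered for term in query.lower().split())


print(matches_all_words("I will search later", "arc"))   # False: "arc" is not a word here
print(matches_substrings("I will search later", "arc"))  # True: "search" contains "arc"
print(matches_all_words("The word 'arc' may be contained or not", "arc"))  # True
```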

  51. jonas’


  52. dwd

    jonas’, Does it matter?

  53. jonas’

    probably not

  54. dwd

    jonas’, I mean, does anyone care about replicability of search results between servers? (or between the client's local archive and the server?)

  55. dwd

    Although there's an argument that it might be useful to have a MAM switch that emits only the ids and not the entire messages in case you already have the data. But that feels like an optimisation for another day.

  56. Zash

    If you already have the data, can't you just search there directly?

  57. jonas’

    Zash, look at how slow conversations is since it got a FTS index ;)

  58. dwd

    Zash, Well, depends on how much of the data you have.

  59. Zash

    Also there's the thing where, given an id, it's tricky to retrieve that message

  60. dwd

    Zash, It is?

  61. Zash

    You can request messages before, or after, but not a specific message by id.

  62. dwd

    Zash, Oh. Well, that's stupid then.

  63. dwd

    Zash, So you'd have to ask for one after, then one before the result you get.

  64. Zash


  65. dwd

    Yeah, that's daft.

  66. Zash

    Or one before, then the one after that.

  67. Zash

    Either may or may not work depending on how many messages you have

  68. Zash

    Inb4 inventing SQL over XMPP to solve this

  69. dwd

    Zash, Ask for one before and one after concurrently.

  70. dwd

    Zash, Then it's only 2RTT to get the message you actually wanted.
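
    The workaround discussed above can be sketched against an in-memory stand-in for the archive. The helpers `one_before` and `one_after` are hypothetical placeholders for MAM queries limited to one result before or after a given id; nothing here is a real client API.

```python
from typing import Optional, Tuple

Message = Tuple[str, str]  # (archive id, body)

# Hypothetical archive contents, oldest first.
ARCHIVE = [("a1", "hello"), ("a2", "beer?"), ("a3", "orange juice")]


def _index(msg_id: str) -> int:
    return [m[0] for m in ARCHIVE].index(msg_id)


def one_before(msg_id: str) -> Optional[Message]:
    """Stand-in for a query asking for one message before msg_id."""
    i = _index(msg_id)
    return ARCHIVE[i - 1] if i > 0 else None


def one_after(msg_id: str) -> Optional[Message]:
    """Stand-in for a query asking for one message after msg_id."""
    i = _index(msg_id)
    return ARCHIVE[i + 1] if i + 1 < len(ARCHIVE) else None


def fetch_by_id(msg_id: str) -> Optional[Message]:
    # Round trip 1: both neighbour queries could be sent concurrently.
    previous, following = one_before(msg_id), one_after(msg_id)
    # Round trip 2: step back onto the target from whichever neighbour exists.
    if previous is not None:
        return one_after(previous[0])
    if following is not None:
        return one_before(following[0])
    return None  # the target has no neighbours, so neither route works


print(fetch_by_id("a2"))
```

    The final `None` case is the "may or may not work depending on how many messages you have" situation: with a single archived message there is no neighbour to step from.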

  71. Guus

    Unrelated question: with RSM, the direction in which you page through the resultset doesn't affect what's defined as the 'first' and 'last' element, right?

  72. Guus

    iow the order of elements on a page does not differ based on the direction that you page through the result set?

  73. Zash


  74. Guus

    👍 thanks

  75. Zash

    Which makes it funky to ORDER BY backwards, get the results from SQL backwards, then flip them and send them to the client.
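
    A rough sketch of the "ORDER BY backwards, then flip" approach mentioned above, using an in-memory SQLite table. The table layout and function name are made up for illustration, and it assumes that a page is always delivered in archive order regardless of paging direction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE archive (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany(
    "INSERT INTO archive (id, body) VALUES (?, ?)",
    [(i, f"message {i}") for i in range(1, 8)],
)


def page_before(before_id: int, limit: int) -> list:
    """Return up to `limit` rows older than `before_id`, oldest first."""
    rows = db.execute(
        "SELECT id, body FROM archive WHERE id < ? ORDER BY id DESC LIMIT ?",
        (before_id, limit),
    ).fetchall()
    rows.reverse()  # SQL returned newest first; flip back to archive order
    return rows


print([row[0] for row in page_before(6, 3)])  # -> [3, 4, 5]
```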

  76. flow

    Zash, Guus, I'd love to see this written down in xep313 (if it's not already).

  77. Zash

    Doesn't it say in RSM?

  78. flow

    ahh, if so, then i guess that is fine too

  79. Zash

    I suppose it doesn't hurt adding some implementation note about it. Feel free to PR 😉

  80. dwd


  81. pep.

    > dwd> You don't have to drink the beer. They just have to buy it. > It makes it impossible to claim they meet the standard by syntax alone. I claim encumbrance. You don't know how easy it is for them to obtain beer :p

  82. jonas’

    I claim encumbrance. I reject supporting the beer production.

  83. pep.

    dwd: interoperability still mandates common wire format doesn't it

  84. pep.

    re MLS/wire

  85. dwd

    pep., Ah, yes. I thought it interesting primarily because Wire were pushing MLS as primary marketing. It's more or less finished, but it's got all the heavyweight cryptanalysis to go - roughly at the same stage where some early experimental deployment of TLSv1.3 was happening while till in Draft, about a year or so befroe the RFC was published.

  86. flow

    pep., MLS-interoperability across federated messaging protocols? I'd expect that to require even more than just a common wire format

  87. Zash

    A common data model at least, so you can map into whatever format

  88. dwd

    flow, You could do text message bridging, though. Depends what the goals are.

  89. jonas’

    depends on where you draw the line around "wire" in "wire format"

  90. jonas’

    or, what Zash says

  91. flow

    I wouldn't be surprised if MLS needs to be tightly-coupled with the underlying groupchat mechanism

  92. dwd

    flow, Prepare to be surprised, then.

  93. flow

    I am prepared, can I be surprised now?

  94. dwd

    flow, In principle, if two members of the group attempt to commit at once it could get weird, and the DS is supposed to impose a strict ordering, but XMPP does that anyway so I don't think anything special would be needed.

  95. flow

    dwd, DS?

  96. dwd

    flow, Also, "Commit?". Easiest to skim the architecture drafts and get a feel for it.

  97. flow

    will do

  98. dwd

    I can probably knock together a lightning-ish talk at the Summit on MLS if there's interest. Not that I'm any kind of cryptographer of course.

  99. jonas’

    I’d be interested

  100. jonas’

    reminds me to put myself on the list of remote attendees

  101. jonas’

    and reminds me to allocate a day off

  102. Kev

    I think Andrew's right, we should use what's already in the most popular XMPP server (although it's 2014 it was added, not 2016) and use MAM search the way M-Link does :)

  103. Zash

    Popularity contest? I object!

  104. dwd

    Kev, No idea what you're on about, Openfire's only just adding the feature.

  105. Guus

    I was contemplating how to put that to words, dwd .

  106. Zash

    Excuse me, that's the weirdest spelling of Prosody I've seen yet

  107. Zash


  108. jonas’


  109. Guus

    to be honest, I have no clue how many instances of Openfire are running

  110. Guus

    We have download stats, and update check stats, which give some indication, but that's about it.

  111. dwd

    Lots in locked-down enterprise networks connecting to Active Directory, though.

  112. jonas’

    in the federated world, not many, I think

  113. Guus

    probably true

  114. dwd

    jonas’, Still plenty there; I think most of those doing update checks are likely to be federated.

  115. jonas’

    at least not many seen by s.j.n

  116. jonas’

    so not many hosting MUC services

  117. moparisthebest

    Daniel, larma, lovetox, any thoughts on a swap over to finally sending 12-byte IVs ? context: https://github.com/siacs/Conversations/issues/2578

  118. MattJ

    Relevant: https://github.com/siacs/Conversations/commit/e38a9cd729bfa44d06beb44859516a1eebbb3c92

  119. MattJ

    (and https://github.com/siacs/Conversations/commit/9af056bb16d7294e427dce2d92944c4d12bd8d0f )

  120. Daniel

    it will probably happen with the next minor release (not bugfix)

  121. Daniel

    Siskin and profanity are 'fixed' in master

  122. Daniel

    and we will wait for them to release

  123. moparisthebest

    aw awesome, going to go ahead and comment on that issue

  124. Wojtek

    BeagleIM as well (same library as Siskin), should be released soon-ish (depends a bit on Apple)
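
    The rollout order implied above (receiving clients get fixed and released first, senders switch to 12-byte IVs afterwards) might look roughly like this on the client side. The 16-byte legacy length, the names, and the idea of accepting both lengths on receipt are assumptions for illustration, not taken from any of the clients mentioned.

```python
import os

NEW_IV_LENGTH = 12              # length to send once other clients have released
ACCEPTED_IV_LENGTHS = {12, 16}  # assumption: tolerate both while clients catch up


def new_iv() -> bytes:
    """Generate a fresh random IV of the new length for an outgoing message."""
    return os.urandom(NEW_IV_LENGTH)


def check_incoming_iv(iv: bytes) -> bytes:
    """Accept either IV length on receipt and reject anything else."""
    if len(iv) not in ACCEPTED_IV_LENGTHS:
        raise ValueError(f"unexpected IV length: {len(iv)}")
    return iv


print(len(new_iv()))  # -> 12
```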

  125. jonas’

    cc @ Syndace

  126. Syndace

    thanks jonas’, was involved in that decision

  127. jonas’


  128. Daniel

    what's the implementation status of bookmarks 2?

  129. pep.

    After what's been done in the sprint?

  130. Daniel

    yeah probably not much

  131. pep.

    the prosody module should be working now

  132. pep.

    converts between all 3 iirc

  133. Link Mauve

    Converts from both forms of XEP-0048 to XEP-0402 format, and then lets the old form of XEP-0048 read from the same store.

  134. Link Mauve

    The PEP form of XEP-0048 is only considered for migration, after which it is left unusable.

  135. Link Mauve

    This should work fine since clients can’t rely on this PEP form working when XEP-0411 isn’t advertised.

  136. Daniel

    Yes I actually think that's fine

  137. Daniel

    I know I was super eager on having migration between old pep and new pep working as well. But I don't really understand why anymore

  138. Link Mauve

    It is now working anyway. :)

  139. Link Mauve

    Migration, not concurrent usage.

  140. Daniel

    Yeah. I meant concurrent usage. But yeah it should be fine.

  141. Daniel

    You can unload the old module and then load the new and everything should be ok

  142. Link Mauve


  143. Link Mauve

    The new module will refuse to get loaded if the first one is in the configuration file.

  144. Link Mauve

    (Or loaded.)

  145. Daniel

    Yeah that's cool. Yeah I would like to see a last call on that. Get some more feedback from a wider community and then deploy it.

  146. Daniel

    So for once we could actually do it properly and have a LC before deployment

  147. pep.

    What about the extensions proposal from Link Mauve btw? did that progress a bit? Maybe waiting for a PR?

  148. Daniel

    The what now?

  149. Daniel

    The changes to the xep went through

  150. pep.

    let me grep in the list

  151. Link Mauve

    pep., which extensions proposal?

  152. pep.

    yours, to bookmarks2

  153. pep.

    For stuff like password etc., or else

  154. Link Mauve

    Ah, the "have clients not throw away extensions" one?

  155. pep.


  156. Link Mauve

    dwd said he was going to add that to the spec.

  157. Link Mauve


  158. pep.


  159. Daniel

    It would actually be cool if we could make Draft before Berlin

  160. Link Mauve


  161. Daniel

    Then we can put the final touches on the implementations in Berlin

  162. eevvoor

    at the sprint you mean Daniel?

  163. Daniel

    eevvoor: yes

  164. Ge0rG

    dwd: is Inbox a sophisticated attempt at testing how many levels deep you can nest a <message> without getting your computer taken away? ;)

  165. dwd

    That's a cruel and accurate suggestion.

  166. dwd

    Really, it's a matter of trying to reuse the result from MAM such that things like MAMFC plug into it neatly.

  167. dwd

    But it did feel a bit nesty. Might be a better way of constructing it by injecting an inbox bit inside the result, perhaps.

  168. Ge0rG

    maybe I'm just fed up with trying to read nested messages from one-liner XML dumps from my client and server logs

  169. Ge0rG

    dwd: I don't have a good idea ATM

  170. Kev

    xmllint --format became my friend years ago, and has remained so since.

  171. Kev

    Because yes, reading one-line XMPP stanzas gets worse the deeper they go.

  172. Ge0rG

    Kev: I suppose I need to add a key binding for it to my vim

  173. Ge0rG

    Kev: it's double nasty in clients that just dump the raw stream instead of individual stanzas, so that your grep dumps a screenful of XML and you need to find the beginning and end of things to be able to xmllint
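
    For logs where a stanza has already been isolated, something similar in spirit to `xmllint --format` can be done with Python's standard library. This is only a sketch; the sample stanza is a simplified, namespace-free stand-in for a nested message and the function name is made up.

```python
from xml.dom import minidom


def pretty_stanza(one_line_xml: str) -> str:
    """Re-indent a single, complete stanza so the nesting becomes visible."""
    pretty = minidom.parseString(one_line_xml).toprettyxml(indent="  ")
    # Drop the XML declaration that minidom emits as the first line.
    return "\n".join(pretty.split("\n")[1:]).rstrip()


stanza = (
    "<message><result><forwarded><message><body>hi</body>"
    "</message></forwarded></result></message>"
)
print(pretty_stanza(stanza))
```

    Like xmllint, this needs a complete, well-formed stanza, so it does not help with finding the beginning and end of things in a raw stream dump.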

  174. pep.

    I'm not sure I understand why <entry> contains the latest message

  175. pep.

    I mean the whole message

  176. Ge0rG

    pep.: so that you can show the last message in your chat list

  177. pep.

    Are you not going to do MAM anyway right after?

  178. Kev

    No reason to.

  179. pep.

    To get more than 1 message yes

  180. Ge0rG

    pep.: you could implement a thin client that only MAMs when you open a tab

  181. Kev


  182. pep.

    Ge0rG, sure, and then I just need to do MAM when you open the tab

  183. pep.

    Because I will do MAM

  184. pep.

    What I'm interested in inbox is really just the list, because then I know what to fetch via MAM

  185. Kev

    It's fairly common when rendering an inbox (both in chat clients and elsewhere) to want to show a preview of the most recent message, so including the most recent message would achieve that (without doing 100/200/howevermany individual MAM queries to get the latest message for each inbox entry).

  186. Kev

    So it seems useful to me.

  187. pep.

    yeah maybe.. probably something I'll have to ignore then

  188. Ge0rG

    pep.: yes.

  189. Ge0rG

    I still think that poezio should be a fat client, though ;)

  190. pep.

    Ge0rG, in any case that message is useless to me in poezio

  191. pep.

    I'll do MAM to sync up with the last known id

  192. Kev

    The whole fat client/thin client thing I think is only going to be 'resolved' by allowing for both.

  193. Ge0rG

    Kev: I agre

  194. Ge0rG

    Kev: I agree

  195. Kev

    In cases where allowing for both is going to mean lots of data being sent that one or the other doesn't want, potentially shoving a bool on a query to exclude the noise might make sense.

  196. Ge0rG

    I actually have a use-case for both. I want a "fat" poezio on my colo server, with full local logging, and a "thin" MAM-backed one on my laptop when I'm on the go

  197. Zash


  198. Kev

    I don't know if that would add any value to inbox or not, but it's a possibility in general.

  199. Zash

    dwd, had you seen ↑ ?

  200. Kev

    Zash: Is that also similar to the unread stuff in bind2?

  201. pep.

    Ge0rG, both can use MAM

  202. Ge0rG

    pep.: sure, but in different ways

  203. Zash

    Kev, yes, it's inspired by that example in bind2

  204. Ge0rG

    pep.: I want my fat client to do a full MAM sync on startup, and then no more MAM

  205. Kev

    Where inbox is also related to the unread stuff in bind2 (but none of them quite the same)

  206. pep.

    Ge0rG, when joining a new channel

  207. Ge0rG

    startup = new session

  208. Ge0rG

    pep.: history fetch is often good enough, but yeah, okay

  209. Kev

    Zash: I wonder if there's a race there, by not doing it during bind, but it looks useful.

  210. Kev

    Zash: Submit a protoxep?

  211. Kev

    I do think that server-side tracking of unread per-contact is practically needed, which that doesn't quite do, so it's not a whole solution, I think, but is moving in that direction.

  212. Zash

    Kev: It's mostly done like that to allow easy testing since I don't have bind2 yet.

  213. Kev

    Yeah, that one's a bit of an issue :)

  214. Ge0rG

    as is IM2?

  215. Zash


  216. Zash

    XMPP 2.0

  217. dwd

    Zash, I had seen it, but then forgotten about it.

  218. dwd

    pep., And yes, you might not always want the entire message, and instead just know there is one with a particular id. Or you might not need inbox at all if you're going to pull the entire MAM archive across anyway.

  219. dwd

    pep., But lots of existing clients like to list out the conversations, and show a previewish thing of the last message. It's why, for example, Instagram's direct message inbox works in exactly this way.

  220. dwd

    pep., We have bigger challenges because we have lots of different styles of client in the XMPP world, and need to cater for them all efficiently without precluding any. I'm not trying to claim this is a finished design suitable for all cases.

  221. Ge0rG

    but it's a very good start

  222. Ge0rG

    dwd: I think it's missing a notion of "open conversations", which is a good thing to keep around in just this place

  223. pep.

    dwd, my goal is not to pull the entire MAM history

  224. pep.

    At least not at first

  225. pep.

    My goal for the inbox thing is really just to get a list of JIDs to fetch MAM for. If I don't have that then I have to fetch the entire history to know who talked to me as there might be JIDs I don't know of (not MUCs nor roster)

  226. Ge0rG

    pep.: how is having the last message in the response harmful to that?

  227. pep.

    Ok it may seem I'm still ranting about that, I'm not

  228. Zash

    Timestamp and body of last message per contact gets you most of the data you'd need to show a list of recent conversations and can be done with simple MAM. Read status needs more tracking than what at least Prosody has
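
    A sketch of the "timestamp and body of last message per contact" idea, computed over an in-memory list standing in for a MAM archive. The record layout, the JIDs, and the function name are hypothetical; read status is deliberately left out, since that needs tracking the archive alone does not provide.

```python
from dataclasses import dataclass


@dataclass
class Archived:
    with_jid: str   # the "with" of the archived message, in MAM terms
    timestamp: int  # seconds since the epoch, for brevity
    body: str


def recent_conversations(archive: list) -> list:
    """Latest archived message per `with` JID, newest conversation first."""
    latest = {}
    for msg in archive:
        current = latest.get(msg.with_jid)
        if current is None or msg.timestamp >= current.timestamp:
            latest[msg.with_jid] = msg
    return sorted(latest.values(), key=lambda m: m.timestamp, reverse=True)


archive = [
    Archived("juliet@example.org", 100, "hello"),
    Archived("stranger@example.net", 150, "hi, we met at the summit"),
    Archived("juliet@example.org", 200, "beer?"),
]
for entry in recent_conversations(archive):
    print(entry.with_jid, entry.timestamp, entry.body)
```

    Note that the JID outside the roster still shows up, which is the "who talked to me" information asked for above.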

  229. pep.

    Zash, that's the thing, you might not be talking only to contacts

  230. Zash

    s/contact/"with" in MAM terms/

  231. pep.

    yeah but you need to know who, which is why I like inbox

  232. Zash

    That MAP thing did that iirc. Wanna be convinced to convert it into mod_inbox? :)

  233. Zash

    Do we need some XPath-ish MAM search thing like the other example of extended search forms?

  234. pep.

    would it be possible to make that message optional maybe?

  235. pep.

    dwd, ^

  236. dwd


  237. dwd

    Zash, XPath-based MAM search? Yuck.

  238. Ge0rG

    pep.: what's your goal with that?

  239. pep.

    Ge0rG, why are you fighting it that much? That message is not needed in there all the time :x

  240. Ge0rG

    pep.: I'm not fighting, I'm curious. Every boolean option doubles the number of states you create and have to debug

  241. pep.

    we're at the protocol level still, I think we can live with one or two more options. We're not doing client UX

  242. Ge0rG

    pep.: please tell me why that Carbon message isn't displayed on my desktop client.

  243. Ge0rG

    (yes, this is a protocol question. More than UX at least)

  244. pep.

    even if that makes things more complex I'm of the opinion that I should be able to choose. if we do one-size-fits-all nobody is going to be happy, or rather, only the golden use case is going to be happy and that's annoying for everybody else

  245. Daniel

    So you want to make it optional to request or optional to generate?

  246. Daniel

    Because making it optional to generate would be bad

  247. pep.

    how bad?

  248. pep.

    I was mostly thinking "I don't need it, the server doesn't need to send it". Whether it generates it or not (or stores it as is) it's not my problem

  249. MattJ

    What about deployments without MAM? (e.g. for privacy or resource constraint reasons)

  250. pep.

    with offline messages?

  251. pep.

    In any case if the server doesn't keep messages, then it doesn't make sense indeed to force it to return the last one

  252. MattJ

    My point is mainly that you may want to support inbox on a server that doesn't store messages. It seems to me it would be easier for client devs to deal with no message than no inbox

  253. MattJ

    Er, I think I'd be fine with "if the server/user has a MAM archive enabled, you must do this"

  254. MattJ

    Just not with making a hard dependency from Inbox to MAM for deployments that don't want that

  255. MattJ

    For something that is ultimately a convenience/optimisation feature

  256. Ge0rG

    So such a server would always return an empty list?

  257. MattJ

    Ge0rG: I think <entry/> would just be empty

  258. Ge0rG

    MattJ: what JIDs would you list?

  259. MattJ

    (which is already valid per the XEP)

  260. MattJ

    I guess the XEP doesn't really specify what JIDs are in the list

  261. MattJ

    With previous PEP-based proposals that was trivial, and the list is a list of open tabs/chats

  262. pep.

    yeah I also liked that

  263. MattJ

    Opening/closing a chat cleanly mapped to adding/removing from the list

  264. MattJ

    Now it's a bit ambiguous

  265. Ge0rG

    Maybe it can be resolved by adding another flag to the inbox, one that reflects an open chat and is sticky even when there are no unread messages?

  266. pep.

    MattJ, I'd want a per-client list though :x (or profiles or whatever. I guess per-client is already good)

  267. Ge0rG

    pep.: if you have it per client, you don't need to sync it to the server

  268. pep.

    Ge0rG, maybe it's not just a dumb list in PEP that I want. I also want to know from the MAM/offline stuff whether somebody who's not in my roster talked to me, so that I know I need to do MAM with them

  269. pep.

    if I don't have that information right away, I need to fetch the world and I want to avoid that

  270. Ge0rG

    pep.: do you want that for all remote JIDs or just the ones that your client hasn't heard from or the ones that are new since the last MAM fetch from any of your clients?

  271. Ge0rG

    I'm trying to determine which sets of information we need for the different use cases and how they overlap

  272. MattJ

    Why not a special PEP node plus an iq that performs a query basically equivalent to what Dave's proposal has

  273. MattJ

    For the set of JIDs currently stored

  274. MattJ

    ......plus unreads??

  275. Ge0rG

    Why not have both in the same IQ?

  276. MattJ

    Both what?

  277. Ge0rG

    Both the open tabs and the inbox

  278. MattJ

    That's basically what I'm proposing, yes

  279. Ge0rG

    So we were misunderstanding each other all the time? Because that's what I wanted all along as well

  280. MattJ

    Dave's current proposal appears to me to not define any logic around which JIDs should be included in the result

  281. MattJ

    I'm suggesting we merge the old PEP inbox proposal, and use that as the list of JIDs, plus include any others that have pending unread messages

  282. MattJ

    So you have a single query for all "open" chats and unread messages

  283. MattJ

    I think it's similar to what you/someone suggested earlier about a sticky bit on the JIDs, it's just not clear to me how that would get set, how notifications would get broadcast to other clients on update, etc.

  284. MattJ

    I think PEP is a good mechanism for that part

  285. MattJ

    And that solves my issue too... clients/servers without MAM can still "implement" the open chats part (PEP) without needing to implement the magic query
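
    The merge proposed above (the PEP-published list of open chats, plus any other JIDs with pending unread messages) amounts to a set union. A minimal sketch, with `open_chats` standing in for the PEP list and `unread` for hypothetical server-side counters:

```python
def inbox_entries(open_chats: set, unread: dict) -> dict:
    """Map each JID in the combined result to (is_open, unread_count)."""
    with_unread = {jid for jid, count in unread.items() if count > 0}
    jids = set(open_chats) | with_unread
    return {jid: (jid in open_chats, unread.get(jid, 0)) for jid in sorted(jids)}


open_chats = {"juliet@example.org", "room@chat.example.org"}
unread = {"juliet@example.org": 2, "stranger@example.net": 1, "old@example.com": 0}
for jid, (is_open, count) in inbox_entries(open_chats, unread).items():
    print(jid, is_open, count)
```

    A deployment without message storage would simply have an empty `unread` mapping and still return the open chats.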

  286. pep.

    Ge0rG, I still want a list of jids (1:1/muc/whatever), but since I'll most likely want a different one per client I can indeed implement it locally, and also I want to know who I have to fetch when I was offline, without having to sync the world

  287. pep.

    Ge0rG, I still want a list of open tabs (1:1/muc/whatever), but since I'll most likely want a different one per client I can indeed implement it locally, and also I want to know who I have to fetch when I was offline, without having to sync the world

  288. pep.

    Ge0rG, I want a list of open tabs (1:1/muc/whatever), but since I'll most likely want a different one per client I can indeed implement it locally, and also I want to know who I have to fetch when I was offline, without having to sync the world

  289. pep.

    Also.. the last message is probably not useful for some e2ee mechanisms (PFS).

  290. pep.

    Ah nevermind.

  291. pep.

    That would be an unread message :-°