XSF Discussion - 2020-01-22

  6. pep. Is there a way to do, with pubsub (or something else?), many publishers and many subscribers, where only subscribers see everything and publishers see only their own items?
  39. stpeter has left
  40. dwd pep., Defining fulltext search fully would mean servers would have to implement a full-text search engine entirely - it wouldn't handle, for example, stemming in a homogeneous manner, so we'd presumably have to ban that, which feels undesirable. AIUI, MattJ's suggestion is a strict substring field as well as a "magic" field. I think the threat of beer-buying is sufficient to prevent outright silliness (it also prevents anyone being silly and still claiming full conformance, BTW).
  49. mukt2 has left
  50. mukt2 has joined
  76. Douglas Terabyte has joined
  125. karoshi has joined
  126. MattJ dwd, I don't think we have to rule out stemming
  127. MattJ nor mandate it
  128. MattJ (for the "plain" search)
  129. MattJ But most FTS engines provide an advanced query language, and that's mainly what I want to avoid exposing
  133. dwd Right, indeed. Your suggestion is a dumb substring search, plus magic. I'd aim for magic first, going that any query language is close to nothing. I'm thinking in terms of tsvector in pgsql, for example.
  134. dwd Unless I misunderstand your suggestion here.
  135. MattJ I'm just saying there should be two fields, plain and implementation-specific
  136. MattJ running with the postgres example, the plain one would use plainto_tsquery() for example
  140. dwd But then your plain one would do stemming for example. Surely?
  141. MattJ Yes
  142. MattJ I don't see that as a problem
  143. MattJ It defines the semantics of the user input, not what the implementation does with that info
  144. MattJ "This is a query from the user with no special operators or syntax"
  145. MattJ Now find some messages
  146. MattJ Which is different to: > <Guus>  simple keywords will work, but more elaborate lucene queries too (although you'd need to know the index fields)
  148. MattJ I'm saying we should have a way to expose the elaborate queries (if that's what deployments/implementations really want), but we should also have a safe option
  149. MattJ Safe in the sense that you can just throw some text in there and have a reasonable expectation it will return something useful
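The "plain" field MattJ describes can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory list of message bodies; the names `plain_terms` and `plain_search` are invented for this sketch and come from no XEP or server. The point is that every token of the user input is treated as a search term, never as an operator. A real implementation would be free to add stemming or ranking behind the same contract.

```python
import re


def plain_terms(user_input: str) -> list[str]:
    """Split raw input into lowercase terms; no operators or syntax are recognised."""
    return re.findall(r"\w+", user_input.lower())


def plain_search(user_input: str, messages: list[str]) -> list[str]:
    """Return messages containing every term of the query as a whole word."""
    terms = plain_terms(user_input)
    if not terms:
        return []
    return [m for m in messages if all(t in plain_terms(m) for t in terms)]


messages = [
    "Flying to Portland, OR tomorrow",
    "beer or orange juice",
    "tea please",
]
# "OR" is just another term here, not a boolean operator.
print(plain_search("beer OR juice", messages))  # ['beer or orange juice']
```

An implementation-specific field would instead pass the string through to the engine's own query language, which is the part MattJ wants kept separate.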
  150. MattJ This is from last year, but I just remembered it: https://opensourceconnections.com/blog/2019/05/29/falsehoods-programmers-believe-about-search/
  151. MattJ > When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon
  152. MattJ Though I think my favourite from there is: > A customer using the same query twice expects the same results for both searches
  157. dwd Ah, I see.
  159. ralphm Yeah, search is not trivial.
  164. dwd So if I understand correctly, MattJ is arguing that the second sentence of 3.2 of my protoxep should be in effect reversed, and servers MUST interpret any words or characters as search terms, and not treat them as directives or operators.
  165. dwd I can certainly go that route.
  166. MattJ Depends. You only specified one field, and it depends whether you specified the plain or the non-plain one :)
  167. MattJ Guus wants the non-plain one, and this draft was primarily for him at this point, right? :)
  168. Kev We already implement MAM search, FWIW, and have for years.
  169. MattJ But I wanted to add in a way for the server to convey some help text for the non-plain one
  170. MattJ Also we need to deal with localization in the various parts of this
  171. dwd Yay.
  172. MattJ and that's not easy - it's extremely likely that the server is only going to have FTS for a single locale
  173. MattJ But multiple is of course possible with the right setup, I just don't see many people crazy enough to dedicate the resources to that
  174. dwd Yeah, fine with extending my extension to introduce extensions. I was mostly in my XEP project for inbox and figured I'd knock it this once so as to have something.
  175. MattJ I can contribute the missing parts, I've had this in my head for quite a while, but I'm way too busy this side of FOSDEM
  176. dwd I'm not wed to anything here except the beers.
  177. MattJ "OR an orange juice"?
  178. dwd You don't have to drink the beer. They just have to buy it.
  179. Ge0rG this reminds me of how the formulas in Excel/LibreCalc are language-dependent, so if you work with multiple locales, you always get it wrong
  180. dwd It makes it impossible to claim they meet the standard by syntax alone.
  184. Guus Kev what fields do you use, and what functionality is behind it? My primary motivation was to re-use existing field names if possible, to have overlap.
  185. Kev You expect me to remember things? :o
  199. !XSF_Martin has joined
  200. jonas’ define "word"
  201. jonas’ if I search for arc, will it return messages which contain "search"?
  202. jonas’ will it return messages which contain "The word 'arc' may be contained or not"?
  203. jonas’ </rambling-about-how-fts-is-not-trivial>
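jonas’ question can be made concrete with a small sketch. The two functions below are hypothetical helpers, not taken from any implementation; they only contrast a strict substring match with a word-boundary match, which give different answers for "arc".

```python
import re


def substring_match(term: str, message: str) -> bool:
    """Strict, case-insensitive substring search."""
    return term.lower() in message.lower()


def word_match(term: str, message: str) -> bool:
    """Match the term only when it stands as a whole word."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, message, re.IGNORECASE) is not None


print(substring_match("arc", "I will search later"))  # True: "search" contains "arc"
print(word_match("arc", "I will search later"))       # False
print(word_match("arc", "The word 'arc' may be contained or not"))  # True
```

Neither answer is "right"; the sketch only shows that a spec has to say which behaviour the plain field promises, or explicitly leave it open.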
  210. dwd jonas’, I mean, does anyone care about replicability of search results between servers? (or between the client's local archive and the server?)
  211. dwd Although there's an argument that it might be useful to have a MAM switch that emits only the ids and not the entire messages in case you already have the data. But that feels like an optimisation for another day.
  212. Zash If you already have the data, can't you just search there directly?
  217. Zash Also there's the thing where, given an id, it's tricky to retrieve that message
  218. dwd Zash, It is?
  219. Zash You can request messages before, or after, but not a specific message by id.
  220. dwd Zash, Oh. Well, that's stupid then.
  221. dwd Zash, So you'd have to ask for one after, then one before the result you get.
  222. Zash Yeah
  223. dwd Yeah, that's daft.
  224. Zash Or one before, then the one after that.
  225. Zash Either may or may not work depending on how many messages you have
  226. Zash Inb4 inventing SQL over XMPP to solve this
  227. dwd Zash, Ask for one before and one after concurrently.
  228. dwd Zash, Then it's only 2RTT to get the message you actually wanted.
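The workaround Zash and dwd describe can be simulated against an in-memory archive. This is a sketch under assumptions: `Archive`, `after`, `before` and `fetch_by_id` are hypothetical names standing in for MAM queries paged with one item after or before a given id; no real XMPP library is used.

```python
class Archive:
    """Toy archive that, like the queries discussed, only pages relative to an id."""

    def __init__(self, items):
        self.items = list(items)  # (id, body) tuples in archive order

    def _index(self, msg_id):
        return next(i for i, (mid, _) in enumerate(self.items) if mid == msg_id)

    def after(self, msg_id, max_items=1):
        i = self._index(msg_id)
        return self.items[i + 1:i + 1 + max_items]

    def before(self, msg_id, max_items=1):
        i = self._index(msg_id)
        return self.items[max(0, i - max_items):i]


def fetch_by_id(archive, wanted):
    # Round trip 1: both neighbour queries, which could be sent concurrently.
    nxt = archive.after(wanted, 1)
    prev = archive.before(wanted, 1)
    # Round trip 2: step back from whichever neighbour exists.
    if nxt:
        return archive.before(nxt[0][0], 1)[0]
    if prev:
        return archive.after(prev[0][0], 1)[0]
    return None  # the wanted message has no neighbour at all


archive = Archive([("a", "one"), ("b", "two"), ("c", "three")])
print(fetch_by_id(archive, "b"))  # ('b', 'two')
```

The `None` case is the one Zash alludes to: with a single message in the archive, neither neighbour exists and the trick fails.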
  232. Guus Unrelated question: with RSM, the direction in which you page through the resultset doesn't affect what's defined as the 'first' and 'last' element, right?
  233. Guus iow the order of elements on a page does not differ based on the direction that you page through the result set?
  234. Zash Correct
  235. Guus 👍 thanks
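The point Zash confirms can be illustrated with a toy pager. This is only a sketch: `page_after` and `page_before` are hypothetical helpers over a plain list, modelling forward and backward paging; they are not an RSM implementation.

```python
def page_after(items, after=None, max_items=2):
    """Forward paging: the page that follows `after` (or the first page)."""
    start = 0 if after is None else items.index(after) + 1
    return items[start:start + max_items]


def page_before(items, before=None, max_items=2):
    """Backward paging: the page that precedes `before` (or the last page)."""
    end = len(items) if before is None else items.index(before)
    return items[max(0, end - max_items):end]


items = ["m1", "m2", "m3", "m4", "m5"]
print(page_after(items))                 # ['m1', 'm2']
print(page_before(items))                # ['m4', 'm5']
print(page_before(items, before="m4"))   # ['m2', 'm3']
```

Whichever direction is used, each page comes back in archive order, so the 'first' element of a page is always the oldest item on it.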
  248. lorddavidiii has joined
  268. jubalh has joined
  269. flow Zash, Guus, I'd love to see this written down in xep313 (if it's not already).
  272. Zash Doesn't it say in RSM?
  273. flow ahh, if so, then i guess that is fine too
  274. lovetox has joined
  275. sonny has left
  276. sonny has joined
  289. jubalh has joined
  290. jubalh has left
  318. Zash I suppose it doesn't hurt adding some implementation note about it. Feel free to PR 😉
  321. dwd https://twitter.com/wire/status/1219745367475933185
  330. sonny has joined
  339. pep. > dwd> You don't have to drink the beer. They just have to buy it. > It makes it impossible to claim they meet the standard by syntax alone. I claim encumbrance. You don't know how easy it is for them to obtain beer :p
  340. jonas’ I claim encumbrance. I reject supporting the beer production.
  341. pep. dwd: interoperability still mandates common wire format doesn't it
  342. pep. re MLS/wire
  343. dwd pep., Ah, yes. I thought it interesting primarily because Wire were pushing MLS as primary marketing. It's more or less finished, but it's got all the heavyweight cryptanalysis to go - roughly at the same stage where some early experimental deployment of TLSv1.3 was happening while still in Draft, about a year or so before the RFC was published.
  344. flow pep., MLS-interoperability across federated messaging protocols? I'd expect that to require even more than just a common wire format
  345. mukt2 has joined
  346. Zash A common data model at least, so you can map into whatever format
  347. sonny has left
  348. dwd flow, You could do text message bridging, though. Depends what the goals are.
  349. jonas’ depends on where you draw the line around "wire" in "wire format"
  350. jonas’ or, what Zash says
  351. flow I wouldn't be surprised if MLS needs to be tightly-coupled with the underlying groupchat mechanism
  352. dwd flow, Prepare to be surprised, then.
  353. flow I am prepared, can I be surprised now?
  354. sonny has joined
  355. dwd flow, In principle, if two members of the group attempt to commit at once it could get weird, and the DS is supposed to impose a strict ordering, but XMPP does that anyway so I don't think anything special would be needed.
  356. flow dwd, DS?
  357. dwd flow, Also, "Commit?". Easiest to skim the architecture drafts and get a feel for it.
  358. flow will do
  359. dwd I can probably knock together a lightning-ish talk at the Summit on MLS if there's interest. Not that I'm any kind of cryptographer of course.
  360. jonas’ I’d be interested
  361. jonas’ reminds me to put me on the list of remote attendants
  362. jonas’ and reminds me to allocate a day off
  378. jubalh has joined
  379. mukt2 has left
  380. pdurbin has joined
  381. Kev I think Andrew's right, we should use what's already in the most popular XMPP server (although it's 2014 it was added, not 2016) and use MAM search the way M-Link does :)
  382. pdurbin has left
  390. j.r has left
  391. mukt2 has joined
  392. j.r has joined
  393. Zash Excuse me, that's the weirdest spelling of Prosody I've seen yet
  394. Zash https://cerdale.zash.se/upload/dHpA6ZKtKtstwlTJ/bild.png
  422. SubPub has joined
  423. mukt2 has joined
  424. moparisthebest Daniel, larma, lovetox, any thoughts on a swap over to finally sending 12-byte IVs ? context: https://github.com/siacs/Conversations/issues/2578
  427. MattJ Relevant: https://github.com/siacs/Conversations/commit/e38a9cd729bfa44d06beb44859516a1eebbb3c92
  428. MattJ (and https://github.com/siacs/Conversations/commit/9af056bb16d7294e427dce2d92944c4d12bd8d0f )
  429. Daniel it will probably happen with the next minor release (not bugfix)
  430. Daniel Siskin and profanity are 'fixed' in master
  431. Daniel and we will wait for them to release
  432. moparisthebest aw awesome, going to go ahead and comment on that issue
  433. Wojtek BeagleIM as well (same library as Siskin), should be released soon-ish (depends a bit on Apple)
  449. jonas’ cc @ Syndace
  450. mukt2 has joined
  451. stpeter has left
  466. krauq has left
  467. krauq has joined
  482. mukt2 has joined
  483. lorddavidiii has joined
  484. lorddavidiii has left
  485. lorddavidiii has joined
  508. Daniel what's the implementation status of bookmarks 2?
  509. pep. After what's been done in the sprint?
  510. Daniel yeah probably not much
  511. pep. the prosody module should be working now
  512. pep. converts between all 3 iirc
  513. Link Mauve Converts from both forms of XEP-0048 to XEP-0402 format, and then lets the old form of XEP-0048 read from the same store.
  514. mukt2 has left
  515. Link Mauve The PEP form of XEP-0048 is only considered for migration, after which it is left unusable.
  516. Link Mauve This should work fine since clients can’t rely on this PEP form working when XEP-0411 isn’t advertised.
  517. mukt2 has joined
  518. Daniel Yes I actually think that's fine
  519. Daniel I know I was super eager on having migration between old pep and new pep working as well. But I don't really understand why anymore
  520. Link Mauve It is now working anyway. :)
  521. Link Mauve Migration, not concurrent usage.
  522. Daniel Yeah. I meant concurrent usage. But yeah it should be fine.
  523. Daniel You can unload the old module and then load the new and everything should be ok
  524. Link Mauve Yes.
  526. Link Mauve The new module will refuse to get loaded if the first one is in the configuration file.
  527. Link Mauve (Or loaded.)
  528. Daniel Yeah that's cool. Yeah I would like to see a last call on that. Get some more feedback from a wider community and then deploy it.
  529. Daniel So for once we could actually do it properly and have a LC before deployment
  530. pep. What about the extensions proposal from Link Mauve btw? did that progress a bit? Maybe awaiting for a PR?
  531. Daniel The what now?
  532. Daniel The changes to the xep went through
  533. pep. let me grep in the list
  534. Link Mauve pep., which extensions proposal?
  535. pep. yours, to bookmarks2
  536. pep. For stuff like password etc., or else
  537. Link Mauve Ah, the "have clients not throw away extensions" one?
  538. pep. yeah
  539. Link Mauve dwd said he was going to add that to the spec.
  540. Link Mauve IIRC.
  541. pep. ok
  547. eevvoor at the sprint you mean Daniel?
  548. Daniel eevvoor: yes
  549. Ge0rG dwd: is Inbox a sophisticated attempt at testing how many levels deep you can nest a <message> without getting your computer taken away? ;)
  550. dwd That's a cruel and accurate suggestion.
  551. dwd Really, it's a matter of trying to reuse the result from MAM such that things like MAMFC plug into it neatly.
  552. dwd But it did feel a bit nesty. Might be a better way of constructing it by injecting an inbox bit inside the result, perhaps.
  553. Ge0rG maybe I'm just fed up with trying to read nested messages from one-liner XML dumps from my client and server logs
  554. Ge0rG dwd: I don't have a good idea ATM
  555. Kev xmllint --format became my friend years ago, and has remained so since.
  556. Kev Because yes, reading one-line XMPP stanzas gets worse the deeper they go.
  557. Ge0rG Kev: I suppose I need to add a key binding for it to my vim
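Kev's tool here is `xmllint --format`. For logs that are already being processed in a script, a similar effect can be sketched with Python's standard library. The stanza below is an invented example of a nested MAM result, and `pretty` is a hypothetical helper name.

```python
from xml.dom import minidom


def pretty(stanza: str) -> str:
    """Re-indent a one-line XML stanza, dropping the XML declaration line."""
    doc = minidom.parseString(stanza)
    lines = doc.toprettyxml(indent="  ").splitlines()
    return "\n".join(lines[1:])


one_liner = (
    "<message xmlns='jabber:client' to='a@example.org'>"
    "<result xmlns='urn:xmpp:mam:2' id='1'>"
    "<forwarded xmlns='urn:xmpp:forward:0'>"
    "<message xmlns='jabber:client' from='b@example.org'>"
    "<body>hi</body>"
    "</message></forwarded></result></message>"
)
print(pretty(one_liner))
```

Each nesting level ends up on its own indented line, which is the property that makes deeply nested stanzas readable.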
  561. pep. I'm not sure I understand why <entry> contains the latest message
  562. pep. I mean the whole message
  563. Ge0rG pep.: so that you can show the last message in your chat list
  564. pep. Are you not going to do MAM anyway right after?
  565. Kev No reason to.
  566. pep. To get more than 1 message yes
  567. Ge0rG pep.: you could implement a thin client that only MAMs when you open a tab
  568. Kev ^
  569. pep. Ge0rG, sure, and then I just need to do MAM when you open the tab
  570. pep. Because I will do MAM
  571. pep. What I'm interested in inbox is really just the list, because then I know what to fetch via MAM
  572. Kev It's fairly common when rendering an inbox (both in chat clients and elsewhere) to want to show a preview of the most recent message, so including the most recent message would achieve that (without doing 100/200/howevermany individual MAM queries to get the latest message for each inbox entry).
  573. Kev So it seems useful to me.
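What Kev describes amounts to the server answering one query with, per conversation, the most recent message and an unread count. A minimal sketch of that aggregation follows; the `Archived` record and `build_inbox` function are hypothetical, and this does not reflect the wire format of the inbox protoXEP.

```python
from dataclasses import dataclass


@dataclass
class Archived:
    with_jid: str   # the "with" JID in MAM terms
    timestamp: int
    body: str
    unread: bool


def build_inbox(archive):
    """One entry per 'with' JID: latest message plus unread count, newest first."""
    inbox = {}
    for msg in sorted(archive, key=lambda m: m.timestamp):
        entry = inbox.setdefault(msg.with_jid, {"unread": 0})
        entry["last"] = msg  # later messages overwrite earlier ones
        if msg.unread:
            entry["unread"] += 1
    return sorted(inbox.items(), key=lambda kv: kv[1]["last"].timestamp, reverse=True)


archive = [
    Archived("a@example.org", 1, "hi", False),
    Archived("b@example.org", 2, "yo", True),
    Archived("a@example.org", 3, "later", True),
]
for jid, entry in build_inbox(archive):
    print(jid, entry["unread"], entry["last"].body)
```

A client that only wants the list of JIDs, as pep. does, could ignore the `last` field; a thin client could render it directly as a preview.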
  574. lovetox has joined
  575. lovetox has left
  576. lovetox has joined
  577. pep. yeah maybe.. probably something I'll have to ignore then
  578. Ge0rG pep.: yes.
  579. Ge0rG I still think that poezio should be a fat client, though ;)
  580. pep. Ge0rG, in any case that message is useless to me in poezio
  581. pep. I'll do MAM to sync up with the last known id
  582. Kev The whole fat client/thin client thing I think is only going to be 'resolved' by allowing for both.
  584. Ge0rG Kev: I agree
  585. Kev In cases where allowing for both is going to mean lots of data being sent that one or the other doesn't want, potentially shoving a bool on a query to exclude the noise might make sense.
  586. Ge0rG I actually have a use-case for both. I want a "fat" poezio on my colo server, with full local logging, and a "thin" MAM-backed one on my laptop when I'm on the go
  587. Zash https://modules.prosody.im/mod_map.html
  588. Kev I don't know if that would add any value to inbox or not, but it's a possibility in general.
  589. Zash dwd, had you seen ↑ ?
  590. Kev Zash: Is that also similar to the unread stuff in bind2?
  591. pep. Ge0rG, both can use MAM
  592. Ge0rG pep.: sure, but in different ways
  593. Zash Kev, yes, it's inspired by that example in bind2
  594. Ge0rG pep.: I want my fat client to do a full MAM sync on startup, and then no more MAM
  595. Kev Where inbox is also related to the unread stuff in bind2 (but none of them quite the same)
  596. pep. Ge0rG, when joining a new channel
  597. Ge0rG startup = new session
  598. Ge0rG pep.: history fetch is often good enough, but yeah, okay
  599. Kev Zash: I wonder if there's a race there, by not doing it during bind, but it looks useful.
  600. Kev Zash: Submit a protoxep?
  601. sonny has joined
  602. Kev I do think that server-side tracking of unread per-contact is practically needed, which that doesn't quite do, so it's not a whole solution, I think, but is moving in that direction.
  603. Zash Kev: It's mostly done like that to allow easy testing since I don't have bind2 yet.
  604. Kev Yeah, that one's a bit of an issue :)
  619. dwd Zash, I had seen it, but then forgotten about it.
  620. dwd pep., And yes, you might not always want the entire message, and instead just know there is one with a particular id. Or you might not need inbox at all if you're going to pull the entire MAM archive across anyway.
  621. dwd pep., But lots of existing clients like to list out the conversations, and show a previewish thing of the last message. It's why, for example, Instagram's direct message inbox works in exactly this way.
  624. Ge0rG but it's a very good start
  625. pdurbin has joined
  626. Ge0rG dwd: I think it's missing a notion of "open conversations", which is a good thing to keep around in just this place
  628. pep. dwd, my goal is not to pull the entire MAM history
  629. pep. At least not at first
  630. pep. My goal for the inbox thing is really just to get a list of JIDs to fetch MAM for. If I don't have that then I have to fetch the entire history to know who talked to me as there might be JIDs I don't know of (not MUCs nor roster)
  631. pdurbin has left
  632. Ge0rG pep.: how is having the last message in the response harmful to that?
  633. pep. Ok it may seem I'm still ranting about that, I'm not
  634. Zash Timestamp and body of last message per contact gets you most of the data you'd need to show a list of recent conversations and can be done with simple MAM. Read status needs more tracking than what at least Prosody has
  635. pep. Zash, that's the thing, you might not be talking only to contacts
  636. Zash s/contact/"with" in MAM terms/
  637. pep. yeah but you need to know who, which is why I like inbox
  638. Zash That MAP thing did that iirc. Wanna be convinced to convert it into mod_inbox? :)
  646. Zash Do we need some XPath-ish MAM search thing like the other example of extended search forms?
  647. pep. would it be possible to make that message optional maybe?
  648. pep. dwd, ^
  650. dwd Sure.
  652. dwd Zash, XPath-based MAM search? Yuck.
  653. Ge0rG pep.: what's your goal with that?
  654. pep. Ge0rG, why are you fighting it that much? That message is not needed in there all the time :x
  655. Ge0rG pep.: I'm not fighting, I'm curious. Every boolean option doubles the number of states you create and have to debug
  656. pep. we're at the protocol level still, I think we can live with one or two more options. We're not doing client UX
  657. sonny has joined
  658. Ge0rG pep.: please tell me why that Carbon message isn't displayed on my desktop client.
  659. Ge0rG (yes, this is a protocol question. More than UX at least)
  660. pep. even if that makes things more complex I'm of the opinion that I should be able to choose. if we do one-size-fits-all nobody is going to be happy, or rather, only the golden use case is going to be happy and that's annoying for everybody else
  666. Daniel So you want to make it optional to request or optional to generate?
  667. Daniel Because making it optional to generate would be bad
  668. pep. how bad?
  669. pep. I was mostly thinking "I don't need it, the server doesn't need to send it". Whether it generates it or not (or stores it as is) it's not my problem
  670. !XSF_Martin has joined
  671. jubalh has left
  672. jubalh has joined
  674. MattJ What about deployments without MAM? (e.g. for privacy or resource constraint reasons)
  675. waqas has left
  676. mukt2 has left
  677. pep. with offline messages?
  678. waqas has joined
  679. pep. In any case if the server doesn't keep messages, then it doesn't make sense indeed to force it to return the last one
  680. andrey.g has joined
  681. mukt2 has joined
  682. MattJ My point is mainly that you may want to support inbox on a server that doesn't store messages. It seems to me it would be easier for client devs to deal with no message than no inbox
  683. MattJ Er, I think I'd be fine with "if the server/user has a MAM archive enabled, you must do this"
  708. j.r has left
  709. Ge0rG pep.: if you have it per client, you don't need to sync it to the server
  710. j.r has joined
  711. pep. Ge0rG, maybe it's not just a dumb list in PEP that I want. I also want to know, from MAM/offline stuff, whether somebody that's not in my roster talked to me, so that I know I need to do MAM with them
  712. pep. if I don't have that information right away, I need to fetch the world and I want to avoid that
  713. pdurbin has left
  714. Ge0rG pep.: do you want that for all remote JIDs or just the ones that your client hasn't heard from or the ones that are new since the last MAM fetch from any of your clients?
  715. Ge0rG I'm trying to determine which sets of information we need for the different use cases and how they overlap
  716. sonny has left
  717. mukt2 has left
  718. Nekit has left
  719. mukt2 has joined
  722. MattJ Why not a special PEP node plus an iq that performs a query basically equivalent to what Dave's proposal has
  723. MattJ For the set of JIDs currently stored
  724. MattJ ......plus unreads??
  731. MattJ Both what?
  733. Ge0rG Both the open tabs and the inbox
  734. mukt2 has left
  735. Yagiza has left
  750. MattJ That's basically what I'm proposing, yes
  752. waqas has joined
  757. Ge0rG So we were misunderstanding each other all the time? Because that's what I wanted all along as well
  760. MattJ Dave's current proposal appears to me to not define any logic around which JIDs should be included in the result
  762. MattJ I'm suggesting we merge the old PEP inbox proposal, and use that as the list of JIDs, plus include any others that have pending unread messages
  763. MattJ So you have a single query for all "open" chats and unread messages
  764. MattJ I think it's similar to what you/someone suggested earlier about a sticky bit on the JIDs, it's just not clear to me how that would get set, how notifications would get broadcast to other clients on update, etc.
  765. MattJ I think PEP is a good mechanism for that part
  766. sonny has joined
  767. MattJ And that solves my issue too... clients/servers without MAM can still "implement" the open chats part (PEP) without needing to implement the magic query
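MattJ's merged proposal can be read as a set union: the JIDs stored in the PEP node for open chats, plus any JID with pending unread messages. The sketch below assumes that reading; `inbox_jids` and its inputs are hypothetical and describe no existing module.

```python
def inbox_jids(open_chats: set[str], unread_counts: dict[str, int]) -> list[str]:
    """JIDs to return from a single query: open chats plus anyone with unread messages."""
    with_unread = {jid for jid, count in unread_counts.items() if count > 0}
    return sorted(open_chats | with_unread)


open_chats = {"muc@conference.example.org", "alice@example.org"}
unread_counts = {"alice@example.org": 2, "stranger@example.net": 1, "bob@example.org": 0}
print(inbox_jids(open_chats, unread_counts))
# ['alice@example.org', 'muc@conference.example.org', 'stranger@example.net']
```

A server without MAM could still serve the `open_chats` half from PEP, with an empty `unread_counts`, which matches the fallback MattJ describes.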
  783. pep. Ge0rG, I want a list of open tabs (1:1/muc/whatever), but since I'll most likely want a different one per client I can indeed implement it locally, and also I want to know who I have to fetch when I was offline, without having to sync the world
  791. pep. Also.. the last message is probably not useful for some e2ee mechanisms (PFS).
  792. pep. Ah nevermind.
  793. pep. That would be an unread message :-°
