-
pep.
Is there a way to do, with pubsub (or something else?), many publishers and many subscribers, where only subscribers see everything and publishers see only their own items?
-
dwd
pep., Defining fulltext search fully would mean servers would have to implement a full-text search engine entirely - it wouldn't handle, for example, stemming in a homogeneous manner, so we'd presumably have to ban that, which feels undesirable. AIUI, MattJ's suggestion is a strict substring field as well as a "magic" field. I think the threat of beer-buying is sufficient to prevent outright silliness (it also prevents anyone being silly and still claiming full conformance, BTW).
-
MattJ
dwd, I don't think we have to rule out stemming
-
MattJ
nor mandate it
-
MattJ
(for the "plain" search)
-
MattJ
But most FTS engines provide an advanced query language, and that's mainly what I want to avoid exposing
-
MattJ
But e.g. it sounded like that's what Guus was doing, it's usually the (slightly) easier option
-
dwd
Right, indeed. Your suggestion is a dumb substring search, plus magic. I'd aim for magic first, going that any query language is close to nothing. I'm thinking in terms of tsvector in pgsql, for example.
-
dwd
Unless I misunderstand your suggestion here.
-
MattJ
I'm just saying there should be two fields, plain and implementation-specific
-
MattJ
running with the postgres example, the plain one would use plainto_tsquery() for example
-
dwd
But then your plain one would do stemming for example. Surely?
-
MattJ
Yes
-
MattJ
I don't see that as a problem
-
MattJ
It defines the semantics of the user input, not what the implementation does with that info
-
MattJ
"This is a query from the user with no special operators or syntax"
-
MattJ
Now find some messages
-
MattJ
Which is different to: > <Guus> simple keywords will work, but more elaborate lucene queries too (although you'd need to know the index fields)
-
MattJ
I'm saying we should have a way to expose the elaborate queries (if that's what deployments/implementations really want), but we should also have a safe option
-
MattJ
Safe in the sense that you can just throw some text in there and have a reasonable expectation it will return something useful
-
MattJ
This is from last year, but I just remembered it: https://opensourceconnections.com/blog/2019/05/29/falsehoods-programmers-believe-about-search/
-
MattJ
> When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon
-
MattJ
Though I think my favourite from there is: > A customer using the same query twice expects the same results for both searches
-
dwd
Ah, I see.
-
ralphm
Yeah, search is not trivial.
-
dwd
So if I understand correctly, MattJ is arguing that the second sentence of 3.2 of my protoxep should be in effect reversed, and servers MUST interpret any words or characters as search terms, and not treat them as directives or operators.
-
dwd
I can certainly go that route.
-
MattJ
Depends. You only specified one field, and it depends whether you specified the plain or the non-plain one :)
-
MattJ
Guus wants the non-plain one, and this draft was primarily for him at this point, right? :)
-
Kev
We already implement MAM search, FWIW, and have for years.
-
MattJ
But I wanted to add in a way for the server to convey some help text for the non-plain one
-
MattJ
Also we need to deal with localization in the various parts of this
-
dwd
Yay.
-
MattJ
and that's not easy - it's extremely likely that the server is only going to have FTS for a single locale
-
MattJ
But multiple is of course possible with the right setup, I just don't see many people crazy enough to dedicate the resources to that
-
dwd
Yeah, fine with extending my extension to introduce extensions. I was mostly working on my XEP project for inbox and figured I'd knock this one out so as to have something.
-
MattJ
I can contribute the missing parts, I've had this in my head for quite a while, but I'm way too busy this side of FOSDEM
-
dwd
I'm not wed to anything here except the beers.
-
MattJ
"OR an orange juice"?
-
dwd
You don't have to drink the beer. They just have to buy it.
-
Ge0rG
this reminds me of how the formulas in Excel/LibreCalc are language-dependent, so if you work with multiple locales, you always get it wrong
-
dwd
It makes it impossible to claim they meet the standard by syntax alone.
-
Guus
Kev what fields do you use, and what functionality is behind it? My primary motivation was to re-use existing field names if possible, to have overlap.
-
Kev
You expect me to remember things? :o
-
Kev
{ "search", sizeof("search")-1, XEP313_FILTER_TYPE_STRING, "If specified, only return messages that contain each of these words, in any order" }, Seems to be what we're using.
-
MattJ
I think that's a sensible implementation of the "plain" variant
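
A minimal sketch of the "contains each of these words, in any order" behaviour quoted above, in Python. The function names, the `\w+` tokenisation, and the `whole_words` switch are assumptions for illustration only; nothing here is taken from M-Link, Conversations, or a XEP, and a real server might stem or tokenise differently.

```python
import re


def tokenize(text):
    """Case-fold the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.casefold())


def matches_plain(query, body, whole_words=True):
    """Return True if body contains every word of query, in any order.

    No operators are interpreted: every token in the query is a search term.
    whole_words=False falls back to plain substring matching.
    """
    terms = tokenize(query)
    if not terms:
        return False  # an empty query matches nothing in this sketch
    if whole_words:
        words = set(tokenize(body))
        return all(term in words for term in terms)
    folded = body.casefold()
    return all(term in folded for term in terms)
```

With `whole_words=True`, a query for `arc` does not match a message containing only "search"; with `whole_words=False` it does. That choice is one of the places where "word" has to be defined.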
-
Daniel
that's what Conversations' local search does as well; and what I would expect a server side search to do as well
-
jonas’
define "word"
-
jonas’
if I search for arc, will it return messages which contain "search"?
-
jonas’
will it return messages which contain "The word 'arc' may be contained or not"?
-
jonas’
</rambling-about-how-fts-is-not-trivial>
-
dwd
jonas’, Does it matter?
-
jonas’
probably not
-
dwd
jonas’, I mean, does anyone care about replicability of search results between servers? (or between the client's local archive and the server?)
-
dwd
Although there's an argument that it might be useful to have a MAM switch that emits only the ids and not the entire messages in case you already have the data. But that feels like an optimisation for another day.
-
Zash
If you already have the data, can't you just search there directly?
-
jonas’
Zash, look at how slow conversations is since it got a FTS index ;)
-
dwd
Zash, Well, depends on how much of the data you have.
-
Zash
Also there's the thing where, given an id, it's tricky to retrieve that message
-
dwd
Zash, It is?
-
Zash
You can request messages before, or after, but not a specific message by id.
-
dwd
Zash, Oh. Well, that's stupid then.
-
dwd
Zash, So you'd have to ask for one after, then one before the result you get.
-
Zash
Yeah
-
dwd
Yeah, that's daft.
-
Zash
Or one before, then the one after that.
-
Zash
Either may or may not work depending on how many messages you have
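
A rough sketch of that two-step workaround, in Python, against an in-memory list standing in for the archive. The helpers `query_after`, `query_before`, and `fetch_by_id` are hypothetical names that only mimic "give me N messages after/before this id"; they are not a real MAM client API.

```python
def _index_of(archive, msg_id):
    """Position of msg_id in the archive, a list of (id, body) tuples."""
    for i, (mid, _body) in enumerate(archive):
        if mid == msg_id:
            return i
    raise KeyError(msg_id)


def query_after(archive, msg_id, limit=1):
    """Simulate an 'after' page: up to `limit` messages strictly after msg_id."""
    i = _index_of(archive, msg_id)
    return archive[i + 1 : i + 1 + limit]


def query_before(archive, msg_id, limit=1):
    """Simulate a 'before' page: up to `limit` messages strictly before msg_id."""
    i = _index_of(archive, msg_id)
    return archive[max(0, i - limit) : i]


def fetch_by_id(archive, msg_id):
    """Recover one message using only before/after queries (two round trips)."""
    following = query_after(archive, msg_id)
    if following:
        # One after, then one before the result we got.
        return query_before(archive, following[0][0])[0]
    preceding = query_before(archive, msg_id)
    if preceding:
        # One before, then the one after that.
        return query_after(archive, preceding[0][0])[0]
    # The archive holds only this message, so neither neighbour exists.
    return None
```

The final `return None` is the case where neither direction works: with a single message in the archive there is no neighbour to page from.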
-
Zash
Inb4 inventing SQL over XMPP to solve this
-
dwd
Zash, Ask for one before and one after concurrently.
-
dwd
Zash, Then it's only 2RTT to get the message you actually wanted.
-
Guus
Unrelated question: with RSM, the direction in which you page through the resultset doesn't affect what's defined as the 'first' and 'last' element, right?
-
Guus
iow the order of elements on a page does not differ based on the direction that you page through the result set?
-
Zash
Correct
-
Guus
👍 thanks
-
Zash
Which makes it funky to ORDER BY backwards, get the results from SQL backwards, then flip them and send them to the client.
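
A small sketch of that backwards-then-flip step, using Python's `sqlite3`. The `archive` table, its columns, and the helper names are made up for illustration; a real server's schema and paging keys will differ.

```python
import sqlite3


def make_archive(n):
    """Build an in-memory archive with message ids 1..n (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE archive (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO archive (id, body) VALUES (?, ?)",
        [(i, f"message {i}") for i in range(1, n + 1)],
    )
    return conn


def page_before(conn, before_id, limit):
    """Return the `limit` messages just before before_id, oldest first."""
    rows = conn.execute(
        "SELECT id, body FROM archive WHERE id < ? ORDER BY id DESC LIMIT ?",
        (before_id, limit),
    ).fetchall()
    # SQL handed the rows back newest-first; flip them so the page is in
    # the same order it would have when paging forwards.
    rows.reverse()
    return rows
```

For example, `page_before(make_archive(10), 8, 3)` yields the rows with ids 5, 6, 7 in that order, so the first and last items of the page mean the same thing in either paging direction.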
-
flow
Zash, Guus, I'd love to see this written down in xep313 (if it's not already).
-
Zash
Doesn't it say in RSM?
-
flow
ahh, if so, then i guess that is fine too
-
Zash
I suppose it doesn't hurt adding some implementation note about it. Feel free to PR 😉
-
dwd
https://twitter.com/wire/status/1219745367475933185
-
pep.
> dwd> You don't have to drink the beer. They just have to buy it. > It makes it impossible to claim they meet the standard by syntax alone. I claim encumbrance. You don't know how easy it is for them to obtain beer :p
-
jonas’
I claim encumbrance. I reject supporting the beer production.
-
pep.
dwd: interoperability still mandates common wire format doesn't it
-
pep.
re MLS/wire
-
dwd
pep., Ah, yes. I thought it interesting primarily because Wire were pushing MLS as primary marketing. It's more or less finished, but it's got all the heavyweight cryptanalysis to go - roughly at the same stage where some early experimental deployment of TLSv1.3 was happening while still in Draft, about a year or so before the RFC was published.
-
flow
pep., MLS-interoperability across federated messaging protocols? I'd expect that to require even more than just a common wire format
-
Zash
A common data model at least, so you can map into whatever format
-
dwd
flow, You could do text message bridging, though. Depends what the goals are.
-
jonas’
depends on where you draw the line around "wire" in "wire format"
-
jonas’
or, what Zash says
-
flow
I wouldn't be surprised if MLS needs to be tightly coupled with the underlying groupchat mechanism
-
dwd
flow, Prepare to be surprised, then.
-
flow
I am prepared, can I be surprised now?
-
dwd
flow, In principle, if two members of the group attempt to commit at once it could get weird, and the DS is supposed to impose a strict ordering, but XMPP does that anyway so I don't think anything special would be needed.
-
flow
dwd, DS?
-
dwd
flow, Also, "Commit?". Easiest to skim the architecture drafts and get a feel for it.
-
flow
will do
-
dwd
I can probably knock together a lightning-ish talk at the Summit on MLS if there's interest. Not that I'm any kind of cryptographer of course.
-
jonas’
I’d be interested
-
jonas’
reminds me to put myself on the list of remote attendees
-
jonas’
and reminds me to allocate a day off
-
Kev
I think Andrew's right, we should use what's already in the most popular XMPP server (although it's 2014 it was added, not 2016) and use MAM search the way M-Link does :)
-
Zash
Popularity contest? I object!
-
dwd
Kev, No idea what you're on about, Openfire's only just adding the feature.
-
Guus
I was contemplating how to put that into words, dwd.
-
Zash
Excuse me, that's the weirdest spelling of Prosody I've seen yet
-
Zash
https://cerdale.zash.se/upload/dHpA6ZKtKtstwlTJ/bild.png
-
jonas’
lol
-
Guus
to be honest, I have no clue how many instances of Openfire are running
-
Guus
We have download stats, and update check stats, which give some indication, but that's about it.
-
dwd
Lots in locked-down enterprise networks connecting to Active Directory, though.
-
jonas’
in the federated world, not many, I think
-
Guus
probably true
-
dwd
jonas’, Still plenty there; I think most of those doing update checks are likely to be federated.
-
jonas’
at least not many seen by s.j.n
-
jonas’
so not many hosting MUC services
-
moparisthebest
Daniel, larma, lovetox, any thoughts on a swap over to finally sending 12-byte IVs ? context: https://github.com/siacs/Conversations/issues/2578
-
MattJ
Relevant: https://github.com/siacs/Conversations/commit/e38a9cd729bfa44d06beb44859516a1eebbb3c92
-
MattJ
(and https://github.com/siacs/Conversations/commit/9af056bb16d7294e427dce2d92944c4d12bd8d0f )
-
Daniel
it will probably happen with the next minor release (not bugfix)
-
Daniel
Siskin and profanity are 'fixed' in master
-
Daniel
and we will wait for them to release
-
moparisthebest
aw awesome, going to go ahead and comment on that issue
-
Wojtek
BeagleIM as well (same library as Siskin), should be released soon-ish (depends a bit on Apple)
-
jonas’
cc @ Syndace
-
Syndace
thanks jonas’, was involved in that decision
-
jonas’
ok
-
Daniel
what's the implementation status of bookmarks 2?
-
pep.
After what's been done in the sprint?
-
Daniel
yeah probably not much
-
pep.
the prosody module should be working now
-
pep.
converts between all 3 iirc
-
Link Mauve
Converts from both forms of XEP-0048 to XEP-0402 format, and then lets the old form of XEP-0048 read from the same store.
-
Link Mauve
The PEP form of XEP-0048 is only considered for migration, after which it is left unusable.
-
Link Mauve
This should work fine since clients can’t rely on this PEP form working when XEP-0411 isn’t advertised.
-
Daniel
Yes I actually think that's fine
-
Daniel
I know I was super eager on having migration between old pep and new pep working as well. But I don't really understand why anymore
-
Link Mauve
It is now working anyway. :)
-
Link Mauve
Migration, not concurrent usage.
-
Daniel
Yeah. I meant concurrent usage. But yeah it should be fine.
-
Daniel
You can unload the old module and then load the new and everything should be ok
-
Link Mauve
Yes.
-
Link Mauve
The new module will refuse to get loaded if the first one is in the configuration file.
-
Link Mauve
(Or loaded.)
-
Daniel
Yeah that's cool. Yeah I would like to see a last call on that. Get some more feedback from a wider community and then deploy it.
-
Daniel
So for once we could actually do it properly and have a LC before deployment
-
pep.
What about the extensions proposal from Link Mauve btw? Did that progress a bit? Maybe waiting for a PR?
-
Daniel
The what now?
-
Daniel
The changes to the xep went through
-
pep.
let me grep in the list
-
Link Mauve
pep., which extensions proposal?
-
pep.
yours, to bookmarks2
-
pep.
For stuff like password etc., or else
-
Link Mauve
Ah, the "have clients not throw away extensions" one?
-
pep.
yeah
-
Link Mauve
dwd said he was going to add that to the spec.
-
Link Mauve
IIRC.
-
pep.
ok
-
Daniel
It would actually be cool if we could make Draft before Berlin
-
Link Mauve
+1
-
Daniel
Then we can put the final touches on the implementations in Berlin
-
eevvoor
at the sprint you mean Daniel?
-
Daniel
eevvoor: yes
-
Ge0rG
dwd: is Inbox a sophisticated attempt at testing how many levels deep you can nest a <message> without getting your computer taken away? ;)
-
dwd
That's a cruel and accurate suggestion.
-
dwd
Really, it's a matter of trying to reuse the result from MAM such that things like MAMFC plug into it neatly.
-
dwd
But it did feel a bit nesty. Might be a better way of constructing it by injecting an inbox bit inside the result, perhaps.
-
Ge0rG
maybe I'm just fed up with trying to read nested messages from one-liner XML dumps from my client and server logs
-
Ge0rG
dwd: I don't have a good idea ATM
-
Kev
xmllint --format became my friend years ago, and has remained so since.
-
Kev
Because yes, reading one-line XMPP stanzas gets worse the deeper they go.
-
Ge0rG
Kev: I suppose I need to add a key binding for it to my vim
-
Ge0rG
Kev: it's double nasty in clients that just dump the raw stream instead of individual stanzas, so that your grep dumps a screenful of XML and you need to find the beginning and end of things to be able to xmllint
-
pep.
I'm not sure I understand why <entry> contains the latest message
-
pep.
I mean the whole message
-
Ge0rG
pep.: so that you can show the last message in your chat list
-
pep.
Are you not going to do MAM anyway right after?
-
Kev
No reason to.
-
pep.
To get more than 1 message yes
-
Ge0rG
pep.: you could implement a thin client that only MAMs when you open a tab
-
Kev
^
-
pep.
Ge0rG, sure, and then I just need to do MAM when you open the tab
-
pep.
Because I will do MAM
-
pep.
What I'm interested in inbox is really just the list, because then I know what to fetch via MAM
-
Kev
It's fairly common when rendering an inbox (both in chat clients and elsewhere) to want to show a preview of the most recent message, so including the most recent message would achieve that (without doing 100/200/howevermany individual MAM queries to get the latest message for each inbox entry).
-
Kev
So it seems useful to me.
-
pep.
yeah maybe.. probably something I'll have to ignore then
-
Ge0rG
pep.: yes.
-
Ge0rG
I still think that poezio should be a fat client, though ;)
-
pep.
Ge0rG, in any case that message is useless to me in poezio
-
pep.
I'll do MAM to sync up with the last known id
-
Kev
The whole fat client/thin client thing I think is only going to be 'resolved' by allowing for both.
-
Ge0rG
Kev: I agree
-
Kev
In cases where allowing for both is going to mean lots of data being sent that one or the other doesn't want, potentially shoving a bool on a query to exclude the noise might make sense.
-
Ge0rG
I actually have a use-case for both. I want a "fat" poezio on my colo server, with full local logging, and a "thin" MAM-backed one on my laptop when I'm on the go
-
Zash
https://modules.prosody.im/mod_map.html
-
Kev
I don't know if that would add any value to inbox or not, but it's a possibility in general.
-
Zash
dwd, had you seen ↑ ?
-
Kev
Zash: Is that also similar to the unread stuff in bind2?
-
pep.
Ge0rG, both can use MAM
-
Ge0rG
pep.: sure, but in different ways
-
Zash
Kev, yes, it's inspired by that example in bind2
-
Ge0rG
pep.: I want my fat client to do a full MAM sync on startup, and then no more MAM
-
Kev
Where inbox is also related to the unread stuff in bind2 (but none of them quite the same)
-
pep.
Ge0rG, when joining a new channel
-
Ge0rG
startup = new session
-
Ge0rG
pep.: history fetch is often good enough, but yeah, okay
-
Kev
Zash: I wonder if there's a race there, by not doing it during bind, but it looks useful.
-
Kev
Zash: Submit a protoxep?
-
Kev
I do think that server-side tracking of unread per-contact is practically needed, which that doesn't quite do, so it's not a whole solution, I think, but is moving in that direction.
-
Zash
Kev: It's mostly done like that to allow easy testing since I don't have bind2 yet.
-
Kev
Yeah, that one's a bit of an issue :)
-
Ge0rG
as is IM2?
-
Zash
Everything2
-
Zash
XMPP 2.0
-
dwd
Zash, I had seen it, but then forgotten about it.
-
dwd
pep., And yes, you might not always want the entire message, and instead just know there is one with a particular id. Or you might not need inbox at all if you're going to pull the entire MAM archive across anyway.
-
dwd
pep., But lots of existing clients like to list out the conversations, and show a previewish thing of the last message. It's why, for example, Instagram's direct message inbox works in exactly this way.
-
dwd
pep., We have bigger challenges because we have lots of different styles of client in the XMPP world, and need to cater for them all efficiently without precluding any. I'm not trying to claim this is a finished design suitable for all cases.
-
Ge0rG
but it's a very good start
-
Ge0rG
dwd: I think it's missing a notion of "open conversations", which is a good thing to keep around in just this place
-
pep.
dwd, my goal is not to pull the entire MAM history
-
pep.
At least not at first
-
pep.
My goal for the inbox thing is really just to get a list of JIDs to fetch MAM for. If I don't have that then I have to fetch the entire history to know who talked to me as there might be JIDs I don't know of (not MUCs nor roster)
-
Ge0rG
pep.: how is having the last message in the response harmful to that?
-
pep.
Ok it may seem I'm still ranting about that, I'm not
-
Zash
Timestamp and body of last message per contact gets you most of the data you'd need to show a list of recent conversations and can be done with simple MAM. Read status needs more tracking than what at least Prosody has
-
pep.
Zash, that's the thing, you might not be talking only to contacts
-
Zash
s/contact/"with" in MAM terms/
-
pep.
yeah but you need to know who, which is why I like inbox
-
Zash
That MAP thing did that iirc. Wanna be convinced to convert it into mod_inbox? :)
-
Zash
Do we need some XPath-ish MAM search thing like the other example of extended search forms?
-
pep.
would it be possible to make that message optional maybe?
-
pep.
dwd, ^
-
dwd
Sure.
-
dwd
Zash, XPath-based MAM search? Yuck.
-
Ge0rG
pep.: what's your goal with that?
-
pep.
Ge0rG, why are you fighting it that much? That message is not needed in there all the time :x
-
Ge0rG
pep.: I'm not fighting, I'm curious. Every boolean option doubles the number of states you create and have to debug
-
pep.
we're at the protocol level still, I think we can live with one or two more options. We're not doing client UX
-
Ge0rG
pep.: please tell me why that Carbon message isn't displayed on my desktop client.
-
Ge0rG
(yes, this is a protocol question. More than UX at least)
-
pep.
even if that makes things more complex I'm of the opinion that I should be able to choose. if we do one-size-fits-all nobody is going to be happy, or rather, only the golden use case is going to be happy and that's annoying for everybody else
-
Daniel
So you want to make it optional to request or optional to generate?
-
Daniel
Because making it optional to generate would be bad
-
pep.
how bad?
-
pep.
I was mostly thinking "I don't need it, the server doesn't need to send it". Whether it generates it or not (or stores it as is) it's not my problem
-
MattJ
What about deployments without MAM? (e.g. for privacy or resource constraint reasons)
-
pep.
with offline messages?
-
pep.
In any case if the server doesn't keep messages, then it doesn't make sense indeed to force it to return the last one
-
MattJ
My point is mainly that you may want to support inbox on a server that doesn't store messages. It seems to me it would be easier for client devs to deal with no message than no inbox
-
MattJ
Er, I think I'd be fine with "if the server/user has a MAM archive enabled, you must do this"
-
MattJ
Just not with making a hard dependency from Inbox to MAM for deployments that don't want that
-
MattJ
For something that is ultimately a convenience/optimisation feature
-
Ge0rG
So such a server would always return an empty list?
-
MattJ
Ge0rG: I think <entry/> would just be empty
-
Ge0rG
MattJ: what JIDs would you list?
-
MattJ
(which is already valid per the XEP)
-
MattJ
I guess the XEP doesn't really specify what JIDs are in the list
-
MattJ
With previous PEP-based proposals that was trivial, and the list is a list of open tabs/chats
-
pep.
yeah I also liked that
-
MattJ
Opening/closing a chat cleanly mapped to adding/removing from the list
-
MattJ
Now it's a bit ambiguous
-
Ge0rG
Maybe it can be resolved by adding another flag to the inbox, one that reflects an open chat and is sticky even when there are no unread messages?
-
pep.
MattJ, I'd want a per-client list though :x (or profiles or whatever. I guess per-client is already good)
-
Ge0rG
pep.: if you have it per client, you don't need to sync it to the server
-
pep.
Ge0rG, maybe it's not just a dumb list in PEP that I want. I also want to know from the MAM/offline stuff that somebody who's not in my roster talked to me and that I need to do MAM with them
-
pep.
if I don't have that information right away, I need to fetch the world and I want to avoid that
-
Ge0rG
pep.: do you want that for all remote JIDs or just the ones that your client hasn't heard from or the ones that are new since the last MAM fetch from any of your clients?
-
Ge0rG
I'm trying to determine which sets of information we need for the different use cases and how they overlap
-
MattJ
Why not a special PEP node plus an iq that performs a query basically equivalent to what Dave's proposal has
-
MattJ
For the set of JIDs currently stored
-
MattJ
......plus unreads??
-
Ge0rG
Why not have both in the same IQ?
-
MattJ
Both what?
-
Ge0rG
Both the open tabs and the inbox
-
MattJ
That's basically what I'm proposing, yes
-
Ge0rG
So we were misunderstanding each other all the time? Because that's what I wanted all along as well
-
MattJ
Dave's current proposal appears to me to not define any logic around which JIDs should be included in the result
-
MattJ
I'm suggesting we merge the old PEP inbox proposal, and use that as the list of JIDs, plus include any others that have pending unread messages
-
MattJ
So you have a single query for all "open" chats and unread messages
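
One way to picture that merged result, as a Python sketch. The function name, the dictionary keys, and the input shapes are all hypothetical; neither the PEP proposal nor Dave's draft defines them this way.

```python
def inbox_entries(open_chats, unread_counts):
    """Merge the 'open chats' list with any JIDs that have unread messages.

    open_chats: iterable of JIDs the user has open (e.g. kept in a PEP node).
    unread_counts: dict mapping JID -> number of unread messages.
    """
    open_set = set(open_chats)
    with_unread = {jid for jid, count in unread_counts.items() if count > 0}
    return [
        {
            "jid": jid,
            "open": jid in open_set,
            "unread": unread_counts.get(jid, 0),
        }
        for jid in sorted(open_set | with_unread)
    ]
```

A JID that is not in the open list but has unread messages (say, someone outside the roster) still shows up, which is the case pep. cares about; a server without an archive could return only the open part.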
-
MattJ
I think it's similar to what you/someone suggested earlier about a sticky bit on the JIDs, it's just not clear to me how that would get set, how notifications would get broadcast to other clients on update, etc.
-
MattJ
I think PEP is a good mechanism for that part
-
MattJ
And that solves my issue too... clients/servers without MAM can still "implement" the open chats part (PEP) without needing to implement the magic query
-
pep.
Ge0rG, I want a list of open tabs (1:1/muc/whatever), but since I'll most likely want a different one per client I can indeed implement it locally, and also I want to know who I have to fetch when I was offline, without having to sync the world
-
pep.
Also.. the last message is probably not useful for some e2ee mechanisms (PFS).
-
pep.
Ah nevermind.
-
pep.
That would be an unread message :-°