-
sunglocto
can somebody link the XEP for link previews?
-
MattJ
https://xmpp.org/extensions/xep-0511.html
-
sunglocto
Much appreciated
-
moparisthebest
ironically sent without a preview
-
MattJ
Hmm, Prosody module time?
-
moparisthebest
always has been
-
singpolyma
Yeah I didn't get to that module yet but it's an obvious one to do I think
-
Kev
I need to reply to that thread. This would be much better as a reference, so that MUC modules can fetch a non-blocking preview to add.
-
singpolyma
What do you mean a reference?
-
moparisthebest
it's best if clients send it inline no?
-
Kev
> it's best if clients send it inline no?
Well, that’s a matter of trust, right? There’s certainly cases where I’m in chats where I’d trust the MUC server, but not all the people in the MUC.
-
singpolyma
Well. Server generated is actually "best" except of course doesn't work with e2ee and then you're waiting for servers
-
Kev
> What do you mean a reference?
I meant a Reference in the XEP sense, but really the format isn’t as important to me as just that you can later attach the link data to a previous message.
-
Kev
Because for me, servers adding the data is best in non-encrypted environments, but blocking the message send for link resolution would be horrible.
-
singpolyma
Hmm. Usually we've moved away from later attaching stuff. Though in this case you could just do it since the metadata is for the link and not for the message. Would make things more complex on the client though
-
singpolyma
My MUCs block message sends for 10MB media downloads now, heh. Not terrible yet
-
Kev
Maybe a Reference isn’t best, and some simpler “This URL -> This data” linkage is easier/better, I just say References as we already have them and they would fit.
-
singpolyma
yeah this url -> this data is how the xep works now. I just imply in the drafting that it should be inline but technically not required
-
moparisthebest
I'm not sure I agree servers fetching and parsing arbitrary html is actually desirable. As a fun optional module until client support is more widespread sure. As a forever requirement not so much.
-
Kev
Oh, you mean rdf:about but in a later message? Yeah, if that’s specced/understood, that likely works.
-
singpolyma
i don't see how references would do as you say. Those go online themselves usually
-
singpolyma
> Oh, you mean rdf:about but in a later message? Yeah, if that’s specced/understood, that likely works.
Yeah
-
singpolyma
> I'm not sure I agree servers fetching and parsing arbitrary html is actually desirable. As a fun optional module until client support is more widespread sure. As a forever requirement not so much.
It's the best for privacy and security. Except e2ee incompatible
-
Kev
I think “best” depends heavily on environment/usage. If you only communicate with trusted contacts, then resolving sender-side has a lot going for it.
-
Kev
In public MUCs it has some potential issues
-
singpolyma
The sender has to fetch though. which has privacy and security implications for the sender
-
Kev
Yes. Although they’re the one that cares to send the link, so maybe that’s ok?
-
moparisthebest
>> I'm not sure I agree servers fetching and parsing arbitrary html is actually desirable. As a fun optional module until client support is more widespread sure. As a forever requirement not so much.
> It's the best for privacy and security. Except e2ee incompatible
as the recipient and/or server admin it's strictly worse for security and privacy no?
-
singpolyma
indeed maybe. Which is the argument. But server is a nice case when possible
-
Kev
Not if the sender is untrusted.
-
moparisthebest
the sender I assume has already fetched the link before sending
-
Kev
If you don’t trust the sender to send a legitimate linkdata for the provided link, then sender-provided is bad.
-
Kev
Receiver-fetched is problematic if you don’t trust the sender not to be harvesting data on you.
-
singpolyma
>>> I'm not sure I agree servers fetching and parsing arbitrary html is actually desirable. As a fun optional module until client support is more widespread sure. As a forever requirement not so much.
>> It's the best for privacy and security. Except e2ee incompatible
> as the recipient and/or server admin it's strictly worse for security and privacy no?
I don't think server admin privacy is much impacted. Security certainly, but servers are in need of securing their networks generally to begin with
-
moparisthebest
if I don't trust them I'm not clicking the link and the metadata doesn't matter
-
Kev
Server-fetched is work for the server, but isn’t leaking recipient information.
-
Kev
> if I don't trust them I'm not clicking the link and the metadata doesn't matter
I think there may be some evidence that is a smarter stance than some of the population has :)
-
singpolyma
Indeed. Most people are happy to click any link ever
-
moparisthebest
yes I agree as a recipient the server fetching it is better than me fetching it
-
snit
real ones click every link as soon as they see it B)
-
singpolyma
anyway the xep intentionally supports both
-
moparisthebest
but as a server admin maybe I'd prefer not to even have an html parser as part of my attack surface...
-
Kev
It wasn’t clear to me that the intention was to support both, I’ll have to reread it.
-
moparisthebest
everything is tradeoffs all the way down
-
Kev
It is, so supporting both models seems reasonable to me.
-
Kev
Yes, the spec does clearly say it supports both if I bother to read more carefully. Maybe an example of the server case would drive that home for the stupid amongst me.
-
distaza
Unless there's a vulnerability in something like sed, I doubt a 'parser' would be that much to be scared of.
-
distaza
Just snip the <head> line and cut the <head> </head> off, otherwise, don't return a preview line, right? If you want to get fancy, you can try to extract some text, but XMPP is already an XML parsing operation.
-
vpzom
HTML is (usually) not XML though
-
distaza
It still follows convention in spots that are relevant, and if it doesn't, catch it, stop and don't send a preview.
-
luca
Does XMPP use XML? I thought it used a subset of XML with less features
- distaza sighs.
-
distaza
Look, what particular flavor of the text being parsed isn't the point I'm trying to make, it's that we're parsing intelligible data streams, and if they are not in fact intelligible, it's trivial to just stop
-
distaza
So the only 'vulnerability' from receiving it is being able to truncate the received data if it exceeds a certain size and not continuing if a regex is not matched
-
distaza
and that's it
-
distaza
Besides connection specifics (TLS, TCP) which are already in place, because XMPP uses TCP and TLS
-
distaza
And again, vulnerabilities in those? Bigger fish to fry
-
lovetox
I don't see the benefit of a server doing the query, except for clients who cannot do it themselves
-
vpzom
some users don't want random servers to see their local IP address
-
vpzom
and it reduces traffic on the target URL
-
MattJ
Except they aren't random if it's a link from the sender
-
lovetox
? Then don't send a link
-
vpzom
ah true I forgot we were talking about senders
-
lovetox
You send links to contacts that you are scared yourself to open?
-
distaza
I think it's subjective. If you don't want to reveal your IP address, you can just proxy the previews on the client side to begin with.
-
distaza
But having the ability to do these things does not hurt, either way.
-
distaza
People can choose to enable or disable it for themselves, just as they can choose to enable or disable previews entirely on their own machines.
-
distaza
Unless of course you would prefer the users not have a choice.
-
jjj333_p (any pronouns)
fwiw even for """privacy apps""" sender side is generally the norm
-
jjj333_p (any pronouns)
signal does this
-
vpzom
it pretty much has to be if you're doing E2EE
-
jjj333_p (any pronouns)
whatsapp and imessage seem to do it sending side but proxied im not sure what thats about
-
jjj333_p (any pronouns)
> it pretty much has to be if you're doing E2EE
on matrix they just disable it by default in encrypted rooms lol
-
jjj333_p (any pronouns)
terrible ux, but c'est la vie
-
vpzom
if we didn't have to worry about E2EE, then as a recipient I would prefer that _my_ server do the preview fetching
-
vpzom
but I'm sure the link host would prefer you only fetch it once :p
-
jjj333_p (any pronouns)
I also feel like proxied media might be nice, if done more intentional than matrix did it
-
distaza
I would prefer if I had an operating system service that allowed me to 'preview' arbitrary text, but that's neither here nor there
-
distaza
That gets into a whole 'nother can of worms
-
distaza
In any case, I think XMPP having this ability, in each of these ways, is just as well
-
distaza
People can choose which version suits their needs
-
CVEIsEternal
Quick experiment: guess whether I am human or AI. First correct guess wins. Timer starts now.
-
CVEIsEternal
Hint: my phrasing is usually too clean. Human or AI?
-
distaza
Decode this set of X86 opcodes: B400 CD16 B40E CD10 EBF6
-
distaza
otherwise, I don't really care, man
-
CVEIsEternal
That loop waits for a keypress and echoes it forever in BIOS text mode:
B4 00 = mov ah,0; CD 16 = int 16h (read key)
B4 0E = mov ah,0Eh; CD 10 = int 10h (teletype output of AL)
EB F6 = jump back and repeat.
-
CVEIsEternal
Still taking guesses: human or AI?
-
distaza
This isn't the place for playing guessing games, this is a place to discuss Jabber/XMPP development
-
distaza
And talk about proof-of-concepts, share implementations... in short, do work
-
distaza
If you have an actual desire that has anything to do with that, go for it, otherwise, I don't care. Can't speak for the rest of the room, but there you go.
-
Cynthia
> some users don't want random servers to see their local IP address
If you send a link in a MUC PM
-
Cynthia
Who will proxy it?
-
Cynthia
The MUC server? Or the receiver's server?
-
Cynthia
To be honest, it would be much better (and less of a point of failure) to make the user use Tor or something
-
Cynthia
I've had many problems in Matrix, where I couldn't view any media whatsoever because my server's media proxy decided to die for no reason
-
singpolyma
Related to this I recently wrote https://modules.prosody.im/mod_muc_cache_media.html
-
moparisthebest
> Unless there's a vulnerability in something like sed, I doubt a 'parser' would be that much to be scared of.
distaza: what about the seemingly neverending string of vulns in basically all html parsers though?
-
Cynthia
Speaking of, do XMPP servers or clients ever get fuzzed or whatever?
-
distaza
You don't need to 'parse HTML' to make a header, not the way you think.
-
Cynthia
Sending any malformed data to the program to handle
-
distaza
You need to send a single GET, with a user agent on it, and take the response, strip the head and at most a single line of text from the body, and that's it
-
distaza
Stop thinking like someone who is gonna use a dependency or a library and think of it like an engineer
-
distaza
Yes, there might be a vuln in your kitchen sink library, so don't use a kitchen sink library
-
vpzom
in favor of having strictly wrong behaviour?
-
Cynthia
> You need to send a single GET, with a user agent on it, and take the response, strip the head and at most a single line of text from the body, and that's it
To be honest, HTML parsers also have to handle (and correct) malformed HTML
-
distaza
Wrong in what sense?
-
Cynthia
Try feeding it unclosed tags, invalid attributes, etc.
-
Cynthia
It'll fix those the best it can
-
Cynthia
And thus, the Web is used to this
-
distaza
As far as I'm aware, you don't need to implement an entire webview to fetch a preview
-
distaza
That's a terrible way to go about this
-
Cynthia
It's not just the act of "parsing" HTML
-
Cynthia
If you snip <head>...</head> off, I can just insert scripts and stuff in the body :P
-
vpzom
> As far as I'm aware, you don't need to implement an entire webview to fetch a preview
no one said that
-
distaza
OK, let's suppose you want to use HTML 'correctly', as you say. Let's look at wget! https://www.cvedetails.com/version-list/72/332/1/GNU-Wget.html?order=0
-
distaza
These are the latest documented vulns.
-
distaza
At this point this is hemming and hawing about what components to use, and 'I don't want vulns' influencing whether you want to implement something or not, but not actually the meat of said implementation.
-
distaza
I could tell you what I would do, which is try to GET something from a link that supports it, via a perfectly valid, but mostly static, GET request, as specified in HTTP.
-
Cynthia
Just offload parsing to a child process the program runs
-
Cynthia
And lock it down as much as you can
-
Cynthia
(namespace, seccomp syscall filter, etc. etc.)
-
distaza
Then I take the response, no matter what it is, search for <head> and some body text, and cut it out for the preview, otherwise, do nothing. Insert your own regexes here as desired.
-
distaza
The only way you can argue vulns in that is the program doing TLS and the regexes.
-
distaza
If you *have* a vuln in that, well holy smokes batman you have some serious problems.
- Cynthia inserts some JS code into onload of the html
-
distaza
I mean, I'm pretty sure that's a joke, but if the JS is regexed and nothing more, nothing should happen.
-
distaza
Because there's no JS engine to run it.
-
distaza
DoS can be mitigated by simply truncating the connection after X amount of bytes.
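
The "truncate after X bytes" idea above can be sketched roughly as follows. This is only an illustration, not anything taken from XEP-0511 or an existing module: the helper name `read_capped`, the 64 KiB default and the chunk size are all assumptions made for the example.

```python
import io


def read_capped(stream, limit=64 * 1024, chunk_size=4096):
    """Read at most `limit` bytes from a binary stream, then stop.

    `read_capped` and its defaults are hypothetical; the point is only
    that the reader, not the remote site, decides how much is consumed.
    """
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # end of stream before the cap was reached
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Stand-in for a very large HTTP response body.
body = read_capped(io.BytesIO(b"x" * 1_000_000), limit=1024)
print(len(body))
```

The same function could be pointed at the response object returned by `urllib.request.urlopen(url, timeout=5)`. A size cap alone does not cover a peer that sends bytes very slowly, so a timeout would presumably be needed alongside it.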
-
Cynthia
I can make HTML so malformed that your regex doesn't match it, but the webview's HTML parser will automatically correct it into what I want
-
distaza
Is that 'correct HTTP'? Hell no. Is it desired? Hell yes.
-
distaza
> [22:20:14] <Cynthia> I can make HTML so malformed that your regex doesn't match it, but the webview's HTML parser will automatically correct it into what I want
That's the thing though, I'm arguing for a 'preview' that snips the header text and some body text. You can throw a sanitizer on top to remove special characters, since this needs to be XML compliant anyway.
-
distaza
This is your standard link-bot type preview, where it pastes the tab title and a brief description of the site text.
-
distaza
I 100% agree that any more complex 'previews' amount to visiting the website itself and that can be done by the client. Especially if it involves actually displaying HTML, not just pulling text.
-
distaza
The most this thing should do is pull the first header, the first image, the first embedded video, etc etc, on the page.
-
Cynthia
If its for previews, just use the <meta> tags
-
snit
might be worth noting that the original context of 0511 is for opengraph-based url previews, which only requires parsing specific `meta` elements in the `head`
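
As a rough sketch of what "only the `meta` elements in the `head`" could look like in practice, the following uses Python's standard-library `html.parser`. The class and function names are made up for illustration, and `html.parser` is a tolerant tokenizer rather than a full HTML5-spec parser, so this is an assumption-laden example and not a recommended implementation.

```python
from html.parser import HTMLParser


class OpenGraphExtractor(HTMLParser):
    """Collect og:* <meta> properties and ignore everything after </head>.

    Hypothetical helper: no scripts are run and no body content is kept.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.properties = {}
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.done = True


def extract_opengraph(html_text):
    parser = OpenGraphExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.properties


sample = """<html><head><title>t</title>
<meta property="og:title" content="Example &amp; Co">
<meta content='A page' property=og:description>
</head><body><meta property="og:title" content="spoofed"></body></html>"""
print(extract_opengraph(sample))
```

Anything after `</head>` is skipped here, so a `meta` element placed in the body does not override the values found in the head.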
-
Cynthia
Nothing else
-
distaza
Looking at the spec, yeah, this seems decent as is
-
distaza
I really don't see this XEP as being significantly costly or dangerous, regardless of whether it's the client or server doing it, provided there is a point at which either one will truncate the connection to prevent a DoS
-
snit
my understanding is that regex-based matching can't be reliably used for xml, let alone html, and you'd need to parse _html_ into a different _xml_ structure here anyways, so i think you'd kind of need an html parser unless you're ready to abort at the slightest hint of unexpected input, of which much of the unexpected input is perfectly valid had you parsed it correctly
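
A small, hedged illustration of the point about regex matching: the three snippets below are all ordinary ways to write the same `meta` element, but a strict pattern only recognises one of them, while a tokenizing parser (here Python's standard-library `html.parser`, which is tolerant but not a full HTML5 parser) reads all three. The pattern and helper names are invented for the example.

```python
import re
from html.parser import HTMLParser

# A deliberately strict pattern of the kind a quick script might use.
OG_TITLE_RE = re.compile(r'<meta property="og:title" content="([^"]*)">')


class TitleFinder(HTMLParser):
    """Hypothetical helper that records the first og:title it sees."""

    def __init__(self):
        super().__init__()
        self.title = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:title" and self.title is None:
            self.title = attr.get("content")


def title_via_regex(html_text):
    match = OG_TITLE_RE.search(html_text)
    return match.group(1) if match else None


def title_via_parser(html_text):
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.title


variants = [
    '<meta property="og:title" content="Hello">',    # the form the regex expects
    "<meta content='Hello' property='og:title'>",    # attribute order and quotes differ
    '<META  PROPERTY=og:title\n CONTENT="Hello" >',  # case, spacing, unquoted value
]
for html_text in variants:
    print(title_via_regex(html_text), title_via_parser(html_text))
```

Aborting when the regex fails is a valid choice, but in this sketch it would mean no preview for the second and third variants even though both are unremarkable markup.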
-
distaza
I'd be willing to abort. If I need a more advanced parser, I would prefer it as an option. I also think that 'HTML parser' in this context is a little open ended, especially with regard to vulnerabilities.
-
distaza
I would prefer a more concrete idea of what is considered a 'valid failure' before increasing scope.
-
distaza
In other words, 'use the thing that solves the problems' is not as helpful to me as 'problem X happens for Y reason that Z parser solves'.
-
distaza
Especially with this being a hypothetical already.
-
distaza
I'm sure that if any of us actually implemented this XEP we would quickly come to grips with potential issues in real time.
-
distaza
We could make those determinations then.
-
distaza
It's important to note that I don't want to handle all HTML here, only a subset, the part I'm looking for. Adding functionality to parse an entire document allows the sender to force me to process the document as such, when that's not what that code should be doing - it should only process the parts I am looking for, and not other things. As such, I don't need a 'correct' parser, I need a partial parser that is correct in the part it processes.
-
distaza
I need a scoped parser.
-
distaza
Or at least, I'd want one.
-
snit
i think the main idea is people expect a solution that actually works with all valid input, rather than just a subset, but doing so requires introducing more code to handle all the valid html cases that aren't valid xml, which introduces more room for vulnerabilities that they just wouldn't have to deal with if they let someone else do it. in which case, the problem is who should be responsible for it, which seems to be more in line with the original topic of discussion
-
distaza
Whoever implements it and whoever runs the implementation.
-
distaza
The spec does not pick, and I think that's fine. Each server and user can pick themselves as long as they follow the specification.
-
vpzom
and the specification in this case is HTML
-
distaza
Huh? I mean the XEP.
-
snit
> It's important to note that I don't want to handle all HTML here, only a subset, the part I'm looking for. Adding functionality to parse an entire document allows the sender to force me to process the document as such, when that's not what that code should be doing - it should only process the parts I am looking for, and not other things. As such, I don't need a 'correct' parser, I need a partial parser that is correct in the part it processes.
i don't think the head and body elements get parsed any differently, though; you'd need a complete html parser to actually parse either one, no?
-
distaza
I'll get back to you when I have one :p
-
distaza
There are head attributes that are special.
-
singpolyma
> I can make HTML so malformed that your regex doesn't match it, but the webview's HTML parser will automatically correct it into what I want
Why would you go anywhere near a WebView if you just want the content. No. Even if you want a real HTML5 parser fine there are several options. Don't use a WebView yikes
-
snit
> There are head attributes that are special.
special as in containing unique syntax valid in the head that isn't valid elsewhere in html and xml?
-
singpolyma
HTML syntax is pretty boring and well defined these days. The official parser spec gives you a parser that works on arbitrary tag soup
-
Cynthia
But you have to account for malformed HTML
-
Cynthia
Which real HTML5 parsers do anyway
-
singpolyma
It's not even really malformed since the HTML5 spec accounts for it
-
Cynthia
Really
-
Cynthia
Even unclosed tags/quotes, invalid attributes and stuff?
-
singpolyma
Yes. It's all part of the HTML5 parser algorithm spec