-
pulkomandy
hello, I am implementing xhtml-im in my client and currently adding hyperlinks management, is there a recommendation for how to handle phishing attempts like <a href="http://evilwebsite.org">http://totally-legit-looking.org</a> ? For example Thunderbird (on the email side) has a dialog offering to use the href url or the one in the text when this happens, do XMPP clients have similar checks?
-
jonas’
mind that XHTML-IM is officially deprecated because of how easy it is to shoot yourself in the foot with stuff like this
-
Link Mauve
In poezio we always display both.
-
Link Mauve
Web browsers usually display the target URI in the bottom left (or right depending on where the pointer is), I’d assume this design has been put to the test.
-
Sam
That seems like a bad assumption :) (but I also assume people are used to it at this point, at least even if they never actually look at it and click through anyways)
-
pulkomandy
web browsers used to do this, yes. These days they do everything so that the user never sees a URL :
-
Link Mauve
Uh really?
-
Link Mauve
Firefox still does so.
-
pep.
Yeah I confirm
-
Link Mauve
Sam, do you know how else to handle that?
-
pep.
jonas’, and that's still the only solution to do rich text formatting without polluting body, that's actually implemented :)
-
Sam
Link Mauve: Lower left seems like a good idea to me. In addition if the link text and actual link both appear to be URLs it couldn't hurt to show a big warning as someone suggested.
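
The check Sam describes here (warn when the link text itself looks like a URL but points somewhere other than the href) could be sketched roughly as follows. This is only an illustration in Python using the standard library; the function names `link_text_is_misleading` and `_host` are hypothetical, and comparing hosts only is an assumption, not something any XEP specifies.

```python
from urllib.parse import urlsplit


def _host(candidate):
    """Return the lowercased host if candidate looks like a URL, else None."""
    text = candidate.strip()
    if "://" not in text:
        # Assumption: bare "www.example.org" style text is treated as http.
        if not text.lower().startswith("www."):
            return None
        text = "http://" + text
    try:
        host = urlsplit(text).hostname
    except ValueError:
        # Malformed input (for example a broken IPv6 literal) is not URL-like.
        return None
    return host.lower() if host else None


def link_text_is_misleading(href, text):
    """True when the visible text names a different host than the real target."""
    text_host = _host(text)
    if text_host is None:
        # Ordinary words such as "click here" cannot contradict the href.
        return False
    return text_host != _host(href)
```

A client could call `link_text_is_misleading(href, text)` for each `<a/>` element and show both URLs, or a warning, when it returns true. As noted later in the conversation, rewritten "safe links" style URLs would trigger this check too.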
-
Link Mauve
Indeed.
-
pulkomandy
yes, I don't really care if it's deprecated, it's used by various things I need
-
Sam
Well, it could hurt because in commercial systems everything is always behind a tracking link, but making that more painful won't make me lose much sleep.
-
Sam
(and no one is using XMPP commercially in that sense anyways that I know of; eg. there's no newsletters or anything over XMPP)
-
pulkomandy
yes, funnily I mainly know of Thunderbird doing this because outlook changes the content of emails to redirect everything through some "safe links" system
-
Sam
What things do you need, maybe we can suggest alternatives that don't have such a bad user experience?
-
pulkomandy
and then thunderbird complains that the link doesn't match the text anymore
-
pulkomandy
well the 3 things I saw using xhtml-im so far are: biboumi to forward IRC formatting, some matrix bridge using blockquotes for cited messages, and a notification bot using a href to put links to a forum whenever a message is posted there
-
Link Mauve
And poezio!
-
Sam
For the notification bot I'd start with auto-linking URLs in the plain text body first. That will give you a nice experience on both ends of the connection if users are chatting and I suspect the bot also has a plain text body that will work fine with this
-
pulkomandy
as far as I know, none of the replacements for xhtml-im allow using colors in the text. So they are all worse than IRC...
-
Sam
The other two are harder obviously as they'd need changes to the bridges, so maybe we can't solve that problem unfortunately
-
pulkomandy
is there a spec for autolinking urls? Or do I need to figure out my own way to detect URLs?
-
Sam
No, they're better than IRC because people don't insist on sending you yellow text that looks great against their dark background but can't be read on your light background :)
-
Sam
I'm sure there's a URL detection library out there, but no, there's no documented algorithm for doing so in XEPs at least
-
Sam
But it's a common enough thing that's easy enough to do
-
Link Mauve
[citation required]
-
pulkomandy
still can't be as easy as parsing <a href=""></a>
-
Sam
Maybe, maybe not. It's pretty easy either way.
-
Sam
Anyways, just saying that might go ahead and solve that problem for you and be a useful thing to the users of your client.
-
Sam
I think most people just use a regexp copied from the internet. This will never be 100% correct with no false positives or negatives, but it generally does well enough 99% of the time.
-
Link Mauve
In my experience, it’s very annoying when it doesn’t.
-
Link Mauve
Counting parentheses is one such infuriating example regexp can’t do.
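
The parenthesis problem can be handled by pairing a deliberately loose regexp with a small post-processing step, instead of trying to do everything in the pattern. The following is a minimal sketch in Python, not a documented algorithm: the pattern, the set of trailing punctuation, and the name `find_links` are all assumptions chosen for illustration, and it only recognises explicit `http://` and `https://` links.

```python
import re

# Loose on purpose: an explicit scheme, then anything up to whitespace or <>.
_URL_RE = re.compile(r"https?://[^\s<>]+")
# Punctuation that usually belongs to the sentence, not to the URL.
_TRAILING = ".,;:!?'\""


def find_links(body):
    """Yield (start, end) offsets of probable URLs in a plain-text body."""
    for match in _URL_RE.finditer(body):
        start, end = match.span()
        while end > start:
            last = body[end - 1]
            if last in _TRAILING:
                end -= 1
            elif (last == ")"
                  and body.count("(", start, end) < body.count(")", start, end)):
                # More closing than opening parentheses inside the candidate:
                # the final ")" most likely closes a parenthesis opened
                # before the URL, so leave it out of the link.
                end -= 1
            else:
                break
        yield start, end
```

With this, a body such as `see (https://en.wikipedia.org/wiki/Foo_(bar)), ok.` is linked without the outer closing parenthesis and comma while the balanced `(bar)` stays part of the URL. It still guesses, so false positives and negatives remain possible.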
-
pep.
While we could just tell the receiving client it's meant to be a url so that it gets it 100% of the time. But no
-
pep.
Better to get it 99% of the time
-
qy
Perl grammars could though...
-
Link Mauve
:)
-
Link Mauve
Reminds me of that time I tried to implement <a/> using poezio’s paste.
-
Sam
Sure, it's a bit annoying. If you have a nice UI for creating links that you can use definitely add an OOB or something too, but either way for people who just type in mysite.example.com you probably want to autolink that, so you'd likely want to do it either way even if you support XHTML-IM or whatever
-
Link Mauve
But then I hate the timer paste it does, so I fell into the rabbit hole that ncurses doesn’t support the proper bracketed paste…
-
Link Mauve
Sam, wut, no, you definitely don’t.
-
Link Mauve
Some websites try to do so, with hilariously bad results.
-
Sam
If I quickly type, "hey, this video was funny <pastes link>" you don't try to autolink that? Seems like a bad experience. I dunno, Conversations does it and it works pretty well. Not saying it's 100%, sure it's annoying sometimes, but mostly it's a much nicer experience when I can just click on it.
-
Link Mauve
For instance in French we have many words ending in -s if masculine, -es if feminine, and using a dot to mean either undeterministically, these systems always think these are links to Spanish websites. ^^'
-
Sam
Although this is probably more important on Android where you don't have a cursor and can't just copy/paste the text into the address bar
-
Link Mauve
Sam, actually in poezio we don’t control what the terminal will autolink (although I’ve seen a proposal for proper HTML-style links recently, but it is not implemented in tmux…).
-
Sam
Sure, not every possible system can do it.
-
Sam
I'm just saying, if you've got a bot sending you links that might be a good first step.
-
pep.
As a client I'd prefer to tell my terminal what is a link though, because I've got more context than the terminal
-
Link Mauve
Sam, if you quickly type "hey, this video was funny <pastes link>" and your client creates a proper <a/> link on paste, there is no issue and no need for other clients to guess what is or isn’t a URI.
-
Link Mauve
pep., yup.
-
Sam
That's the same thing, your client just had to guess.
-
Sam
Instead of the other side.
-
Link Mauve
pep., https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
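
Assuming the linked gist is the "Hyperlinks in terminal emulators" proposal (the OSC 8 escape sequence), a client telling the terminal what is a link might look like the sketch below. The helper name `osc8_link` is hypothetical, and whether the link is actually clickable depends on the terminal and on any multiplexer in between.

```python
ESC = "\x1b"
ST = ESC + "\\"  # String Terminator, ends the OSC sequence


def osc8_link(uri, text):
    """Wrap text in OSC 8 sequences so a supporting terminal links it to uri."""
    # Only printable ASCII is accepted, so the URI cannot carry its own
    # escape sequences; other characters need percent-encoding first.
    if not all(32 <= ord(ch) <= 126 for ch in uri):
        raise ValueError("URI must be printable ASCII; percent-encode it first")
    # Open the hyperlink, print the visible text, then close the hyperlink
    # with an empty URI.
    return f"{ESC}]8;;{uri}{ST}{text}{ESC}]8;;{ST}"
```

For example, `print(osc8_link("https://xmpp.org/", "XMPP"))` prints the word "XMPP", which supporting terminals render as a link. The visible text is passed through unchanged here, so a real client would still need to strip control characters from it.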
-
pep.
Sam, it wouldn't, if the sender told it.
-
Sam
Sure, but the sender didn't tell it, they just typed in some text.
-
Link Mauve
Sam, as a sender you can fix it until it is correct.
-
Link Mauve
While as a recipient, if the markup is lost then you are condemned to guess.
-
Sam
I'm not saying not to do that; I agree, linkify it on both ends, it will make for a way better experience.
-
Link Mauve
Sam, realistically, people very rarely type in URIs in text.
-
pulkomandy
I don't know about your OS, but in mine, the clipboard data has a mimetype so if I copypaste a link into my XMPP client, I can know it's a link, and probably get both the URL and the page title from the clipboard
-
Link Mauve
Copy/paste is a much more common feature for that.
-
Sam
But you probably also have to do it on the receiving end for when they have an old client that doesn't understand your XHTML-IM or OOB/references format or whatever you use anyways if you want a fallback.
-
pulkomandy
yes, I will handle legacy clients and OS as best as I can, but that's not a reason to stay locked in the 1990s
-
pulkomandy
otherwise I would be writing an IRC client not an XMPP one
-
Sam
I didn't say you should, I said it might be a quick way to get that for a simple bot and might be good enough for now since we don't have a good link format.
-
Link Mauve
We actually do, it’s just that you deprecated it.
-
Link Mauve
But it’s still perfectly usable.
-
pulkomandy
yes I'll implement what we have now. I'm happy to replace it with something better if someone comes up with something better, which I don't think the new specs for rich text are
-
nephele
I made a specification for formatted messages in matrix, if there is interest I will work on making a similar one for xmpp, if the concept is considered fine :)
-
nephele
https://github.com/tulir/matrix-doc/blob/formatting-entities/proposals/2427-json-based-message-formatting.md
-
Link Mauve
nephele, XEP-0071 is that but for XMPP.
-
moparisthebest
> Matrix formatting is currently based on a subset of HTML. Sounds like most clients are probably vulnerable to what most xhtml-im clients are vulnerable to
-
nephele
Link Mauve: no, that is different
-
Zash
nope, because json protects it!
-
nephele
moparisthebest: yes... which is why i made this alternative format :)
-
Zash
so something between xhtml-im and https://xmpp.org/extensions/xep-0394.html
-
nephele
Eh, not that similar either
-
Zash
actually, closer to xhtml-im
-
Zash
modelled in json
-
nephele
It's not html :) that was the main point anyhow
-
pep.
xhtml-im isn't "html" either
-
pep.
It's a strict subset of xhtml
-
nephele
Yes, but you cannot use an html parser for this one
-
Zash
It can be translated to HTML, therefore vulnerable.
-
MattJ
Well, anything can be vulnerable
-
pulkomandy
unicode seems more dangerous than html :>
-
Sam
I would assume this would be less likely to be vulnerable?
-
Zash
The vulnerability is the web itself, not the format!
-
moparisthebest
In practice all clients just drop it into a browser that supports JavaScript etc
-
MattJ
I think a custom format that can't be passed to a renderer (i.e. not HTML, XHTML or Markdown) is less likely to cause implementation vulnerabilities
-
moparisthebest
I get that it "can be implemented securely"
-
pulkomandy
that's clearly not the case for the code I'm writing
-
pep.
moparisthebest, web* clients
-
pulkomandy
so "all clients" can't be true :)
-
moparisthebest
No, all
-
pep.
Also not poezio
-
Sam
but yah, adding the massive web footprint and platform is the real problem
-
moparisthebest
If you have a spec that all clients implement in an insecure way, it's a bad spec even if it can be secure in theory
-
pulkomandy
you don't need a web engine for this. I used libcss to parse the css and give me easy to use styling attributes. No HTML parser or DOM or anything crazy like that involved
-
Zash
good luck finding a decent rendering engine that doesn't come with a javascript engine bolted on
-
pulkomandy
you don't need a full rendering engine for this, that's why it's a subset of xhtml and not the full thing
-
MattJ
Ideal would be a new "safe" format, with reference implementations in multiple languages for translation to HTML and other common markup formats
-
Zash
there aren't a lot of rendering engine implementations afaik
-
Link Mauve
Yup, poezio’s rendering engine is decent and is written in about 500 lines of Python.
-
Zash
terminal is easier but turning text into pixels is hard
-
Link Mauve
For most toolkits this is a solved problem though.
-
Zash
... because they include HTML+CSS+JS based rendering engines
-
Link Mauve
Although with resolutions being bigger and bigger, the traditional way is starting to be a bit limited, so newer ways to turn text into pixels (using GPUs this time) are being explored.
-
Link Mauve
Zash, I’m most familiar with GTK, which only includes CSS out of these three, and pango implements a subset of HTML for its markup.
-
Link Mauve
There is no web engine nor JS available in there, without external libraries like webkit2gtk.
-
Link Mauve
Pidgin went for the latter, and this has been a massive drag since then.
-
Zash
Wasn't half of Gnome written in JS these days?
-
pulkomandy
here is my 800 lines of code to implement xhtml-im with just libcss and no javascript or html or dom involved: https://github.com/pulkomandy/Renga/blob/master/ui/Xhtml.cpp most of it is callbacks to tell libcss "no we don't need that here"
-
Link Mauve
Zash, gnome-shell is written in JS, but that’s the host language, not a language you are forced to embed just because.
-
nephele
Anyhow, if there is interest let me know and I'd work on a new format for xmpp
-
Zash
And back when Swift was more actively developed it was said that there weren't any rendering engines available besides webkit
-
moparisthebest
How many vulns are in libcss?
-
pep.
nephele, I honestly recommend fastening up your seatbelt really tight if you go that way in the XMPP world. Haters are gonna hate
-
Zash
We have how many formats already?
-
pep.
2 in use
-
Zash
we have enough war without another format war
-
pep.
Well they're not even the same thing, that's the worst. One is a wire format missing an input format, the other is an input format missing a wire format
-
pep.
Together they could go very far but for some reason one doesn't like the other. I'll let you guess which
-
Link Mauve
pep., probably just nobody did it so far.
-
pep.
Link Mauve, well the latter mandates input format == wire format, so it's not really possible. That's the trick :p
-
Link Mauve
Although I’d rather go for something a bit more widespread, such as Markdown, for such an input format.
-
Link Mauve
pep., not really no, does it?
-
pep.
Isn't that the whole point
-
pep.
of 393
-
Link Mauve
pep., it has some examples of it being used in {jabber:client}body, but that’s just examples, not standard text.
-
Link Mauve
You can perfectly well use that as your input format, and transform it before sending it to the recipients.
-
pep.
Link Mauve, I know, see https://lab.louiz.org/poezio/poezio/-/issues/3455#note_7769
-
Link Mauve
Right.
-
pulkomandy
I'd rather go with https://xmpp.org/extensions/xep-0394.html than 393 if we really have to remove xhtml-im (but again, no support for colors there, yet?)
-
pep.
Reading 393, I just discovered: « Clients that do not support this specification MUST still be able to receive messages sent by clients using this specification and display them in a human-readable form. »
-
pep.
Is that really a thing? a MUST for non-supporting implementations?
-
Link Mauve
pep., it’s mu. :D
-
Link Mauve
A specification can’t force non-implementers to do anything.
-
Sam
Good catch; that's just a requirement, that "MUST" should be "must".
-
Sam
Oh, no, nevermind
-
Sam
But still, it's not a requirement on the clients to do anything, it's a requirement on the spec to do something
-
pep.
Ok
-
moparisthebest
Markdown also requires a browser which in practice always comes with JavaScript
-
Sam
It doesn't require a browser, but in a browser all the markdown libraries I looked at appeared to be vulnerable by default to injecting scripts or something executable which is part of the reason I didn't just go with that when writing 0393.
-
pulkomandy
yes I'm a lot more worried about me trying to write a parser for 0393 than about using libcss for 0071
-
moparisthebest
I wouldn't be
-
Link Mauve
moparisthebest, Markdown is a superset of HTML, it doesn’t “require a browser” nor JavaScript.
-
moparisthebest
Link Mauve: in practice it'll always be implemented that way
-
Link Mauve
moparisthebest, not really no.
-
Sam
I would be interested to see a spec that used XML for formatting similar to XHTML-IM but w/o the HTML part and w/o the "tries to link into the plain text body too" part of 0394. I dunno if it would be better or worse, and you end up with the "plaintext/formatted message bodies are entirely different problem", but I'd like to see it and would be curious what could be done with it.
-
moparisthebest
Again, I don't care what's theoretically possible, only what happens 99.9% of the time
-
Sam
moparisthebest: I don't think that's true, none of the markdown parsers I've ever used required HTML (unless they were javascript ones). I mean, you're right about the problem, just wrong about that detail I think
-
pulkomandy
well, xhtml-im but we do a rot13 on all the xhtml element names to make sure they are not accidentally sent to an html parser?
-
moparisthebest
See also: _xmppconnect and XMPP XML being a "strict subset of XML" where all projects just use an XML parser and are vulnerable
-
Link Mauve
Sam, that would be exactly the same as XHTML-IM imo, clueless webdevs will just make it go through some XSLT or whatever and end up with the exact same vulnerabilities, while you have fragmented the ecosystem with one more wire format.
-
Sam
I'm not 100% sure that's true, but you might be right
-
pulkomandy
clueless webdevs don't know about XSLT, they would implement something similar, but slower in javascript
-
Link Mauve
Right.
-
Link Mauve
Sam, clueless webdevs have vulnerabilities in anything where plain text is used in the protocol, built-in the browser under the name innerHTML.
-
Link Mauve
Once the JS converter to HTML has been passed, they’ll put it in the DOM with innerHTML and get the same vulnerability they’ve used for years.
-
Sam
Yah, actually, you're probably right. The naive case would carry over the attributes and one of those will be an onmouseover handler or a javascript: URI or whatever.
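
The failure mode discussed here (carrying every attribute over to the output) is avoided by rebuilding the tree from an allowlist rather than trying to strip known-bad parts. A minimal sketch in Python with `xml.etree.ElementTree` follows; the element list, attribute list, and scheme list are illustrative assumptions and not the exact profile from XEP-0071, and `sanitize` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Illustrative allowlist: element name -> attributes that may be kept.
ALLOWED = {
    "body": set(), "p": set(), "span": set(), "br": set(),
    "em": set(), "strong": set(), "blockquote": set(),
    "a": {"href"},
}
SAFE_SCHEMES = {"http", "https", "xmpp", "mailto"}


def _safe_href(value):
    """Accept only URIs whose scheme is explicitly allowed."""
    try:
        return urlsplit(value).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


def sanitize(src):
    """Return a cleaned copy of src, or None if the element is not allowed."""
    name = src.tag[len(XHTML_NS):] if src.tag.startswith(XHTML_NS) else None
    if name not in ALLOWED:
        return None  # unknown elements are dropped together with their content
    dst = ET.Element(name)
    for key, value in src.attrib.items():
        # Event handlers and anything else not listed never reach the output.
        if key in ALLOWED[name] and (key != "href" or _safe_href(value)):
            dst.set(key, value)
    dst.text = src.text
    for child in src:
        clean = sanitize(child)
        if clean is not None:
            clean.tail = child.tail
            dst.append(clean)
        else:
            # Keep the text that followed the dropped element.
            tail = child.tail or ""
            if len(dst):
                dst[-1].tail = (dst[-1].tail or "") + tail
            else:
                dst.text = (dst.text or "") + tail
    return dst
```

The output of `sanitize` contains only what the allowlist names, so a converter that later hands it to `innerHTML` or a similar sink at least receives no `on*` attributes or `javascript:` links. This does not cover `style` values, which would need their own parsing.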
-
Link Mauve
Exactly.
-
moparisthebest
You could say the same about clueless C++ devs who think "I'm sure I can write secure c++ *this* time"
-
pep.
Maybe someday we'll stop betting that clueless webdevs be clueless and limit our specs and we'll start helping/training them instead and write our specs with less worries
-
Link Mauve
Ha, I’m not gonna train a webdev.
-
pep.
:D
-
Link Mauve
I’m bad at webdev myself.
-
Link Mauve
Stuck about ten years ago.
-
moparisthebest
You need to write specs that can be implemented securely by anyone that can read them without knowing a ton of non obvious stuff
-
pep.
The point is, if you think people are dumb you're not gonna go very far
-
Link Mauve
pep., their very platform is offering them footguns.
-
pep.
Then let's change the platform
-
Link Mauve
moparisthebest, good luck with that.
-
moparisthebest
Link Mauve: *different footguns
-
Link Mauve
That would be a platform where exactly no wire text is present in the final UI.
-
Link Mauve
For a chat system for instance, you wouldn’t go very far.
-
Sam
It's not that we're just assuming web devs aren't intelligent, it's that literally every web client I ever tried that supported XHTML-IM (and I don't think "every" is me being hyperbolic) had trivial vulnerabilities. Sure, I reached out and helped fix a lot of them, but the point is that experience shows us that we handed them a gun pointed at their foot and then just told them "but be careful and don't pull the trigger"
-
moparisthebest
Have you ever used openssl?
-
pulkomandy
sadly, yes :(
-
pep.
Sam, you're mistaken on the footgun though
-
pep.
There is one in that story for sure
-
moparisthebest
All computer stuff is a dumpster fire, pointing out that different trash is burning on the webdev side vs native code doesn't feel helpful
-
Link Mauve
Sam, our specification might not have carried enough big blinking red warnings, but I’ve found similar vulnerabilities in multiple clients’ handling of MUC nicks, the thing in the resource. :D
-
pulkomandy
also we have specifically said "clueless webdevs" which is a subset of webdevelopers. There are skilled ones too, and there are clueless C++ developers too
-
Link Mauve
It’s explicitly specified as an opaque string.
-
pep.
pulkomandy, agreed
-
Sam
Sure, there are also other vulnerabilities and common problems; that doesn't mean we shouldn't fix the ones that can be fixed.
-
Link Mauve
Removing the ability to send formatted text was never a fix, even less a good one.
-
Sam
No one removed the ability, we obsoleted the spec which means "the XSF doesn't recommend this particular spec".
-
Link Mauve
But we’ve had pages of emails on that topic, let’s not go over them again. :)
-
Link Mauve
Sam, right.
-
pep.
Yeah, pages of feedback on that topic which got ignored
-
pulkomandy
well it seems the result is client devs like me thinking "the XSF is stupid, they don't provide any alternative so I'm going to implement this anyway"
-
Link Mauve
pep., not really, I mean people continue to implement it despite it being obsolete.
-
Sam
It was all discussed multiple times. Just because your way didn't get picked doesn't mean you were ignored.
-
Link Mauve
pulkomandy, that’s approximately my stance on that too.
-
Sam
The XSF isn't some magical body telling you what to do; the council just said "we don't recommend this one because experience has shown us it's difficult to do right". The XSF is *you*, other alternatives could be proposed (like 0393 and 0394). If one of them got implemented and the other didn't, it's the community that voted with their code, not the XSF. And you could always propose another that includes whatever formatting you think is missing
-
Link Mauve
Sam, no need for that, 0071 works.
-
Link Mauve
At best what I’d propose would be some bright blinking red warnings about our implementation experience.
-
pulkomandy
yes, what do we do, resubmit 0071 with a new xep number and rename it "totally-not-xhtml-im" ?
-
Sam
Well, that's fine, but the council at the time disagreed.
-
Sam
In theory the council is experienced people who know a bit about XMPP. That's not to say that every decision will be perfect, and not to say that you can't ignore their warning and go implement it, just that it might be worth considering why they did it and that it wasn't because they ignored you.
-
qy
I like 0394 more but 0393 seems more usable, probably best implement them both
-
pulkomandy
well, there is this spec being used in the wild by at least 4 different xmpp things, there is no replacement (393 and 394 don't implement the two features I need: marking up links so I don't have to guess, and converting IRC styling so that IRC users can smoothly migrate to my client and not lose any features) and I'm not going to spend time writing more specs because I have enough work to do writing code supporting existing stuff. Do whatever you want with that information :)
-
Sam
(FWIW, I think we need a linking spec in particular and would love to see that exist, I've thought about working on one a few times)
-
qy
i feel like oob is fine for linking, just that it has been implemented in such a wacky way
-
Sam
Maybe I should finish my LaTeX-IM spec. It was meant to be published on April 1st last year, but I never got around to finishing/submitting it.
-
Link Mauve
:D
-
Link Mauve
Reminds me of a Gajim plugin I once wrote, which would render Lilypond markup inline. <3
-
Link Mauve
(0393-style)
-
pep.
Do I need to download a texlive distribution for the LaTeX-IM spec? :P
-
Sam
oooh, I would legit use that, not even as a joke. I used to write a lot of music and I *love* lilypond (even if every release breaks my old stuff and it's really confusing markup for anything more advanced than a simple staff)
-
Link Mauve
(Where a client which didn’t support this markup would still show you the { \treble \time 4/4 c8 d e f g2 }, while a client with support would render a lovely score.)
-
Link Mauve
The main issue with that is that Scheme support means you basically own the remote computer.
-
Sam
I left the note on codeblocks undefined in 0393, but I keep hoping clients that implement it will do things like that, eg. gajim might let plugins hook into ```note and if it sees ```lilypond it could try to render it, etc.
-
Sam
But yah, that opens a whole other can of worms.
-
Link Mauve
Preformatted text (<pre><code/></pre> in HTML) is by no means made to actually render or run the thing.
-
Link Mauve
Although you could add a Run button in your client, so that for instance a Python snippet can be executed inline.
-
Link Mauve
Hopefully, only with proper sandboxing in place.
-
pulkomandy
a good way to check if there are also clueless python devs :')
-
Link Mauve
Are you willing to bet on most clients doing security properly? :)
-
qy
> so that for instance a Python snippet can be executed inline.
-
qy
😱️
-
moparisthebest
> Hopefully, only with proper sandboxing in place. You just described all of the web
-
Link Mauve
The web is actually a very good sandbox. :)
-
pulkomandy
but that can't protect a website against itself :)
-
Link Mauve
Actually there are quite a few mechanisms for that, iframe for one, combined with HTTP headers.
-
moparisthebest
CSP?
-
Link Mauve
Yeah.
-
Link Mauve
And a few other ones.
-
moparisthebest
It's just piles upon piles of hacks to try to make it secure
-
moparisthebest
Well, all of computing is
-
qy
some parts more than others though