jdev - 2022-02-26


  1. pulkomandy

    hello, I am implementing xhtml-im in my client and currently adding hyperlink handling, is there a recommendation for how to handle phishing attempts like <a href="http://evilwebsite.org">http://totally-legit-looking.org</a> ? For example Thunderbird (on the email side) has a dialog offering to use the href url or the one in the text when this happens, do XMPP clients have similar checks?

  2. jonas’

    mind that XHTML-IM is officially deprecated because of how easy it is to shoot yourself in the foot with stuff like this

  3. Link Mauve

    In poezio we always display both.

  4. Link Mauve

    Web browsers usually display the target URI in the bottom left (or right depending on where the pointer is), I’d assume this design has been put to the test.

  5. Sam

    That seems like a bad assumption :) (but I also assume people are used to it at this point, at least even if they never actually look at it and click through anyways)

  6. pulkomandy

    web browsers used to do this, yes. These days they do everything so that the user never sees a URL

  7. Link Mauve

    Uh really?

  8. Link Mauve

    Firefox still does so.

  9. pep.

    Yeah I confirm

  10. Link Mauve

    Sam, do you know how else to handle that?

  11. pep.

    jonas’, and that's still the only solution to do rich text formatting without polluting body, that's actually implemented :)

  12. Sam

    Link Mauve: Lower left seems like a good idea to me. In addition if the link text and actual link both appear to be URLs it couldn't hurt to show a big warning as someone suggested.

  13. Link Mauve

    Indeed.

  14. pulkomandy

    yes, I don't really care if it's deprecated, it's used by various things I need

  15. Sam

    Well, it could hurt because in commercial systems everything is always behind a tracking link, but making that more painful won't make me lose much sleep.

  16. Sam

    (and no one is using XMPP commercially in that sense anyways that I know of; eg. there's no newsletters or anything over XMPP)

  17. pulkomandy

    yes, funnily I mainly know of Thunderbird doing this because outlook changes the content of emails to redirect everything through some "safe links" system

  18. Sam

    What things do you need, maybe we can suggest alternatives that don't have such a bad user experience?

  19. pulkomandy

    and then thunderbird complains that the link doesn't match the text anymore

  20. pulkomandy

    well the 3 things I saw using xhtml-im so far are: biboumi to forward IRC formatting, some matrix bridge using blockquotes for cited messages, and a notification bot using a href to put links to a forum whenever a message is posted there

  21. Link Mauve

    And poezio!

  22. Sam

    For the notification bot I'd start with auto-linking URLs in the plain text body first. That will give you a nice experience on both ends of the connection if users are chatting and I suspect the bot also has a plain text body that will work fine with this

  23. pulkomandy

    as far as I know, none of the replacements for xhtml-im allow using colors in the text. So they are all worse than IRC...

  24. Sam

    The other two are harder obviously as they'd need changes to the bridges, so maybe we can't solve that problem unfortunately

  25. pulkomandy

    is there a spec for autolinking urls? Or do I need to figure out my own way to detect URLs?

  26. Sam

    No, they're better than IRC because people don't insist on sending you yellow text that looks great against their dark background but can't be read on your light background :)

  27. Sam

    I'm sure there's a URL detection library out there, but no, there's no documented algorithm for doing so in XEPs at least

  28. Sam

    But it's a common enough thing that's easy enough to do

  29. Link Mauve

    [citation required]

  30. pulkomandy

    still can't be as easy as parsing <a href=""></a>

  31. Sam

    Maybe, maybe not. It's pretty easy either way.

  32. Sam

    Anyways, just saying that might go ahead and solve that problem for you and be a useful thing to the users of your client.

  33. Sam

    I think most people just use a regexp copied from the internet. This will never be 100% correct with no false positives or negatives, but it generally does well enough 99% of the time.

  34. Link Mauve

    In my experience, it’s very annoying when it doesn’t.

  35. Link Mauve

    Counting parentheses is one such infuriating example regexp can’t do.
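
One common workaround for the parentheses problem is to let a simple regexp over-match and then trim the result in ordinary code. The sketch below is in Python and is only an illustration under stated assumptions: the pattern `_URL_RE`, the punctuation set and the function names are all hypothetical, it only looks for `http://` and `https://`, and it is still a guess on the receiving side, with the false positives and negatives discussed above.

```python
import re

# Deliberately simple pattern: a scheme, then anything up to whitespace or "<>".
_URL_RE = re.compile(r"https?://[^\s<>]+")
_TRAILING_PUNCT = ".,;:!?'\""


def _trim(url):
    """Strip trailing punctuation and unbalanced closing parentheses."""
    while url:
        last = url[-1]
        if last in _TRAILING_PUNCT:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # More ")" than "(" means this one most likely belongs to the
            # surrounding sentence, not to the URL.
            url = url[:-1]
        else:
            break
    return url


def find_urls(text):
    """Return the URL-looking substrings of a plain-text message body."""
    return [_trim(match.group(0)) for match in _URL_RE.finditer(text)]


print(find_urls("see (https://en.wikipedia.org/wiki/Rust_(programming_language))."))
```

The counting happens outside the regexp, which is why this handles a URL that itself contains balanced parentheses while still dropping the closing parenthesis of the sentence around it.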

  36. pep.

    While we could just tell the receiving client it's meant to be a url so that it gets it 100% of the time. But no

  37. pep.

    Better to get it 99% of the time

  38. qy

    Perl grammars could though...

  39. Link Mauve

    :)

  40. Link Mauve

    Reminds me of that time I tried to implement <a/> using poezio’s paste.

  41. Sam

    Sure, it's a bit annoying. If you have a nice UI for creating links that you can use definitely add an OOB or something too, but either way for people who just type in mysite.example.com you probably want to autolink that, so you'd likely want to do it either way even if you support XHTML-IM or whatever

  42. Link Mauve

    But then I hate the timer paste it does, so I fell into the rabbit hole that ncurses doesn’t support the proper bracketed paste…

  43. Link Mauve

    Sam, wut, no, you definitely don’t.

  44. Link Mauve

    Some websites try to do so, with hilariously bad results.

  45. Sam

    If I quickly type, "hey, this video was funny <pastes link>" you don't try to autolink that? Seems like a bad experience. I dunno, Conversations does it and it works pretty well. Not saying it's 100%, sure it's annoying sometimes, but mostly it's a much nicer experience when I can just click on it.

  46. Link Mauve

    For instance in French we have many words ending in -s if masculine, -es if feminine, and using a dot to mean either undeterministically, these systems always think these are links to Spanish websites. ^^'

  47. Sam

    Although this is probably more important on Android where you don't have a cursor and can't just copy/paste the text into the address bar

  48. Link Mauve

    Sam, actually in poezio we don’t control what the terminal will autolink (although I’ve seen a proposal for proper HTML-style links recently, but it is not implemented in tmux…).

  49. Sam

    Sure, not every possible system can do it.

  50. Sam

    I'm just saying, if you've got a bot sending you links that might be a good first step.

  51. pep.

    As a client I'd prefer to tell my terminal what is a link though, because I've got more context than the terminal

  52. Link Mauve

    Sam, if you quickly type "hey, this video was funny <pastes link>" and your client creates a proper <a/> link on paste, there is no issue and no need for other clients to guess what is or isn’t a URI.

  53. Link Mauve

    pep., yup.

  54. Sam

    That's the same thing, your client just had to guess.

  55. Sam

    Instead of the other side.

  56. Link Mauve

    pep., https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda

  57. pep.

    Sam, it wouldn't, if the sender told it.

  58. Sam

    Sure, but the sender didn't tell it, they just typed in some text.

  59. Link Mauve

    Sam, as a sender you can fix it until it is correct.

  60. Link Mauve

    While as a recipient, if the markup is lost then you are condemned to guess.

  61. Sam

    I'm not saying not to do that; I agree, linkify it on both ends it will make for a way better experience.

  62. Link Mauve

    Sam, realistically, people very rarely type in URIs in text.

  63. pulkomandy

    I don't know about your OS, but in mine, the clipboard data has a mimetype so if I copypaste a link into my XMPP client, I can know it's a link, and probably get both the URL and the page title from the clipboard

  64. Link Mauve

    Copy/paste is a much more common feature for that.

  65. Sam

    But you probably also have to do it on the receiving end for when they have an old client that doesn't understand your XHTML-IM or OOB/references format or whatever you use anyways if you want a fallback.

  66. pulkomandy

    yes, I will handle legacy clients and OS as best as I can, but that's not a reason to stay locked in the 1990s

  67. pulkomandy

    otherwise I would be writing an IRC client not an XMPP one

  68. Sam

    I didn't say you should, I said it might be a quick way to get that for a simple bot and might be good enough for now since we don't have a good link format.

  69. Link Mauve

    We actually do, it’s just that you deprecated it.

  70. Link Mauve

    But it’s still perfectly usable.

  71. pulkomandy

    yes I'll implement what we have now. I'm happy to replace it with something better if someone comes up with something better, which I don't think the new specs for rich text are

  72. nephele

    I made a specification for formatted messages in matrix, if there is interest I will work on making a similar one for xmpp, if the concept is considered fine :)

  73. nephele

    https://github.com/tulir/matrix-doc/blob/formatting-entities/proposals/2427-json-based-message-formatting.md

  74. Link Mauve

    nephele, XEP-0071 is that but for XMPP.

  75. moparisthebest

    > Matrix formatting is currently based on a subset of HTML.
    Sounds like most clients are probably vulnerable to what most xhtml-im clients are vulnerable to

  76. nephele

    Link Mauve: no, that is different

  77. Zash

    nope, because json protects it!

  78. nephele

    moparisthebest: yes... which is why i made this alternative format :)

  79. Zash

    so something between xhtml-im and https://xmpp.org/extensions/xep-0394.html

  80. nephele

    Eh, not that similar either

  81. Zash

    actually, closer to xhtml-im

  82. Zash

    modelled in json

  83. nephele

    It's not html :) that was the main point anyhow

  84. pep.

    xhtml-im isn't "html" either

  85. pep.

    It's a strict subset of xhtml

  86. nephele

    Yes, but you cannot use an html parser for this one

  87. Zash

    It can be translated to HTML, therefore vulnerable.

  88. MattJ

    Well, anything can be vulnerable

  89. pulkomandy

    unicode seems more dangerous than html :>

  90. Sam

    I would assume this would be less likely to be vulnerable?

  91. Zash

    The vulnerability is the web itself, not the format!

  92. moparisthebest

    In practice all clients just drop it into a browser that supports JavaScript etc

  93. MattJ

    I think a custom format that can't be passed to a renderer (i.e. not HTML, XHTML or Markdown) is less likely to cause implementation vulnerabilities

  94. moparisthebest

    I get that it "can be implemented securely"

  95. pulkomandy

    that's clearly not the case for the code I'm writing

  96. pep.

    moparisthebest, web* clients

  97. pulkomandy

    so "all clients" can't be true :)

  98. moparisthebest

    No, all

  99. pep.

    Also not poezio

  100. Sam

    but yah, adding the massive web footprint and platform is the real problem

  101. moparisthebest

    If you have a spec that all clients implement in an insecure way, it's a bad spec even if it can be secure in theory

  102. pulkomandy

    you don't need a web engine for this. I used libcss to parse the css and give me easy to use styling attributes. No HTML parser or DOM or anything crazy like that involved

  103. Zash

    good luck finding a decent rendering engine that doesn't come with a javascript engine bolted on

  104. pulkomandy

    you don't need a full rendering engine for this, that's why it's a subset of xhtml and not the full thing

  105. MattJ

    Ideal would be a new "safe" format, with reference implementations in multiple languages for translation to HTML and other common markup formats

  106. Zash

    there aren't a lot of rendering engine implementations afaik

  107. Link Mauve

    Yup, poezio’s rendering engine is decent and is written in about 500 lines of Python.

  108. Zash

    terminal is easier but turning text into pixels is hard

  109. Link Mauve

    For most toolkits this is a solved problem though.

  110. Zash

    ... because they include HTML+CSS+JS based rendering engines

  111. Link Mauve

    Although with resolutions being bigger and bigger, the traditional way is starting to be a bit limited, so newer ways to turn text into pixels (using GPUs this time) are being explored.

  112. Link Mauve

    Zash, I’m most familiar with GTK, which only includes CSS out of these three, and pango implements a subset of HTML for its markup.

  113. Link Mauve

    There is no web engine nor JS available in there, without external libraries like webkit2gtk.

  114. Link Mauve

    Pidgin went for the latter, and this has been a massive drag since then.

  115. Zash

    Wasn't half of Gnome written in JS these days?

  116. pulkomandy

    here is my 800 lines of code to implement xhtml-im with just libcss and no javascript or html or dom involved: https://github.com/pulkomandy/Renga/blob/master/ui/Xhtml.cpp most of it is callbacks to tell libcss "no we don't need that here"

  117. Link Mauve

    Zash, gnome-shell is written in JS, but that’s the host language, not a language you are forced to embed just because.

  118. nephele

    Anyhow, if there is interest let me know and I'd work on a new format for xmpp

  119. Zash

    And back when Swift was more actively developed it was said that there weren't any rendering engines available besides webkit

  120. moparisthebest

    How many vulns are in libcss?

  121. pep.

    nephele, I honestly recommend fastening up your seatbelt really tight if you go that way in the XMPP world. Haters are gonna hate

  122. Zash

    We have how many formats already?

  123. pep.

    2 in use

  124. Zash

    we have enough war without another format war

  125. pep.

    Well they're not even the same thing, that's the worst. One is a wire format missing an input format, the other is an input format missing a wire format

  126. pep.

    Together they could go very far but for some reason one doesn't like the other. I'll let you guess which

  127. Link Mauve

    pep., probably just nobody did it so far.

  128. pep.

    Link Mauve, well the latter mandates input format == wire format, so it's not really possible. That's the trick :p

  129. Link Mauve

    Although I’d rather go for something a bit more widespread, such as Markdown, for such an input format.

  130. Link Mauve

    pep., not really no, does it?

  131. pep.

    Isn't that the whole point

  132. pep.

    of 393

  133. Link Mauve

    pep., it has some examples of it being used in {jabber:client}body, but that’s just examples, not standard text.

  134. Link Mauve

    You can perfectly well use that as your input format, and transform it before sending it to the recipients.

  135. pep.

    Link Mauve, I know, see https://lab.louiz.org/poezio/poezio/-/issues/3455#note_7769

  136. Link Mauve

    Right.

  137. pulkomandy

    I'd rather go with https://xmpp.org/extensions/xep-0394.html than 393 if we really have to remove xhtml-im (but again, no support for colors there, yet?)

  138. pep.

    Reading 393, I just discovered: « Clients that do not support this specification MUST still be able to receive messages sent by clients using this specification and display them in a human-readable form. »

  139. pep.

    Is that really a thing? a MUST for non-supporting implementations?

  140. Link Mauve

    pep., it’s mu. :D

  141. Link Mauve

    A specification can’t force non-implementers to do anything.

  142. Sam

    Good catch; that's just a requirement, that "MUST" should be "must".

  143. Sam

    Oh, no, nevermind

  144. Sam

    But still, it's not a requirement on the clients to do anything, it's a requirement on the spec to do something

  145. pep.

    Ok

  146. moparisthebest

    Markdown also requires a browser which in practice always comes with JavaScript

  147. Sam

    It doesn't require a browser, but in a browser all the markdown libraries I looked at appeared to be vulnerable by default to injecting scripts or something executable which is part of the reason I didn't just go with that when writing 0393.

  148. pulkomandy

    yes I'm a lot more worried about me trying to write a parser for 0393 than about using libcss for 0071

  149. moparisthebest

    I wouldn't be

  150. Link Mauve

    moparisthebest, Markdown is a superset of HTML, it doesn’t “require a browser” nor JavaScript.

  151. moparisthebest

    Link Mauve: in practice it'll always be implemented that way

  152. Link Mauve

    moparisthebest, not really no.

  153. Sam

    I would be interested to see a spec that used XML for formatting similar to XHTML-IM but w/o the HTML part and w/o the "tries to link into the plain text body too" part of 0394. I dunno if it would be better or worse, and you end up with the "plaintext/formatted message bodies are entirely different" problem, but I'd like to see it and would be curious what could be done with it.

  154. moparisthebest

    Again, I don't care what's theoretically possible, only what happens 99.9% of the time

  155. Sam

    moparisthebest: I don't think that's true, none of the markdown parsers I've ever used required HTML (unless they were javascript ones). I mean, you're right about the problem, just wrong about that detail I think

  156. pulkomandy

    well, xhtml-im but we do a rot13 on all the xhtml element names to make sure they are not accidentally sent to an html parser?

  157. moparisthebest

    See also: _xmppconnect and XMPP XML being a "strict subset of XML" where all projects just use an XML parser and are vulnerable

  158. Link Mauve

    Sam, that would be exactly the same as XHTML-IM imo, clueless webdevs will just make it go through some XSLT or whatever and end up with the exact same vulnerabilities, while you have fragmented the ecosystem with one more wire format.

  159. Sam

    I'm not 100% sure that's true, but you might be right

  160. pulkomandy

    clueless webdevs don't know about XSLT, they would implement something similar, but slower in javascript

  161. Link Mauve

    Right.

  162. Link Mauve

    Sam, clueless webdevs have vulnerabilities in anything where plain text is used in the protocol, built into the browser under the name innerHTML.

  163. Link Mauve

    Once the JS converter to HTML has been passed, they’ll put it in the DOM with innerHTML and get the same vulnerability they’ve used for years.

  164. Sam

    Yah, actually, you're probably right. The naive case would carry over the attributes and one of those will be javascript:onmouseover or whatever.
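
The opposite of the naive "carry everything over" conversion is an allowlist: build a fresh tree and copy only the elements and attributes you explicitly know. The Python sketch below is an illustration under several assumptions, not the approach of any client discussed here. The `ALLOWED` table and `SAFE_SCHEMES` tuple are hypothetical and much smaller than what XEP-0071 permits (for instance, `style` is dropped entirely rather than parsed), and it assumes the payload has already been parsed safely by the client's XML stream parser; `ET.fromstring` is only used here to build demo input.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Hypothetical allowlist for illustration: element name -> permitted attributes.
ALLOWED = {
    "body": set(), "p": set(), "span": set(), "br": set(),
    "em": set(), "strong": set(), "blockquote": set(),
    "a": {"href"},
}
SAFE_SCHEMES = ("http://", "https://", "xmpp:", "mailto:")


def sanitize(elem):
    """Return a cleaned copy of elem, or None if elem itself is not allowed."""
    if not isinstance(elem.tag, str) or not elem.tag.startswith(XHTML_NS):
        return None
    name = elem.tag[len(XHTML_NS):]
    if name not in ALLOWED:
        return None
    clean = ET.Element(name)
    for attr, value in elem.attrib.items():
        if attr not in ALLOWED[name]:
            continue  # drops onmouseover, onclick, style, ...
        if attr == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
            continue  # drops javascript: and other unexpected schemes
        clean.set(attr, value)
    clean.text = elem.text
    clean.tail = elem.tail
    last = None
    for child in elem:
        kept = sanitize(child)
        if kept is not None:
            clean.append(kept)
            last = kept
            continue
        # Unknown element: drop the markup, keep its text and tail as plain text.
        leftover = "".join(child.itertext()) + (child.tail or "")
        if last is None:
            clean.text = (clean.text or "") + leftover
        else:
            last.tail = (last.tail or "") + leftover
    return clean


src = ('<body xmlns="http://www.w3.org/1999/xhtml">'
       '<p onmouseover="alert(1)" style="color:red">hi '
       '<a href="javascript:alert(1)" onclick="x()">click</a>'
       '<script>evil()</script> bye</p></body>')
out = ET.tostring(sanitize(ET.fromstring(src)), encoding="unicode")
print(out)
```

Whatever renders the result still has to treat text as text; passing the cleaned output to something like innerHTML without further care would reintroduce the footgun discussed here.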

  165. Link Mauve

    Exactly.

  166. moparisthebest

    You could say the same about clueless C++ devs who think "I'm sure I can write secure c++ *this* time"

  167. pep.

    Maybe someday we'll stop betting that clueless webdevs be clueless and limit our specs and we'll start helping/training them instead and write our specs with less worries

  168. Link Mauve

    Ha, I’m not gonna train a webdev.

  169. pep.

    :D

  170. Link Mauve

    I’m bad at webdev myself.

  171. Link Mauve

    Stuck about ten years ago.

  172. moparisthebest

    You need to write specs that can be implemented securely by anyone that can read them without knowing a ton of non obvious stuff

  173. pep.

    The point is, if you think people are dumb you're not gonna go very far

  174. Link Mauve

    pep., their very platform is offering them footguns.

  175. pep.

    Then let's change the platform

  176. Link Mauve

    moparisthebest, good luck with that.

  177. moparisthebest

    Link Mauve: *different footguns

  178. Link Mauve

    That would be a platform where exactly no wire text is present in the final UI.

  179. Link Mauve

    For a chat system for instance, you wouldn’t go very far.

  180. Sam

    It's not that we're just assuming web devs aren't intelligent, it's that literally every web client I ever tried that supported XHTML-IM (and I don't think "every" is me being hyperbolic) had trivial vulnerabilities. Sure, I reached out and helped fix a lot of them, but the point is that experience shows us that we handed them a gun pointed at their foot and then just told them "but be careful and don't pull the trigger"

  181. moparisthebest

    Have you ever used openssl?

  182. pulkomandy

    sadly, yes :(

  183. pep.

    Sam, you're mistaken on the footgun though

  184. pep.

    There is one in that story for sure

  185. moparisthebest

    All computer stuff is a dumpster fire, pointing out that different trash is burning on the webdev side vs native code doesn't feel helpful

  186. Link Mauve

    Sam, our specification might not have carried enough big blinking red warnings, but I’ve found similar vulnerabilities in multiple clients’ handling of MUC nicks, the thing in the resource. :D

  187. pulkomandy

    also we have specifically said "clueless webdevs" which is a subset of webdevelopers. There are skilled ones too, and there are clueless C++ developers too

  188. Link Mauve

    It’s explicitly specified as an opaque string.

  189. pep.

    pulkomandy, agreed

  190. Sam

    Sure, there are also other vulnerabilities and common problems; that doesn't mean we shouldn't fix the ones that can be fixed.

  191. Link Mauve

    Removing the ability to send formatted text was never a fix, even less a good one.

  192. Sam

    No one removed the ability, we obsoleted the spec which means "the XSF doesn't recommend this particular spec".

  193. Link Mauve

    But we’ve had pages of emails on that topic, let’s not go over them again. :)

  194. Link Mauve

    Sam, right.

  195. pep.

    Yeah, pages of feedback on that topic which got ignored

  196. pulkomandy

    well it seems the result is client devs like me thinking "the XSF is stupid, they don't provide any alternative so I'm going to implement this anyway"

  197. Link Mauve

    pep., not really, I mean people continue to implement it despite it being obsolete.

  198. Sam

    It was all discussed multiple times. Just because your way didn't get picked doesn't mean you were ignored.

  199. Link Mauve

    pulkomandy, that’s approximately my stance on that too.

  200. Sam

    The XSF isn't some magical body telling you what to do; the council just said "we don't recommend this one because experience has shown us it's difficult to do right". The XSF is *you*, other alternatives could be proposed (like 0393 and 0394). If one of them got implemented and the other didn't, it's the community that voted with their code, not the XSF. And you could always propose another that includes whatever formatting you think is missing

  201. Link Mauve

    Sam, no need for that, 0071 works.

  202. Link Mauve

    At best what I’d propose would be some bright blinking red warnings about our implementation experience.

  203. pulkomandy

    yes, what do we do, resubmit 0071 with a new xep number and rename it "totally-not-xhtml-im" ?

  204. Sam

    Well, that's fine, but the council at the time disagreed.

  205. Sam

    In theory the council is experienced people who know a bit about XMPP. That's not to say that every decision will be perfect, and not to say that you can't ignore their warning and go implement it, just that it might be worth considering why they did it and that it wasn't because they ignored you.

  206. qy

    I like 0394 more but 0393 seems more usable, probably best implement them both

  207. pulkomandy

    well, there is this spec being used in the wild by at least 4 different xmpp things, there is no replacement (393 and 394 don't implement the two features I need: marking up links so I don't have to guess, and converting IRC styling so that IRC users can smoothly migrate to my client and not lose any features) and I'm not going to spend time writing more specs because I have enough work to do writing code supporting existing stuff. Do whatever you want with that information :)

  208. Sam

    (FWIW, I think we need a linking spec in particular and would love to see that exist, I've thought about working on one a few times)

  209. qy

    i feel like oob is fine for linking, just that it has been implemented in such a wacky way

  210. Sam

    Maybe I should finish my LaTeX-IM spec. It was meant to be published on April 1st last year, but I never got around to finishing/submitting it.

  211. Link Mauve

    :D

  212. Link Mauve

    Reminds me of a Gajim plugin I once wrote, which would render Lilypond markup inline. <3

  213. Link Mauve

    (0393-style)

  214. pep.

    Do I need to download a texlive distribution for the LaTeX-IM spec? :P

  215. Sam

    oooh, I would legit use that, not even as a joke. I used to write a lot of music and I *love* lilypond (even if every release breaks my old stuff and it's really confusing markup for anything more advanced than a simple staff)

  216. Link Mauve

    (Where a client which didn’t support this markup would still show you the { \treble \time 4/4 c8 d e f g2 }, while a client with support would render a lovely score.)

  217. Link Mauve

    The main issue with that is that Scheme support means you basically own the remote computer.

  218. Sam

    I left the note on codeblocks undefined in 0393, but I keep hoping clients that implement it will do things like that, eg. gajim might let plugins hook into ```note and if it sees ```lilypond it could try to render it, etc.

  219. Sam

    But yah, that opens a whole other can of worms.

  220. Link Mauve

    Preformatted text (<pre><code/></pre> in HTML) is by no means made to actually render or run the thing.

  221. Link Mauve

    Although you could add a Run button in your client, so that for instance a Python snippet can be executed inline.

  222. Link Mauve

    Hopefully, only with proper sandboxing in place.

  223. pulkomandy

    a good way to check if there are also clueless python devs :')

  224. Link Mauve

    Are you willing to bet on most clients doing security properly? :)

  225. qy

    > so that for instance a Python snippet can be executed inline.

  226. qy

    😱️

  227. moparisthebest

    > Hopefully, only with proper sandboxing in place.
    You just described all of the web

  228. Link Mauve

    The web is actually a very good sandbox. :)

  229. pulkomandy

    but that can't protect a website against itself :)

  230. Link Mauve

    Actually there are quite a few mechanisms for that, iframe for one, combined with HTTP headers.

  231. moparisthebest

    CSP?

  232. Link Mauve

    Yeah.

  233. Link Mauve

    And a few other ones.

  234. moparisthebest

    It's just piles upon piles of hacks to try to make it secure

  235. moparisthebest

    Well, all of computing is

  236. qy

    some parts more than others though