jdev - 2023-12-25


  1. manday

    Is there anything about mathematical markup in XMPP? Since XEP-71 was deprecated, there doesn't seem to be any way to transmit formulae as the XEP-393 has nothing about math. I'm looking into deploying XMPP in an educational orga and being able to write formulae would be important. Currently, only Matrix (Element Web) seems to have that.

  2. Guus

    It is not my area of expertise, but I'm not aware of any. If you can't express this sufficiently in UTF-8 alone and there are no other solutions, then this might be a good candidate for a new XEP.

  3. manday

    No, there is no way Unicode could express this sufficiently. The only other solution I can think of is rendering the formula into an image ( https://modules.prosody.im/mod_latex.html ) but that doesn't seem to be a viable method, really.

  4. Guus

    I've now added disco/info to https://xmppnetwork.goodbytes.im - interestingly, we had a submission of a domain that _had_ a pubsub node, but not the opt-in bit set. I guess a non-administrative end-user was playing. :)

  5. wgreenhouse

    manday: xep-0071 still has a few actively maintained implementations but I haven't tried math markup in any of them

  6. singpolyma

    manday: is there a way with 71? Xhtml doesn't have math does it?

  7. singpolyma

    I guess you could embed another namespace into the xhtml, but that would be beyond 71

  8. wgreenhouse

    yeah, 71 only supports a very limited subset

  9. singpolyma

    Well, even ignoring that, the xhtml namespace does not contain any math markup

  10. moparisthebest

    manday: how does matrix support math markup

  11. Zash

    by using html as wireformat?

  12. moparisthebest

    Just like, fully unrestricted HTML? Sounds like a security nightmare

  13. Zash

    This is what *everything* other than XMPP does these days, and somehow they get away with it.

  14. Zash

    (Read: Mastodon does it too.)

  15. MattJ

    Another fun fact: at *least* one of the implementations that did not sanitize XHTML-IM and triggered the anti-HTML movement, did not sanitize MUC nicks either. And we didn't deprecate those 🙂

  16. Zash

    moparisthebest, no it's perfectly safe for them because they embed the HTML in JSON!

  17. Zash

    (also probably some html-sanitizer.js lib that ships with matrix-js-sdk)

  18. MattJ

    XHTML-IM is a little harder to sanitize, but not terribly hard. The important thing is that devs understand and do the sanitization, not what the wire protocol is

  19. MattJ

    I would personally lean towards some (X)HTML subset and none of the CSS stuff of the current XEP, which makes the sanitization harder (even though there are libraries available)

  20. singpolyma

    > This is what *everything* other than XMPP does these days, and somehow they get away with it.
    Yes, it's not a problem at a wire format level, it's perfectly sensible

  21. MattJ

    Filtering elements against an allowed list is not that hard
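
A minimal sketch of the allow-list filtering MattJ describes, assuming the XHTML-IM body has already been parsed as XML. The `ALLOWED` table, the `SAFE_SCHEMES` tuple and the `sanitize` function are hypothetical names chosen for illustration; they are not taken from XEP-0071 or from any client.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical allow-list: element name -> attributes that may be kept.
ALLOWED = {
    "body": set(),
    "p": set(),
    "br": set(),
    "span": set(),
    "strong": set(),
    "em": set(),
    "a": {"href"},
}
# Assumed set of link schemes considered harmless.
SAFE_SCHEMES = ("http:", "https:", "xmpp:", "mailto:")


def sanitize(elem):
    """Return a cleaned copy of elem, or None if elem is not on the allow-list."""
    prefix = "{" + XHTML_NS + "}"
    if not isinstance(elem.tag, str) or not elem.tag.startswith(prefix):
        return None  # foreign namespace: not allowed in this sketch
    local = elem.tag[len(prefix):]
    if local not in ALLOWED:
        return None
    clean = ET.Element(elem.tag)
    for name, value in elem.attrib.items():
        if name not in ALLOWED[local]:
            continue  # drops style, onclick, and anything else not listed
        if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
            continue
        clean.set(name, value)
    clean.text = elem.text
    for child in elem:
        kept = sanitize(child)
        if kept is None:
            # Drop the element and its contents, but keep the text after it.
            tail = child.tail or ""
            if len(clean):
                clean[-1].tail = (clean[-1].tail or "") + tail
            else:
                clean.text = (clean.text or "") + tail
        else:
            kept.tail = child.tail
            clean.append(kept)
    return clean
```

The point of building a fresh tree is that nothing survives unless it was explicitly listed, which is the "only allow specific things" approach rather than trying to strip known-bad markup.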

  22. moparisthebest

    I feel like there are 2 distinct schools of software development here, one is be totally safe all the time, and the other is just pretend sandboxes work and patch 80 vulns in them weekly (the browser model)

  23. singpolyma

    HTML doesn't really have any possible security issues though. CSS can in some cases, and obviously JavaScript does but HTML does not imply either of those

  24. singpolyma

    However the topic was math markup and none of html, css, or js solve that problem

  25. Zash

    HTML has math markup now tho

  26. singpolyma

    It does not

  27. singpolyma

    Browsers support MathML

  28. singpolyma

    But it's not part of html

  29. singpolyma

    It has its own namespace

  30. Zash

    But Browsers don't do HTML with namespaces, they do HTML with mathml and svg built-in

  31. Zash

    So for those HTML-in-JSON things, it's just more HTML.

  32. singpolyma

    They do tag soup and infer the namespaces, sure. But that doesn't make MathML part of HTML anywhere that says "html here"

  33. moparisthebest

    singpolyma: you are pretending like "HTML" means "HTML" when it really means "whatever works dumped into a browser"

  34. singpolyma

    Browsers support all kinds of stuff that aren't HTML. Basically always have

  35. singpolyma

    Growing all the time

  36. Zash

    I know clients at least do some sanitizing, <script> etc doesn't come out the other end.

  37. theTedd

    > HTML doesn't really have any possible security issues though. CSS can in some cases, and obviously JavaScript does but HTML does not imply either of those
    If you only support pure HTML without CSS or JS, you're going to have a hard time with the modern web

  38. theTedd

    MathML is part of HTML 5

  39. moparisthebest

    > Browsers support all kinds of stuff that aren't HTML. Basically always have
    Right and that's what these things support I suspect

  41. singpolyma

    I'm obviously not against people embedding whatever namespaces they want wherever they want, but if a client says "I support html" you can't expect it will definitely support your other embeds

  42. moparisthebest

    Trying to remove unsafe things is the patching weekly I mentioned before, only allowing specific things is the safe version

  43. singpolyma

    I embed svg in data forms, but that doesn't mean I expect clients who aren't me to use it heh

  44. theTedd

    manday, your best bet is either MathML or LaTeX math mode; the preference may depend on whether users are already familiar with the latter

  45. singpolyma

    Can convert between the two I think, so MathML is probably a better fit for an xml wire format

  46. singpolyma

    > Trying to remove unsafe things is the patching weekly I mentioned before, only allowing specific things is the safe version
    Just don't implement support for any unsafe things and you should be good 😉

  47. Zash

    Do any clients do something like rendering $$\LaTeX$$ ?

  48. singpolyma

    But yes, even browser stuff all operates based on whitelist obviously

  49. Zash

    Gajim seems to have LaTeX plugin at least

  50. theTedd

    > Just don't implement support for any unsafe things
    Singpolyma single-handedly solves all security issues

  51. moparisthebest

    >> Trying to remove unsafe things is the patching weekly I mentioned before, only allowing specific things is the safe version
    > Just don't implement support for any unsafe things and you should be good 😉
    They dump it into a webview which changes and introduces new unsafe things weekly, you aren't keeping up to date with all of them weekly

  52. singpolyma

    You can take this as far as you like. I hear "buffers" and "strings" are unsafe in some languages so maybe don't implement support for those

  53. Zash

    Computers were a mistake!

  54. theTedd

    Let's get back to basic and do binary computation using small pebbles

  55. manday

    > manday: how does matrix support math markup
    Clientside non-standard rendering convention: "formatted_body": "<span data-mx-maths=\"x + 3 = 5\"><code>x + 3 = 5</code></span>"

  56. manday

    Oh well that came out interpreted... What I meant to quote is: ``"formatted_body": "<span data-mx-maths=\"x + 3 = 5\"><code>x + 3 = 5</code></span>"``
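
To make the convention manday quotes concrete, here is a small sketch that pulls the formula source back out of such a `formatted_body`. Only the `data-mx-maths` attribute and the `formatted_body` key come from the message above; the `MathsExtractor` class and `extract_maths` function are hypothetical helpers, and this is not how Element itself is implemented.

```python
import json
from html.parser import HTMLParser


class MathsExtractor(HTMLParser):
    """Collects the value of every data-mx-maths attribute it encounters."""

    def __init__(self):
        super().__init__()
        self.formulae = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-mx-maths" and value is not None:
                self.formulae.append(value)


def extract_maths(event_content_json):
    """Return the formula sources found in an event content given as JSON text."""
    content = json.loads(event_content_json)
    parser = MathsExtractor()
    parser.feed(content.get("formatted_body", ""))
    parser.close()
    return parser.formulae
```

A client that does not know the attribute simply renders the `<code>` fallback, which is why the formula text appears twice in the payload.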

  57. singpolyma

    So they just made something up

  58. Zash

    Heh, I don't find a lot of matrix.org things searching for "math" and "matrix" :D

  59. manday

    yes it's basically an Element (their client) only feature

  60. singpolyma

    manday: do you want math inline with other text or whole message is an equation?

  61. manday

    You have to enable it in "developer settings", too, for it to submit $....$ as math

  62. manday

    From first principles, I'd say inline; being able to type formulae correctly seems to be a natural part of text-based communication. In my opinion it's even more fundamental than *Styling* (i.e. XEP-393)!

  63. singpolyma

    Ok. So MathML namespaced into xhtml-im body is probably the way. I'll look into how hard it would be to interpret that

  64. jonas’

    singpolyma: that would indeed be the way

  65. Guus

    Should we have some xep that allows for content with a reference to the markup to be applied, rather than having a different XEP per markup language?

  66. manday

    I agree, with XMPP being based on XML MathML seems like a likely candidate

  67. singpolyma

    Sure, but MathML alone won't give you inline which is why I suggested what I did

  68. manday

    Downside being, it wouldn't only require parsing for rendering, but also parsing for submission, since users will more likely want to type in AsciiMath or Latex/itex

  69. singpolyma

    Anything will need that yeah

  70. singpolyma

    Other option of course is:
    ```latex
    Some cool math
    ```
    This is the styling way heh

  71. manday

    singpolyma wouldn't an XEP in the spirit of 393 be appropriate? I.e. just specify that $....$ is to be interpreted as latex markup and rendered by clients as they see fit
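
A rough sketch of what "interpret $....$ as latex markup" could mean on the receiving side: split the plain-text body into text and math segments and let the client render the math segments however it sees fit. The regular expression and the `split_math` function are assumptions for illustration; no XEP defines this syntax.

```python
import re

# Assumed convention: a math span is $...$ on a single line, with no nested '$'.
MATH_SPAN = re.compile(r"\$([^$\n]+)\$")


def split_math(text):
    """Split text into a list of ("text", s) and ("math", s) segments."""
    parts = []
    pos = 0
    for match in MATH_SPAN.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("math", match.group(1)))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts
```

Note that this naive rule would also treat the text between two unrelated dollar signs (as in two prices in one sentence) as math, which is the kind of ambiguity any in-band syntax has to deal with.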

  72. manday

    > Other option of course is:
    >
    > ```latex
    > Some cool math
    > ```
    >
    > This is the styling way heh
    Yeah why not!

  73. singpolyma

    Styling already says the language tag after ``` is allowed

  74. manday

    If that includes inline spans as well

  75. moparisthebest

    Could just send around PDFs, ie, the email way

  76. Zash

    Putting more syntax into the plain text body? :|

  77. singpolyma

    No, you can't specify language on inline only on block that's true

  78. manday

    Zash how is it still a plain text body after 393?

  79. Zash

    :(

  80. singpolyma

    You guys we can go full org mode and put a play button in clients with ```javascript blocks to run the code. Who needs xhtml-im for full rce 😉

  81. manday

    So that I understand correctly, you'd say that if the sender types something like
    > Look at this math: $a^2 = b^2 + c^2$
    into their client, that will be transmitted as what exactly?

  82. singpolyma

    I'm suggesting xhtml-im with namespace embedded MathML for the equation part. That's my actual suggestion
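
One possible reading of singpolyma's suggestion, sketched for manday's example sentence: an XHTML-IM `<html/>` wrapper whose XHTML `<body/>` contains the text followed by a MathML `<math/>` element in its own namespace. The three namespace URIs are the standard ones for XHTML-IM, XHTML and MathML; the helper names, the recipient address, and the choice to put the formula as typed into the plain `<body/>` are assumptions, since nobody in the discussion had implemented this yet.

```python
import xml.etree.ElementTree as ET

CLIENT_NS = "jabber:client"
XHTML_IM_NS = "http://jabber.org/protocol/xhtml-im"
XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def m(tag, text=None, *children):
    """Create a MathML element with optional text and child elements."""
    el = ET.Element("{%s}%s" % (MATHML_NS, tag))
    el.text = text
    el.extend(children)
    return el


def squared(name):
    """MathML for name^2."""
    return m("msup", None, m("mi", name), m("mn", "2"))


def pythagoras_mathml():
    """MathML for a^2 = b^2 + c^2."""
    row = m("mrow", None, squared("a"), m("mo", "="),
            squared("b"), m("mo", "+"), squared("c"))
    return m("math", None, row)


def build_message(to, plain, prefix, math):
    """Build a message with a plain body and an XHTML-IM body holding MathML."""
    msg = ET.Element("{%s}message" % CLIENT_NS, {"to": to, "type": "chat"})
    ET.SubElement(msg, "{%s}body" % CLIENT_NS).text = plain
    html = ET.SubElement(msg, "{%s}html" % XHTML_IM_NS)
    body = ET.SubElement(html, "{%s}body" % XHTML_NS)
    body.text = prefix
    body.append(math)
    return msg


if __name__ == "__main__":
    stanza = build_message(
        "student@example.org",  # hypothetical recipient
        "Look at this math: $a^2 = b^2 + c^2$",
        "Look at this math: ",
        pythagoras_mathml(),
    )
    print(ET.tostring(stanza, encoding="unicode"))
```

ElementTree emits generated prefixes such as `ns0:`; that is still well-formed namespaced XML, just less pretty than a hand-written stanza.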

  83. singpolyma

    I've put it on my list to attempt an implementation to see if it's very hard or not

  84. manday

    but xep-71 is deprecated?

  85. singpolyma

    So?

  86. Zash

    A word on a page can discourage, but not stop anyone from implementing anything.

  87. theTedd

    Especially if that word is _thisistemporaryandwillbereplacedinthefuture_

  88. singpolyma

    And can't find out if anything is a good or bad idea until someone writes code

  89. Zash

    manday, we have a dream that XEP-0071 will return in the future, with a better security warning

  90. singpolyma

    Zash: some of us do. You and I at least it seems

  91. theTedd

    Unfortunately, security warnings are far from effective

  92. singpolyma

    But honestly deprecated is more legit to implement than experimental or deferred and people ship those all the time

  93. Zash

    Not even 393 is safe! Someone might shove it into a Markdown library with HTML passtrough enabled and then everyone gets hacked!

  94. singpolyma

    Zash: or interpret ```js blocks!

  95. Zash

    or interpret the scripts in tetris.svg !

  96. singpolyma

    Or store the markup in a buffer. I hear those overflow

  97. theTedd

    There is always the possibility of someone doing something stupid, but there are also easy traps to fall into and maybe we shouldn't guide people down that path

  98. singpolyma

    Never underestimate the power of a junior with bad tools and inadequate code review

  99. Zash

    Meanwhile, everyone else does even stupider things and somehow gets all the popularity!

  100. theTedd

    1. Do stupid things; 2. Gain popularity.

  101. wgreenhouse

    > You guys we can go full org mode and put a play button in clients with ```javascript blocks to run the code. Who needs xhtml-im for full rce 😉
    /me resembles this remark

  102. Zash

    0. Put the whole budget into marketing instead of hiring competent engineers.

  103. singpolyma

    Zash: to be fair we didn't reverse that, we just have no budget

  104. manday

    I'm just a bit concerned by the fact that we're realizing a fundamental/basic text mode (formulae) through an overkill feature set (HTML)

  105. manday

    In my mind, there are 3 tiers: 1. Text and Formulae 2. Basic semantic markup (bold, italic, hyperlinks, section headers, tables...) 3. Full fledged HTML (everything beyond)

  106. manday

    Realizing a part of 1. with 3. may not sit well with all use cases

  107. Zash

    I have to admit I have never thought of mathematical formulae as fundamental.

  108. wgreenhouse

    manday: for a pedagogical/academic usecase, the thing you probably want is TeX formulae, which are...not small to support

  109. wgreenhouse

    what does texlive weigh these days?

  110. manday

    Math markup (even with latex syntax) is not the same thing as (la)tex. There are dozens of lightweight implementations which turn latex-ish markup into math of one format or another.

  111. manday

    I use itex2mml daily, it's a small binary or ruby gem

  112. manday

    (converts $...$ to MathML)

  113. manday

    > I have to admit I have never thought of mathematical formulae as fundamental.
    As a mathematician I may be a bit biased but I understand it as yet another language (and a universal and fundamental one) with a complexity which Unicode can't handle (like it does with CJK etc.)

  114. wgreenhouse

    yeah for a jabber client tex for "export" into MathML either via 393 or via some unwritten extension to 0071 sounds like the way

  116. wgreenhouse

    the formula as typed would be in the plain text body as fallback

  117. singpolyma

    > I'm just a bit concerned by the fact that we're realizing a fundamental/basic text mode (formulae) through an overkill feature set (HTML)
    No. XHTML is the proposal to inline other things into text. The math itself would just be MathML, no html in that part afaik

  118. singpolyma

    Same reason i use xhtml-im for inline images (aka custom emoji)

  119. theTedd

    That formulae are a fundamental part of text may make sense for mathematicians, but I'm not sure the rest of the world feels the same way about expressing formulae in everyday communication.

  120. theTedd

    I don't expect adding math rendering to 0393 would go well - it's not a simple thing to render properly - even if you could excuse the use of 'math' as a code dialect

  121. ManDay

    singpolyma 71 seems rather monolithic and definite in that regard. They propose exactly what subset of HTML is supposed to be supported by 71-compliant clients. I would rather opt for a more fine-grained XEP which defines an extensible syntax to inline semantic markup into the body. Such an XEP "Z" would, in essence, establish a syntax like XML for the body and *other* XEPs would then define elements w.r.t. "Z", such as "<math>...</math>" (for MathML) or "<latex>...</latex>" (for tex). 393 could fit into that larger scaffold by defining "<b>...</b>", etc.

  122. singpolyma

    Oh yeah, I have no interest in the "subset" part of the xep. It's well intentioned, but trying to exert control of someone else's namespace this way feels misguided

  123. ManDay

    Well if supporting 71 means supporting all of HTML then I rather would not have Math support as part of 71

  124. singpolyma

    It wouldn't be part of 71 but it could be an extension on top of 71

  125. ManDay

    That still sounds like supporting Math requires 71 and therefore supporting HTML as a whole.

  126. singpolyma

    In the simple math-only case there would actually be no xhtml at all except a body tag. Which is a bit silly but allows combining with other content as needed for inlining

  127. singpolyma

    Of course we don't know if this will be complex or not. Certainly it will require me to fork my html renderer but I kinda wanted to do that anyway to improve whitespace handling

  128. ManDay

    But how will you classify that in terms of client support? A client which supports math, but not the full HTML, which XEP does it advertise?

  129. singpolyma

    Advertise in which way?

  130. ManDay

    Saying that it supports them

  131. singpolyma

    To humans? I think most humans should never hear the word "XEP" and the few that do understand the nuance of partial implementation (which is already common for many xep)

  132. ManDay

    Okay, but speaking of "partial" implementation could become unnecessary and facilitate qualification, if 71 was indeed replaced by something less monolithic where individual tags can "plug" in on a per-XEP basis.

  133. ManDay

    Also there was probably a reason 71 was deprecated (and I don't know why or how); I suppose a more lightweight, flexible approach may be favoured by the council and gain support more easily, don't you think?

  135. singpolyma

    I can't speak for the rest of council, only myself. But I can tell you we are divided on this issue and rumour has it so have most councils been 🙂

  136. ManDay

    Well do you see an advantage of the plugin-based approach over what 71 currently specs?

  137. ManDay

    Or rather, do you see a disadvantage?

  138. singpolyma

    I think "plugin based approach" is just called XML and 71 defines one plugin ("use xhtml") and I'm proposing you add a second ("and/or MathML")

  139. ManDay

    So that means supporting MathML (say in XEP-Y) would not at all require supporting XEP-71, not even partially?

  140. singpolyma

    I think it's sensible to use the same body tag wrapper to allow mixing them, but otherwise no

  141. ManDay

    All right, apologies for my misunderstanding that

  142. Zash

    I can imagine that it gets really tricky to handle fallback for clients that support '71 but not the math stuff.

  143. singpolyma

    I don't know what MathML has for alt, yeah. This is basically speculation until someone writes code.

  144. ManDay

    That's why I mentioned a possible <latex> tag, which holds the markup

  146. jonas’

    latex is turing complete

  147. jonas’

    you don't want to expose that

  148. jonas’

    it is very much not made for untrusted inputs

  149. ManDay

    Whenever I mention "Latex" I'm only referring to the math markup AMSmath

  150. ManDay

    It just occurred to me, but maybe it's a silly idea, that an even more reliable way could be turning math markup into an SVG on the sending side.

  151. ManDay

    Semantically, there wouldn't be a big loss, imo.

  152. Zash

    Almost full circle to rendering to a png? :)

  153. ManDay

    Not *as* bad, but yes

  154. jonas’

    BoB+XHTML-IM+MathML "alt" text?

  155. selurvedu

    Zash, yeah, should definitely render a png as a fallback for those who can't render svg :)

  156. ManDay

    I mean I get the argument that for the rest of the world math may not be as much of a 1st class citizen as ordinary text. I may imagine other "languages" (graphs, for example) which would be equally important to certain users. And seen from that point, it would be a question of remaining practical. Sending these kinds of things as SVG rather than devising a standard to render them client side for *display* alone sounds just reasonable. It somewhat retains entropy and no one would care about the lost semantics in reality, either.

  157. jonas’

    you can send images (like svg) today already though

  158. ManDay

    i know. it's just a thought, if the "proper" way turns out to be too complicated. I have yet to see a FOSS app on android which typesets formulae (let alone mathml) well, for example. Web clients have it trivial, either with the MathML as-is or with some MathJax in between, but for other clients I imagine the implementation can either become cumbersome or bloat the code.

  159. Guus

    As I feel that further discussion on the XMPP Network Graph is becoming a bit off-topic-ish for this room / might start to annoy people that aren't interested in it, I've created a dedicated chat room for it. Please join me in xmpp:xmppnetworkgraph@conference.igniterealtime.org?join if you want to track future discussions around that app.