XSF Discussion - 2017-10-14


  1. Syndace

    Can a client respond with an internal-server-error if something internal goes wrong when handling a request? The "server" part makes me unsure.

  2. MattJ

    Heh

  3. MattJ

    Do you have a specific case in mind?

  4. MattJ

    or is this just a "is it possible"?

  5. Syndace

    I'm currently coding a client for fun, and I have a situation where something internal may fail; I'm wondering what to respond in that case. To make it more specific: before I send a stanza as a response, I validate it using the XML schema files and some additional logic. What do I do if the validation fails? I have to answer the IQ request with something.

  6. Syndace

    Now that I think about it, maybe I should only validate incoming stanzas and not outgoing ones. The receiving entity can't expect valid stanzas anyway and has to validate itself.

  7. Syndace

    I just thought it would be cool to make sure what I'm sending is valid.

  8. MattJ

    Isn't that bad-request?

  9. MattJ

    Oh, sorry, I see what you're saying

  10. MattJ

    Maybe undefined-condition with a custom error would be appropriate here

  11. MattJ

    There's not much the remote party can do about it in any case

  12. Syndace

    Hmm the undefined-condition should be used in application-specific cases

  13. Syndace

    I mean, "should" is not "must", but it does not feel clean either

  14. Syndace

    I think I'll use my internal validation to display warnings/errors and send the invalid stanza anyway. At least this way finding and debugging invalid stanzas is easy.

  15. Flow

    Syndace, whatever you do, it's often a good idea to include human-readable English text in the error response which provides more information about what went wrong, but only if that information does not cause some sort of security leak
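
A minimal sketch of what Flow suggests, using only Python's standard library: an IQ error reply carrying a defined condition plus an optional human-readable <text/> in the RFC 6120 stanza-error namespace (urn:ietf:params:xml:ns:xmpp-stanzas). The function name build_iq_error is hypothetical, and the error type is left to the caller because it depends on the condition; RFC 6120 section 8.3.3 lists the recommended type for each condition.

```python
import xml.etree.ElementTree as ET
from typing import Optional

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"       # RFC 6120 stanza errors
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_iq_error(request: ET.Element, condition: str, error_type: str,
                   text: Optional[str] = None) -> str:
    """Serialize an IQ error reply to `request`, with optional human-readable text."""
    reply = ET.Element("iq", {"type": "error", "id": request.get("id", "")})
    # Address the reply back to the entity that sent the request.
    if request.get("from") is not None:
        reply.set("to", request.get("from"))
    if request.get("to") is not None:
        reply.set("from", request.get("to"))

    error = ET.SubElement(reply, "error", {"type": error_type})
    # A literal xmlns attribute puts the condition in the stanza-error namespace
    # without ElementTree inventing an ns0: prefix.
    ET.SubElement(error, condition, {"xmlns": STANZAS_NS})
    if text is not None:
        # Explain what went wrong, but leave out anything security-sensitive.
        text_el = ET.SubElement(error, "text", {"xmlns": STANZAS_NS, XML_LANG: "en"})
        text_el.text = text
    return ET.tostring(reply, encoding="unicode")


request = ET.fromstring("<iq type='get' id='q1' from='a@example.org/r' to='b@example.org/c'/>")
print(build_iq_error(request, "internal-server-error", "cancel",
                     "Outgoing stanza failed schema validation"))
```

Keeping the text free of paths, stack traces, and similar internals addresses Flow's caveat about security leaks.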

  16. Syndace

    Flow, most of the errors are self-explanatory, aren't they? Often the XEPs define semantics for error conditions. I like to avoid decisions that might cause security leaks.

  17. Flow

    Syndace, in my experience it's quite the opposite. For example, internal-server-error: it's often useful to know the cause of the error

  18. Syndace

    Why does the client have to know which internal server error occurred?

  19. Syndace

    Never mind, there are probably cases where additional info makes a lot of sense. I'll reconsider which of my generated errors could benefit from additional info.

  20. Flow

    Syndace, so that it can report the error condition back to the server operator

  21. Syndace

    The server operator should really log internal errors himself..

  22. Flow

    and he will very likely do that

  23. zinid

    Syndace: sometimes a user sees an error and doesn't bother support, because the error is temporary, for example a database failure

  24. pep.

    jonasw, in your last email to the XHTML-IM thread, I fail to see how having a protocol break from xml to xml would fix OP's issue. I'd like to keep XML as well but if you do that you're still open to the same vulnerabilities

  25. jonasw

    pep., yes, I’m not convinced that XML helps, which is what I wrote I think?

  26. pep.

    let me reread

  27. jonasw

    actually I’m even posting an example of how this might be exploited

  28. jonasw

    yeah, I probably should’ve added something like "I *think* that […], but I’m not convinced that clients will do the right thing by default, which is why we’re trying to get rid of XHTML-IM in the first place." will clarify on-list

  29. pep.

    :)

  30. pep.

    But,

  31. pep.

    hmm, yes

  32. pep.

    yeah, saying "We are now using this NS instead" won't fix the problem of people not validating

  33. pep.

    So if people want a change, XML is a no-no

  34. pep.

    And so is markdown, because some implementations (most?) accept HTML

  35. jonasw

    markdown is out even if only because it’s not extensible

  36. pep.

    Right

  37. pep.

    But I would go further and say, for XML, it's a no-no for web clients at all.

  38. pep.

    Because people are putting stuff everywhere without validating and that creates vulnerabilities :)

  39. pep.

    Not just in <body>

  40. pep.

    Even if it's the most obvious

  41. goffi

    pep.: that is an issue with client devs; validating untrusted input is the first thing to learn when you do web dev (or even non-web dev, actually)

  42. goffi

    pep.: (disclaimer: I'm not claiming that my web software is failproof; security issues can and will happen to most software)

  43. pep.

    goffi, yeah, look at my last email on the list

  44. Zash

    Can't have nice things!

  45. pep.

    I should have put something like "unlike OP suggests" at the end of that sentence

  46. goffi

    jonasw: thanks for your last message, I kind of feel alone when I don't even understand while people are still talking about markdown after the debate we had.

  47. goffi

    s/while/why/

  48. jonasw

    I’m assuming some good faith (i.e. too long other thread and people didn’t read)

  49. goffi

    it's possible, I've had the same thing during OMEMO flamewar, hard to follow when you arrive after the battle.

  50. Zash

    I kinda wanna pin the entire xhtml-im problem on whoever invented .innerHTML

  51. jonasw

    Zash, .innerHTML isn’t the issue with XHTML-IM

  52. jonasw

    XHTML-IM breaks not only if you use .innerHTML, it also breaks if you use appendChild

  53. Zash

    Blame the web!

  54. edhelas

    Markdown in XMPP, seriously?!

  55. edhelas

    I'm not against Markdown, but it looks like we are trying to solve a problem by changing it

  56. Zash

    No, Markdown being defined as a superset of HTML rules it out

  57. jonasw

    dwd gave a proper definition of what he thinks should be markdown for IM.

  58. jonasw

    more or less proper.

  59. jonasw

    I’m tired of asking how to emphasize "Trainer*Innen" with that though.

  60. Zash

    \bold{Trainer*Innen}

  61. jonasw

    uhhh

  62. jonasw

    TeX in IM

  63. goffi

    one of the problems is that people always think about their own use case

  64. jonasw

    that’s excellent

  65. jonasw

    it’s even Turing complete!

  66. Zash

    (I don't actually know TeX, wild guess)

  67. jonasw

    it’d be \emph{} or \textbf{}, depending on whether you want emphasis (italics) or actual boldface

  68. Zash

    Is this Creole? http://www.wikicreole.org/wiki/Creole1.0

  69. goffi

    An XMPP message can also be of type "normal", with a subject and potentially long, not necessarily a line in a MUC/MIX, and embedded images can be useful there (think about email gateways)

  70. jonasw

    goffi, agreed

  71. jonasw

    Zash, yes

  72. goffi

    I'll take a break on standard@ for tonight, I have written enough there today :)

  73. Zash

    Ok, me, a naive web developer does a simple string replace and puts the result as .innerHTML

  74. waqas

    You are doing a simple string replace? That's naive web developer lvl3

  75. edhelas

    I append my HTML like I append my variables in my SQL requests, using concatenation
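
A toy illustration of the "simple string replace" approach, in Python. The *bold* rule and both function names are hypothetical, not any proposed spec. Substituting markup directly on untrusted text lets any HTML in the body through to .innerHTML; escaping first means the only tags in the output are the ones the renderer added. The same non-greedy rule also reproduces the Trainer*Innen ambiguity that comes up later in this thread.

```python
import html
import re

# Toy "*bold*" rule for illustration only; not any proposed markup spec.
BOLD = re.compile(r"\*(.+?)\*")


def naive_render(body: str) -> str:
    # Markup substituted straight into untrusted text: any HTML in it survives.
    return BOLD.sub(r"<b>\1</b>", body)


def escaped_render(body: str) -> str:
    # Escape the whole body first, then add the only tags we mean to emit.
    return BOLD.sub(r"<b>\1</b>", html.escape(body))


evil = "<img src=x onerror=alert(1)> *hi*"
print(naive_render(evil))    # the <img> tag reaches .innerHTML intact
print(escaped_render(evil))  # the <img> tag is shown as text
print(naive_render("*Trainer*Innen*"))  # <b>Trainer</b>Innen*
```

Escaping fixes the injection, but not the ambiguity: with any asterisk-based syntax, a word containing "*" still needs an escape rule.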

  76. goffi

    why don't we use mIRC colors in <body>? Looks like a good idea (as good as using markdown)

  77. jonasw

    goffi, those use control codes in the 0x00..0x1f range right? you can’t send those over XML

  78. goffi

    jonasw: easy, we just have to add \uxxxx

  79. goffi

    we can even use ANSI escape codes like this, it will be great

  80. dwd

    goffi, Note that I'm explicitly suggesting we *keep* XHTML embedding, but just avoid it being the go-to way of adding bold and italics in IM messages.

  81. jonasw

    dwd, I’m not convinced that this is a good thing.

  82. jonasw

    this just introduces fragmentation we can avoid.

  83. Zash

    Is it bold and italics and other *styling* people want?

  84. dwd

    jonasw, I don't really want to specify a whole new document definition.

  85. Zash

    If so, bringing back <font> could work

  86. jonasw

    dwd, why not?

  87. dwd

    Zash, For IM, you mean? I think people want emphasis (mostly just *bold*) and preformat is handy, mostly because code-like stuff is about the only time you use * and / in IMs.

  88. Zash

    XFONT-IM, you get <font> with a bunch of attributes that map roughly to CSS properties

  89. jonasw

    dwd, as I said. Trainer*Innen is a legit German word

  90. goffi

    dwd: for <message>? I didn't get that, I thought you were suggesting the separate XHTML XEP (which is a good idea) only for blogging

  91. Zash

    ~$ pandoc <<< '*Trainer*Innen*'
    <p><em>Trainer</em>Innen*</p>

  92. Zash

    Palm -> face

  93. goffi

    but sorry, movie time, will read updates later

  94. Zash

    jonasw: You type ^B Trainer*Innen ^B in your client. The client translates the input into protocol and sends it.

  95. Zash

    Or click the [B] button

  96. jonasw

    Zash, sure, but what if the protocol doesn’t support it, because it’s markdown?

  97. Zash

    ~$ pandoc -f html -t markdown <<< '<b>Trainer*Innen</b>'
    **Trainer\*Innen**

  98. jonasw

    nice

  99. jonasw

    at this point we can simply use a proper format, because nobody will learn that syntax for themselves.

  100. dwd

    jonasw: Either (a) you can't. (b) We define bold toggle as being on a word break. (c) We use doubled asterisks as an escape.

  101. jonasw

    dwd, see above

  102. jonasw

    "you can’t" is a terrible answer

  103. Zash

    dwd: Should we really demand end-users learn some syntax?

  104. jonasw

    "bold toggle on word breaks" moves the "you can’t" answer to other cases

  105. jonasw

    "doubled asterisks as an escape", see what Zash says

  106. dwd

    No, it's not. You also cannot embed images. This is OK. There are lots of edge cases. Imagine how many there's going to be on a complete document markup language.

  107. Zash

    ~$ pandoc <<< '**Trainer*Innen**'
    <p>**Trainer*Innen**</p>

  108. Zash scratches head

  109. jonasw

    dwd, of course. which is why I suggest to have a language we can easily extend instead of some markdown-ish markup.

  110. jonasw

    that easily allows us to take care of edge-cases and new use-cases later

  111. jonasw

    without breaking everything again

  112. uc

    Don't forget S̶t̶r̶i̶k̶e̶ t̶h̶r̶o̶u̶g̶h̶

  113. zinid

    lol, guys, you will never come to agreement :)

  114. zinid is reading another round of xhtml-im ranting

  115. Zash

    I see. Then there can only ever be conflict between us.

  116. remko

    I shall refrain from suggesting docbook markup

  117. jonasw

    remko, how about groff?

  118. Zash

    I actually went to look through a bunch of existing XML formats yesterday

  119. Zash

    There's a bunch

  120. remko

    for the subset that i think we want (bold, italic, code), i don't think it matters really.

  121. jonasw

    remko, depends on who "we" includes, "bold, italic, code" doesn’t cut it.

  122. remko

    'we' => '99% of the use cases' ;-)

  123. Zash

    But I'm not sure any XML format will make it difficult enough to do the wrong thing

  124. jonasw

    Zash, I can see people not sanitising attributes and simply changing local names, unfortunately.

  125. Zash

    Yup

  126. remko

    jonasw: looking at all the IM clients out there, bold, italic, and underline are the only thing they seem to support. I haven't heard people complaining about this limitation.

  127. Zash

    Don't forget iChat users with colors

  128. remko

    it'll boil down to whether we want a structured format or a non-structured format. Personally, I'm torn. I used to lean one way, but am now leaning the other.

  129. jonasw

    remko, you haven’t heard me then ;-)

  130. jonasw

    remko, at least there’s also blockquote

  131. remko

    jonasw: I like blockquote. But there is a case to be made that this should be replaced by a snippet payload.

  132. Zash

    remko: semantics vs style, xml vs not xml, structured vs not

  133. Zash

    Any more considerations?

  134. remko

    Zash: Messages doesn't even support markup, I just noticed.

  135. jonasw

    remko, and how’d you encode those in snippets?

  136. jonasw

    do we want full-blown HTML snippets, with all the vulnerabilities that has?

  137. remko

    nope. Just <pre>.

  138. jonasw

    do you confuse blockquote with code?

  139. remko

    i.e. a snippet is just rendered as a <pre>, no markup.

  140. jonasw

    https://d2k1ftgv7pobq7.cloudfront.net/meta/u/res/images/db8a72d486e14d6fe249b6a80962b69b/slack-webdesign-cropped.jpg

  141. remko

    jonasw: i did confuse both

  142. jonasw

    here’s one example of inline images, links, something like headings in an IM client

  143. Zash

    jonasw: Isn't that more like a reference?

  144. remko

    jonasw: for blockquote, the question is whether this is really a part of the message or not. Could be a reference to something that you happen to render this way.

  145. jonasw

    Zash, sure, it should be a reference in addition, and ideally the markup should reference the reference to make things super-clear

  146. jonasw

    but having every client support every type of reference isn’t a good idea I think

  147. Zash

    jonasw: {xep attaching} maybe?

  148. Bunneh

    jonasw: XEP-0367: Message Attaching (Standards Track, Deferred, 2017-09-11) See: https://xmpp.org/extensions/xep-0367.html

  149. jonasw

    wouldn’t it make more sense to have references in addition rather than instead of content?

  150. remko

    I don't think the bulk of 'modern' (non-XMPP) IM clients out there put images and links in their messages. They use autolinking and the Unicode object replacement character, and attach the image as an external object.

  151. edhelas

    the issue with XHTML-IM are also things like images

  152. jonasw

    remko, sure, you need to mark up where the image goes though

  153. Zash

    Do modern messagers even do actual in-line images?

  154. remko

    jonasw: hence the unicode replacement object

  155. jonasw

    which is again some kind of markup

  156. Zash

    As opposed to messages that consist only of an image

  157. remko

    Zash: i wonder about that. I have seen them use the unicode replacement object, but am not sure if they actually use it for placement.

  158. remko

    (talking about Messages concretely)

  159. remko

    if you look in the Messages DB, I see messages that came with an image as { "text": "\uFFFC Hi there", "attachments": [ { "image": "..."}]} (or something of that sort)

  160. remko

    but as i said, i'm not sure if they actually use this, or just always render the image at the front/back of the message.

  161. jonasw

    remko, that looks like something which grew historically because they don’t have proper markup.

  162. remko

    might be

  163. remko

    in any case, i would rather not have any markup for images than <img> tags.

  164. jonasw

    what’s wrong with <img/> tags or their equivalent in any other markup?

  165. remko

    it's a slippery slope to too complex a document. It's also ambiguous how text should flow around this stuff, etc. If you don't allow it, it's less ambiguous

  166. remko

    just render it as a separate image. That's also how people feel IM should work, I think.

  167. Zash

    remko: Like a separate type of message box?

  168. remko

    yes

  169. jonasw

    I think there’s a lot of value for that type of rich messages.

  170. Zash

    Yeah I think most things I've seen do that

  171. jonasw

    possibly not images, but other semantic markup

  172. remko

    jonasw: i'm not saying there's no use case in XMPP for rich messages with full document markup. IM is just not one IMO.

  173. jonasw

    talking about IM.

  174. jonasw

    not necessarily human-generated messages though

  175. remko

    those should perhaps be a different thing then.

  176. jonasw

    why make it a different thing?

  177. remko

    slack also distinguishes between bot messages and human-written messages.

  178. remko

    you can't do anything beyond bold, italic, and underline as a person.

  179. jonasw

    but why does the transport need to be different?

  180. jonasw

    sharing the transport format for whatever markup we’re doing leads to better interoperability

  181. Zash

    https://xmpp.org/extensions/inbox/content-types.html

  182. jonasw

    oh god

  183. jonasw

    type='text/xml'

  184. jonasw

    I’m in pain now.

  185. remko

    haven't read the XEP, but saw that pass the mailing list. That sounded like pandora's box to me :)

  186. jonasw

    that specific implementation feels bad

  187. jonasw

    and I’m still not convinced that it’s a good thing to have in any case. This will fragment implementations.

  188. Zash

    I think someone (could be me) suggested <body>markdown markup here</body><body-content type='markdown'/>

  189. zinid

    Zash: that's exactly Example 1 from the ProtoXEP

  190. zinid

    <message from='person1@example.org/34892374'
             to='person2@example.org/938089023'
             type='chat'>
      <body>**Note:** This message is very important.</body>
      <content type='text/markdown' xmlns='urn:xmpp:content'/>
    </message>

  191. Zash

    zinid: ah, must have scrolled past that

  192. jonasw

    zinid, yes, but an empty <content/> has a different meaning than a <content> with text content, which is super awkward.

  193. zinid

    jonasw: yep, I don't think we need this crap

  194. Zash

    I actually think it's awkward with <body/><xhtml-im/>

  195. zinid

    still, it's unclear what a client should render if it doesn't support the content type

  196. zinid

    in the case of markdown it's obvious, but with another formats/

  197. zinid

    ?

  198. Zash

    zinid: if you do this only with formats that are still readable when treated as plain text, it's probably fine
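
A sketch of the fallback Zash describes, assuming the <content type='...' xmlns='urn:xmpp:content'/> element from the ProtoXEP example quoted above, and a message parsed standalone (without the jabber:client stream namespace). The SUPPORTED set and choose_rendering are hypothetical: an unknown type simply degrades to showing <body/> as plain text.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

CONTENT_NS = "urn:xmpp:content"  # namespace from the ProtoXEP example above
SUPPORTED = {"text/markdown"}    # hypothetical: types this client renders specially


def choose_rendering(message_xml: str) -> Tuple[str, str]:
    """Return (content_type, body), degrading unknown types to text/plain."""
    msg = ET.fromstring(message_xml)
    body = msg.findtext("body", default="")
    content = msg.find(f"{{{CONTENT_NS}}}content")
    ctype = content.get("type", "text/plain") if content is not None else "text/plain"
    if ctype not in SUPPORTED:
        # Unknown format: the body is still shown, as plain text (never as HTML).
        ctype = "text/plain"
    return ctype, body
```

This only stays readable for formats that look reasonable as raw text, which is exactly the restriction Zash proposes.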

  199. zinid

    Zash: yes

  200. Zash

    You probably also wanna have explicit features for formats you understand

  201. jonasw

    feature discovery doesn’t work in modern IM though.

  202. Zash

    As in, disco#info features and the whole caps dance

  203. Zash

    Oh right, we're moving away from the end-to-end thing :/

  204. jonasw

    let the server handle translation to the different markup types understood by the client!

  205. Zash

    Ohgod

  206. zinid

    jonasw: OMEMO guys will not approve :D

  207. jonasw

    zinid, right

  208. zinid

    feature discovery won't help in MUCs

  209. jonasw

    that, too

  210. jonasw

    or MIXes

  211. Zash

    or when you switch client and fetch stuff from MAM

  212. Zash

    or if you use carbons

  213. Zash

    or ...

  214. jonasw

    yes, so, that’s not really gonna work.

  215. Zash

    Can we even do things at all then?

  216. jonasw

    sure

  217. jonasw

    like we’ve been doing it with <xhtml-im/>

  218. Zash

    multipart/alternative basically

  219. jonasw

    yupp

  220. Zash

    Do everything at once and let the recipient do what they want :)

  221. zinid

    do I understand correctly: the only concern about xhtml-im is security issues?

  222. jonasw

    zinid, the "only", yes

  223. jonasw

    for me at least.

  224. remko

    no, i don't think that's true

  225. Zash

    That's why SamWhited wants it dead, no?

  226. remko

    that's what initially triggered the discussion, yes. And it's important, because xhtml-im is very hard to sanitize.

  227. Zash

    It's too easy to just stick the incoming XML tree into a browser DOM

  228. remko

    but i think other people don't want XHTML-IM, because it's very unpredictable to render in a chat log.

  229. zinid

    can we provide testing vectors?

  230. Zash

    User studies needed

  231. SamWhited

    I hadn't actually considered the difference between text style and layout before, it was just security for me at first, but now I agree that we need something that's purely style, not layout.

  232. jonasw

    SamWhited, I think we need something for semantics, not for style.

  233. jonasw

    (neither for layout by the way)

  234. Zash

    I actually think normal people will want style rather than semantics

  235. remko

    semantics for things that don't need layout :)

  236. jonasw

    emphasis, blockquote, strong emphasis, lists and enumerations, and code at the very least.

  237. remko

    yes, zash is right

  238. SamWhited

    jonasw: yes, that's fair, I think I agree with that. I can imagine certain clients rendering "emphasis" as italics and others as bold or something similar.

  239. jonasw

    SamWhited, that, and also accessibility tools

  240. jonasw

    like screen readers

  241. jonasw

    they benefit a lot from semantics.

  242. remko

    i actually agree with Zash, people don't care about semantics in IM, they care about style.

  243. SamWhited

    Yes, I think for the most part you'll find they're the same for anything simple though.

  244. jonasw

    Zash, they think they want style, but they actually want semantics.

  245. remko

    they want something to be bold, not 'emphasized'

  246. Zash

    jonasw: That's probably true.

  247. jonasw

    remko, they want something to be bold to emphasize it

  248. jonasw

    they don’t think in terms of semantics, but that’s what they want

  249. remko

    different people have different interpretations of emphasis.

  250. remko

    if i want something emphasized, i don't want it in italic (even though that's the standard way to emphasize things)

  251. jonasw

    remko, you’re free to choose strong emphasis then; people make that kind of mistake all the time.

  252. jonasw

    it’s still emphasis, and that’s the meaning which is wanted to be conveyed and which is conveyed

  253. SamWhited

    Although, if I'm a client author I'm going to put a "Bold" button, not an "Emphasis" button and it would be confusing if on one of my other clients the "Bold" button turns out to be italics.

  254. jonasw

    SamWhited, when you’re saying "purely style", I’m afraid that use-cases like enumerations are excluded though.

  255. remko

    SamWhited: exactly

  256. jonasw

    SamWhited, yes of course you wouldn’t label it emphasis ;-). and it should be made clear which defaults apply for the two kinds of emphasis people have (render weak -> italic, strong -> bold; people will choose strong in 99.9% of the cases, that doesn’t matter)

  257. SamWhited

    yah, never mind, I just changed my mind again. Conveying semantics might be nice in some cases, but all I want is the most dead simple thing we can do.

  258. jonasw

    with all due respect, I wonder whether you might maybe want to consider not only what you want ;-)

  259. remko

    jonasw: so you're saying that you're offering a bold and italic button, but insist they're using semantics? It has to render the same way on the other side, so i don't think it's semantics anymore.

  260. SamWhited

    I just gave you the reason why client developers want it.

  261. SamWhited

    (probably)

  262. jonasw

    SamWhited, as a client developer, I want to have one markup which covers all things my users are likely to encounter. This includes more complex messages (possibly generated by automated systems, similar to slack integrations or something). And I don’t want to have to support three different tiers of markdown dialects to achieve that.

  263. jonasw

    I may not offer my users the tools to create, e.g., a three-level nested enumerated list in the interface, because this ain’t a word processor.

  264. Zash

    People can't use a comma?

  265. SamWhited

    Wait, what? Who said anything about multiple tiers of markdown dialects?

  266. jonasw

    Zash, context?

  267. Zash

    jonasw: lists

  268. jonasw

    SamWhited, that’s what happens if we go down the route of "we’ll start with some simple text-based markup and oops, then we’ll find out two years later that we also need something to do lists or whatever, so let’s bump the namespace"

  269. jonasw

    Zash, bullet points, if you will

  270. zinid

    Zash: lists are easier to read I guess

  271. jonasw

    much easier

  272. Zash

    Do people use this when talking?

  273. zinid

    I do sometimes

  274. jonasw

    Zash, I do ;-). but also, "possibly generated by automated systems"

  275. SamWhited

    I think that: 1. That's probably not a problem 2. It already works fine

  276. zinid

    Zash: for example to tell my wife what to buy in a shop ;)

  277. Zash

    I have found a 𝐁𝐎𝐋𝐃 solution!

  278. jonasw

    it works fine until you have multiple lines in a bullet (e.g. due to word-wrap) and then it is unreadable, SamWhited.

  280. jonasw

    I would like to quote a few things from the Zen of Python which have been in my mind during this whole "the new markup for XMPP" discussion: Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. […] In the face of ambiguity, refuse the temptation to guess.

  281. jonasw

    and also, especially since people are now suggesting that they’re creating precedents by implementing things in a wide-spread client: Now is better than never. Although never is often better than *right* now.

  282. Zash

    dwd: btw, *bold* in markdown isn't bold, but italics :P

  283. jonasw

    (that, too)

  284. jonasw

    SamWhited, I really, really don’t understand what the issue is with creating an extensible, very simple markup or re-using one instead of restricting us to a very small set of things.

  285. SamWhited

    I never said there was a problem with it

  286. jonasw

    fair.

  287. jonasw

    I somehow felt you did, but that wasn’t true.

  288. jonasw

    maybe I mixed up an email somewhere and had that lingering thought somewhere, I apologize.

  289. Zash

    SamWhited: Do you think any XML based format will be too easy to do the wrong thing with?

  290. SamWhited

    Zash: I'm not sure, I suspect so, but I have no idea.

  291. zinid

    Zash: only when you put unescaped cdata into DOM?

  292. SamWhited

    XMPP doesn't allow CDATA, no?

  293. Zash

    Question is, where's the cutoff where people will prefer the right thing?

  294. jonasw

    SamWhited, cdata is any text, actually

  295. zinid

    SamWhited: well I mean the content of <body/> for example

  296. SamWhited

    *unescaped CDATA

  297. Zash

    SamWhited: You are thinking of <![CDATA[ ]]>?

  298. Zash

    Whatever that's called

  299. SamWhited

    yah, that

  300. Zash

    Do we disallow it tho?

  301. zinid

    no, I meant a text within tags in general

  302. zinid

    I see no other way to screw up

  303. jonasw

    zinid, I have one: let’s say we have a tag <emph/> for emphasis

  304. zinid

    we can assume that you can screw up while transforming layout elements into the DOM, but you can screw up this way with any format

  305. jonasw

    a client may simply do a translation mapping the emph local name to em for XHTML.

  306. Zash

    Also wtf unicode has all sorts of 𝗯𝗼𝗹𝗱 𝘁𝗲𝘅𝘁

  307. jonasw

    and then fail to remove attributes such as onclick="alert('fnord')"
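
jonasw's failure mode, sketched in Python with a hypothetical emph → em mapping (the tag names are illustrative only). The naive converter renames tags in place and silently keeps every attribute; the safe one has to actively rebuild the tree and copies no attributes at all, which is the "action vs. inaction" asymmetry raised later in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic-markup -> XHTML mapping; tag names are illustrative.
TAG_MAP = {"emph": "em", "strong": "strong", "code": "code"}


def naive_convert(root: ET.Element) -> ET.Element:
    # Renames tags in place; every attribute (onclick included) is kept.
    for node in root.iter():
        node.tag = TAG_MAP.get(node.tag, node.tag)
    return root


def safe_convert(src: ET.Element) -> ET.Element:
    # Builds a fresh tree: unknown tags become <span>, and no attribute is copied.
    out = ET.Element(TAG_MAP.get(src.tag, "span"))
    out.text = src.text
    for child in src:
        converted = safe_convert(child)
        converted.tail = child.tail
        out.append(converted)
    return out


message = "<msg>Hello <emph onclick=\"alert('fnord')\">world</emph>!</msg>"
print(ET.tostring(naive_convert(ET.fromstring(message)), encoding="unicode"))
print(ET.tostring(safe_convert(ET.fromstring(message)), encoding="unicode"))
```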

  308. zinid

    jonasw: yes, but you can do the same with other formats, no?

  309. jonasw

    not with non-XML formats

  310. jonasw

    you’d have to actively put attributes in the result

  311. Zash

    I think non-XML formats may be too easy to just run through a regex and let HTML through

  312. jonasw

    Zash, depends

  313. jonasw

    think: [{"text": "some text", "emphasis": "strong"}, {"text": " and now without emphasis"}] -> '<strong>some text</strong> and now without emphasis'

  314. jonasw

    you can’t regex that in any reasonable way.

  315. jonasw

    anything in ["text"] will have to be htmlescaped, but otherwise it should be safe.
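
jonasw's JSON idea as a runnable sketch. The span format is the hypothetical one from the example above; mapping a "weak" emphasis to <em> is an added assumption. Only the "text" values come from the sender, so they are the only strings passed through html.escape; the tags come from a fixed table, so nothing from the input ends up as a tag or attribute.

```python
import html
import json

# Hypothetical span format from the example above; "weak" -> <em> is an assumption.
EMPHASIS_TAGS = {"strong": "strong", "weak": "em"}


def render_spans(doc: str) -> str:
    out = []
    for span in json.loads(doc):
        text = html.escape(span["text"])  # the only user-controlled string
        tag = EMPHASIS_TAGS.get(span.get("emphasis"))  # unknown -> plain text
        out.append(f"<{tag}>{text}</{tag}>" if tag else text)
    return "".join(out)


print(render_spans('[{"text": "some text", "emphasis": "strong"}, '
                   '{"text": " and now without emphasis"}]'))
```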

  316. zinid

    jonasw: other formats can also possess kinda "attributes", and you can also blindly copy their contents into some evil DOM element, no?

  317. jonasw

    (now I officially did it. I proposed a JSON-transport for things.)

  318. jonasw

    zinid, not if none of those attributes reasonably map to any DOM element which is evil

  319. jonasw

    we shouldn’t be needing any attributes at all, I think

  320. jonasw

    except maybe class if we do that palette thing

  321. jonasw

    (okay, href too)

  322. Zash

    [{"c":[{"c":"bold","t":"Str"}],"t":"Strong"},{"t":"Space"},{"c":"text","t":"Str"}]

  323. jonasw

    but attributes should be rather rare

  324. Zash

    ^ actual JSON "markup" format.

  325. jonasw

    Zash, not sure if that’s some serious markup you found somewhere or if you’re trolli... oh dear

  326. jonasw

    I don’t know what that does

  327. Zash

    jonasw: pandoc -t json

  328. jonasw

    but does it work?

  329. jonasw

    ah, t is type.

  330. Zash

    jonasw: It's a JSON dump of the internal parse tree

  331. jonasw

    right

  332. jonasw

    probably not a good choice since probably underspecified

  333. jonasw

    but yeah, that’s the idea

  334. Zash

    <Strong><Str>bold</Str></Strong><Space/><Str>text</Str>

  335. Zash

    ~$ pandoc -t native <<< '**bold** text'
    [Para [Strong [Str "bold"],Space,Str "text"]]
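
For comparison, a renderer for the nested inline-node shape in Zash's pandoc JSON output above ("t" is the node type, "c" its contents). Only Str, Space, Strong and Emph are handled and anything else is rejected, which is one way to cope with jonasw's concern that the format is underspecified. Pandoc's JSON layout has changed between versions, so treat this as illustrative rather than as a pandoc reader.

```python
import html
import json

WRAPPERS = {"Strong": "strong", "Emph": "em"}  # inline nodes that wrap other inlines


def render_inlines(nodes: list) -> str:
    """Render a list of pandoc-style inline nodes ("t" = type, "c" = contents)."""
    parts = []
    for node in nodes:
        kind = node["t"]
        if kind == "Str":
            parts.append(html.escape(node["c"]))
        elif kind == "Space":
            parts.append(" ")
        elif kind in WRAPPERS:
            tag = WRAPPERS[kind]
            parts.append(f"<{tag}>{render_inlines(node['c'])}</{tag}>")
        else:
            # Refuse anything not explicitly understood instead of guessing.
            raise ValueError(f"unsupported inline node: {kind}")
    return "".join(parts)


tree = json.loads('[{"c":[{"c":"bold","t":"Str"}],"t":"Strong"},'
                  '{"t":"Space"},{"c":"text","t":"Str"}]')
print(render_inlines(tree))
```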

  336. zinid

    jonasw: if we can map attributes reasonably, can't we do the same for xhtml-im? and then forbid unknown attributes/elements

  337. jonasw

    zinid, the difference is that one requires action to fail, the other requires inaction.

  338. zinid

    I don't understand

  339. jonasw

    of course, XHTML-IM already defines that some attributes are evil and you need to remove them.

  340. jonasw

    that doesn’t magically make developers do that.

  341. SamWhited

    Even if they do it, any trivial mistake in the whitelist logic results in a vulnerability.

  342. zinid

    XML schema?

  343. zinid hides

  344. jonasw

    zinid, sure, nobody does that.

  345. jonasw

    it requires action to do that

  346. jonasw

    getting developers to take action is what’s tricky

  347. zinid

    yeah, I know

  348. jonasw

    (if it was only about trivial mistakes, we could provide an audited reference implementation)

  349. jonasw

    (either based on XML schemas or whatever works in JavaScript)

  350. zinid

    ok, we refuse xhtml-im, then what?

  351. zinid

    there are 100500 formats and we can invent others

  352. zinid

    how to choose?

  353. zinid

    ah, and we need to make sure there are no potential vulnerabilities, hell yeah

  354. zinid

    regarding markdown: if we don't choose it, then developers will blame us even harder for not using "modern" technologies like json or markdown, or rest :)

  355. jonasw

    zinid, I’m all in for a JSON-based markup

  356. zinid

    for example, I heard "xmpp is dead if they don't switch to json" and seems like people tend to agree with that

  357. Zash

    I note that the PEP examples don't include the 'http://jabber.org/protocol/pubsub' feature. Is that supposed to be implied by <identity category='pubsub' type='pep'/>?

  358. Zash

    And the last version of PEP changed that section from being to=host to to=account and /some/ unnamed implementations still advertise stuff on the host

  359. moparisthebest

    Why hasn't anyone taken my bbcode suggestion seriously?

  360. moparisthebest

    On an actual serious note, there are markdown standards, we could just pick one

  361. moparisthebest

    http://commonmark.org/ is the best I know of