XSF Discussion - 2017-10-14

Syndace 08:15:42
Can a client respond with an internal-server-error if something internal goes wrong when handling a request? The "server" part makes me unsure.
MattJ 08:16:12
Heh
MattJ 08:17:06
Do you have a specific case in mind?
MattJ 08:17:17
or is this just a "is it possible"?
Syndace 08:19:37
I'm currently coding a client for fun and I have a situation where something internal may fail and I'm wondering what to respond in that case. To make it more specific: Before I send a stanza as response I validate it using the XML schema files and some additional logic. What do I do if the validation fails? I have to answer something to the iq request..
Syndace 08:22:23
Now that I think about it, maybe I should only validate incoming stanzas and not outgoing ones. The receiving entity can't expect valid stanzas anyway and has to validate itself.
Syndace 08:23:02
I just thought it would be cool to make sure what I'm sending is valid.
MattJ 08:23:03
Isn't that bad-request?
MattJ 08:23:25
Oh, sorry, I see what you're saying
MattJ 08:25:02
Maybe undefined-condition with a custom error would be appropriate here
MattJ 08:25:18
There's not much the remote party can do about it in any case
Syndace 08:29:13
Hmm the undefined-condition should be used in application-specific cases
Syndace 08:29:33
I mean, should is not must but it does not feel clean either
Syndace 08:32:56
I think I'll use my internal validation to display warnings/errors and send the invalid stanza anyway. At least this way finding and debugging invalid stanzas is easy.
Flow 09:55:44
Syndace, whatever you do, it's often a good idea to include human-readable English text in the error response that provides more information about what went wrong. But only if that information does not cause some sort of security leak
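A minimal sketch of the reply discussed above: an iq error carrying undefined-condition plus a human-readable text element. The helper name make_iq_error, the JID, the id, and the message text are placeholders, and the stream's default namespace is left out for brevity; the error type "cancel" is an assumption about what fits a failure the remote party cannot fix.

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_iq_error(iq_id, to, condition="undefined-condition",
                  error_type="cancel", text=None):
    """Build an <iq type='error'/> reply (hypothetical helper, not a library API)."""
    iq = ET.Element("iq", {"type": "error", "id": iq_id, "to": to})
    error = ET.SubElement(iq, "error", {"type": error_type})
    # The defined condition lives in the stanza-error namespace.
    ET.SubElement(error, f"{{{STANZAS_NS}}}{condition}")
    if text:
        # Optional descriptive text; keep it free of sensitive details.
        t = ET.SubElement(error, f"{{{STANZAS_NS}}}text", {XML_LANG: "en"})
        t.text = text
    return iq


reply = make_iq_error("abc1", "peer@example.org/res",
                      text="Outgoing stanza failed local validation")
print(ET.tostring(reply, encoding="unicode"))
```

The text is meant for humans reading logs or bug reports; the receiving code should still branch on the condition element only.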
Syndace 10:12:29
Flow, most of the errors are self explanatory, aren't they? Often the XEPs define semantics for error conditions. I like to avoid decisions that might cause security leaks.
Flow 11:10:11
Syndace, In my experience it's quite the opposite. For example internal-server-error: It's often useful to know the cause of the error
Syndace 11:10:42
Why does the client have to know which internal server error occurred?
Syndace 11:12:04
Nevermind, there are probably cases where additional info makes a lot of sense. I'll think over which of my generated errors could benefit from additional info.
Flow 11:12:10
Syndace, so that it can report the error condition back to the server operator
Syndace 11:12:35
The server operator should really log internal errors himself..
Flow 11:13:30
and he will very likely do that
zinid 11:32:43
Syndace: sometimes a user sees an error and doesn't bother support, because the error is temporary, for example, database failure
pep. 13:10:02
jonasw, in your last email to the XHTML-IM thread, I fail to see how having a protocol break from xml to xml would fix OP's issue. I'd like to keep XML as well but if you do that you're still open to the same vulnerabilities
jonasw 13:12:34
pep., yes, I’m not convinced that XML helps, which is what I wrote I think?
pep. 13:12:44
let me reread
jonasw 13:13:22
actually I’m even posting an example of how this might be exploited
jonasw 13:14:05
yeah, I probably should’ve added something like "I *think* that […], but I’m not convinced that clients will do the right thing by default, which is why we’re trying to get rid of XHTML-IM in the first place." will clarify on-list
pep. 13:14:14
:)
pep. 13:14:20
But,
pep. 13:14:25
hmm, yes
pep. 13:15:04
yeah, saying "We are now using this NS instead" won't fix the problem of people not validating
pep. 13:15:04
So if people want a change, XML is a no-no
pep. 13:15:30
And so is markdown because some implementations (most?) accept HTML
jonasw 13:15:53
markdown is out even if only because it’s not extensible
pep. 13:16:00
Right
pep. 13:16:22
But I would go further and say, for XML, it's a no-no for web clients at all.
pep. 13:16:43
Because people are putting stuff everywhere without validating and that creates vulnerabilities :)
pep. 13:17:24
Not just in <body>
pep. 13:17:29
Even if it's the most obvious
goffi 14:14:40
pep.: that is an issue with client devs; validating untrusted input is the first thing to learn when you do web dev (or even non-web dev, actually)
goffi 14:15:32
pep.: (disclaimer: I'm not claiming that my web software is failproof; security issues can and will happen to most software)
pep. 14:18:16
goffi, yeah, look at my last email on the list
Zash 14:18:43
Can't have nice things!
pep. 14:23:30
I should have put something like "unlike OP suggests" at the end of that sentence
goffi 17:27:11
jonasw: thanks for your last message, I kind of feel alone when I don't even understand while people are still talking about markdown after the debate we had.
goffi 17:27:21
s/while/why/
jonasw 17:28:36
I’m assuming some good faith (i.e. too long other thread and people didn’t read)
goffi 17:29:31
it's possible, I've had the same thing during OMEMO flamewar, hard to follow when you arrive after the battle.
Zash 17:29:51
I kinda wanna pin the entire xhtml-im problem on whoever invented .innerHTML
jonasw 17:30:03
Zash, .innerHTML isn’t the issue with XHTML-IM
jonasw 17:30:16
XHTML-IM breaks not only if you use .innerHTML, it also breaks if you use appendChild
Zash 17:30:50
Blame the web!
edhelas 17:33:26
Markdown in XMPP, seriously ?!
edhelas 17:35:36
i'm not against Markdown, but looks like we are trying to solve a problem by changing it
Zash 17:37:21
No, Markdown being defined as a superset of HTML rules it out
jonasw 17:37:35
dwd gave a proper definition of what he thinks should be markdown for IM.
jonasw 17:37:40
more or less proper.
jonasw 17:37:49
I’m tired of asking how to emphasize "Trainer*Innen" with that though.
Zash 17:38:38
\bold{Trainer*Innen}
jonasw 17:38:44
uhhh
jonasw 17:38:48
TeX in IM
goffi 17:38:49
one of the problems is that people always think about their own use case
jonasw 17:38:51
that’s excellent
jonasw 17:38:53
it’s even turing complete!
Zash 17:38:57
(I don't actually know TeX, wild guess)
jonasw 17:39:16
it’d be \emph{} or \textbf{}, depending on whether you want emphasis (italics) or actual boldface
Zash 17:39:32
Is this Creole? http://www.wikicreole.org/wiki/Creole1.0
goffi 17:39:38
XMPP message can also be "normal" with a subject and potentially long, not necessarily a line in a MUC/MIX, and embedded images can be useful there (think about email gateway)
jonasw 17:39:46
goffi, agreed
jonasw 17:40:10
Zash, yes
goffi 17:40:17
I'll take a break on standard@ for tonight, I have written enough there today :)
Zash 17:40:58
Ok, me, a naive web developer does a simple string replace and puts the result as .innerHTML
waqas 17:41:32
You are doing a simple string replace? That's naive web developer lvl3
edhelas 17:42:21
I append my HTML like I append my variables in my SQL requests, using concatenation
goffi 17:42:29
why we don't use mirc colors in <body>? Looks like a good idea (as good as using markdown)
jonasw 17:42:54
goffi, those use control codes in the 0x00..0x1f range right? you can’t send those over XML
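A small sketch of why those codes cannot travel in a stanza: XML 1.0 only permits tab, line feed and carriage return below 0x20. The function names and the sample string are illustrative; 0x02 and 0x03 are the mIRC bold and colour control codes.

```python
def is_xml10_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def forbidden_chars(text):
    """List the code points in text that XML 1.0 does not allow."""
    return [f"U+{ord(c):04X}" for c in text if not is_xml10_char(c)]


MIRC_BOLD = "\x02"
MIRC_COLOR = "\x03"
sample = f"{MIRC_COLOR}04red{MIRC_COLOR} and {MIRC_BOLD}bold{MIRC_BOLD}"
print(forbidden_chars(sample))
```

Numeric character references do not help either, since a reference must also resolve to an allowed character.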
goffi 17:43:16
jonasw: easy, we just have to add \uxxxx
goffi 17:43:37
we can even use ANSI escape code like this, will be great
dwd 17:47:43
goffi, Note that I'm explicitly suggesting we *keep* XHTML embedding, but just avoid it being the go-to way of adding bold and italics in IM messages.
jonasw 17:47:59
dwd, I’m not convinced that this is a good thing.
jonasw 17:48:10
this just introduces fragmentation we can avoid.
Zash 17:48:21
Is it bold and italics and other *styling* people want?
dwd 17:48:32
jonasw, I don't really want to specify a whole new document definition.
Zash 17:48:37
If so, bringing back <font> could work
jonasw 17:49:19
dwd, why not?
dwd 17:50:11
Zash, For IM, you mean? I think people want emphasis (mostly just *bold*) and preformat is handy, mostly because code-like stuff is about the only time you use * and / in IMs.
Zash 17:50:12
XFONT-IM, you get <font> with a bunch of attributes that map roughly to CSS properties
jonasw 17:50:30
dwd, as I said. Trainer*Innen is a legit german word
goffi 17:51:15
dwd: for <message> ? I didn't get that, I thought you were suggesting the separate XHTML XEP (which is a good idea) only for blogging
Zash 17:51:23
~$ pandoc <<< '*Trainer*Innen*' TrainerInnen*
Zash 17:51:32
Palm -> face
goffi 17:51:33
but sorry, movie time, will read updates later
Zash 17:52:56
jonasw: You type ^B Trainer*Innen ^B in your client. The client translates the input into protocol and sends it.
Zash 17:53:08
Or click the [B] button
jonasw 17:53:14
Zash, sure, but if the protocol doesn’t support it, because it’s markdown?
Zash 17:53:52
~$ pandoc -f html -t markdown <<< 'Trainer*Innen' **Trainer\*Innen**
jonasw 17:53:56
nice
jonasw 17:54:17
at this point we can simply use a proper format, because nobody will learn that syntax for themselves.
dwd 17:54:19
jonasw: Either (a) you can't. (b) We define bold toggle as being on a word break. (c) We use doubled asterisks as an escape.
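Option (b) can be sketched as a rule where an asterisk only toggles bold when it sits on a word break. This is one possible reading of that option, not a proposed syntax; the regex and the split_bold helper are assumptions made for illustration.

```python
import re

# An opening "*" must not follow a word character, and a closing "*"
# must not be followed by one, so "*" inside a word stays literal.
BOLD = re.compile(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)")


def split_bold(text):
    """Split text into (chunk, is_bold) pairs under the word-break rule."""
    spans = []
    pos = 0
    for m in BOLD.finditer(text):
        if m.start() > pos:
            spans.append((text[pos:m.start()], False))
        spans.append((m.group(1), True))
        pos = m.end()
    if pos < len(text):
        spans.append((text[pos:], False))
    return spans


print(split_bold("*Trainer*Innen*"))   # the inner "*" is kept as text
print(split_bold("un*believ*able"))    # part of a word cannot be bold
```

Under this rule "Trainer*Innen" can be emphasized as a whole, but emphasizing only part of a word is no longer expressible, so some inputs still end up in the "you can't" category.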
jonasw 17:54:35
dwd, see above
jonasw 17:54:43
"you can’t" is a terrible answer
Zash 17:54:56
dwd: Should we really demand end-users learn some syntax?
jonasw 17:54:58
"bold toggle on word breaks" moves the "you can’t" answer to other cases
jonasw 17:55:11
"doubled asterisks as an escape", see what Zash says
dwd 17:55:40
No, it's not. You also cannot embed images. This is OK. There are lots of edge cases. Imagine how many there's going to be on a complete document markup language.
Zash 17:55:40
~$ pandoc <<< '**Trainer*Innen**' **Trainer*Innen**
Zash scratches head 17:55:55
jonasw 17:56:27
dwd, of course. which is why I suggest to have a language we can easily extend instead of some markdown-ish markup.
jonasw 17:56:34
that easily allows us to take care of edge-cases and new use-cases later
jonasw 17:56:38
without breaking everything again
uc 17:59:25
Don't forget S̶t̶r̶i̶k̶e̶ t̶h̶r̶o̶u̶g̶h̶
zinid 17:59:29
lol, guys, you will never come to agreement :)
zinid is reading another round of xhtml-im ranting 17:59:47
Zash 18:02:35
I see. Then there can only ever be conflict between us.
remko 18:20:28
i shall refrain from suggesting to use docbook markup
jonasw 18:20:54
remko, how about groff?
Zash 18:21:00
I actually went to look through a bunch of existing XML formats yesterday
Zash 18:21:07
There's a bunch
remko 18:21:48
for the subset that i think we want (bold, italic, code), i don't think it matters really.
jonasw 18:22:05
remko, depends on who "we" includes, "bold, italic, code" doesn’t cut it.
remko 18:22:18
'we' => '99% of the use cases' ;-)
Zash 18:22:31
But I'm not sure any XML format will make it difficult enough to do the wrong thing
jonasw 18:23:07
Zash, I can see people not sanitising attributes and simply changing local names, unfortunately.
Zash 18:23:21
Yup
remko 18:24:23
jonasw: looking at all the IM clients out there, bold, italic, and underline are the only thing they seem to support. I haven't heard people complaining about this limitation.
Zash 18:25:01
Don't forget iChat users with colors
remko 18:25:07
it'll boil down to whether we want a structured format or a non-structured format. Personally, I'm torn. I used to lean one way, but am now leaning the other.
jonasw 18:25:17
remko, you haven’t heard me then ;-)
jonasw 18:25:59
remko, at least there’s also blockquote
remko 18:26:34
jonasw: I like blockquote. But there is a case to be made that this should be replaced by a snippet payload.
Zash 18:26:39
remko: semantics vs style, xml vs not xml, structured vs not
Zash 18:26:47
Any more considerations?
remko 18:26:48
Zash: Messages doesn't even support markup i just noticed.
jonasw 18:26:49
remko, and how’d you encode those in snippets?
jonasw 18:27:13
do we want full-blown HTML snippets, with all the vulnerabilities that has?
remko 18:27:26
nope. Just <pre>.
jonasw 18:27:36
do you confuse blockquote with code?
remko 18:27:42
i.e. a snippet is just rendered as a <pre>, no markup.
jonasw 18:27:44
https://d2k1ftgv7pobq7.cloudfront.net/meta/u/res/images/db8a72d486e14d6fe249b6a80962b69b/slack-webdesign-cropped.jpg
remko 18:27:54
jonasw: i did confuse both
jonasw 18:28:08
here’s one example of inline images, links, something like headings in an IM client
Zash 18:28:35
jonasw: Isn't that more like a reference?
remko 18:28:37
jonasw: for blockquote, the question is whether this is really a part of the message or not. Could be a reference to something that you happen to render this way.
jonasw 18:29:02
Zash, sure, it should be a reference in addition, and ideally the markup should reference the reference to make things super-clear
jonasw 18:29:14
but having every client support every type of reference isn’t a good idea I think
Zash 18:29:25
jonasw: {xep attaching} maybe?
Bunneh 18:29:25
jonasw: XEP-0367: Message Attaching (Standards Track, Deferred, 2017-09-11) See: https://xmpp.org/extensions/xep-0367.html
jonasw 18:29:25
wouldn’t it make more sense to have references in addition rather than instead of content?
remko 18:29:27
i don't think the bulk of 'modern' (non-XMPP) IM clients out there put images and links in their messages. They use autolinking and the unicode replacement object, and attach the image as an external object.
edhelas 18:29:41
the issue with XHTML-IM are also things like images
jonasw 18:29:50
remko, sure, you need to mark up where the image goes though
Zash 18:29:57
Do modern messagers even do actual in-line images?
remko 18:30:06
jonasw: hence the unicode replacement object
jonasw 18:30:12
which is again some kind of markup
Zash 18:30:13
As opposed to messages that consist only of an image
remko 18:30:24
Zash: i wonder about that. I have seen them use the unicode replacement object, but am not sure if they actually use it for placement.
remko 18:30:54
(talking about Messages concretely)
remko 18:31:38
if you look in the Messages DB, i see messages that came with an image as { "text": "\uFFFC Hi there", "attachments": [ { "image": "..."}]} (or some sorts)
remko 18:32:07
but as i said, i'm not sure if they actually use this, or just always render the image at the front/back of the message.
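If a client did use U+FFFC for placement, the rendering step might look like the sketch below. The record shape follows the example quoted above; the interleave function and the rule of appending leftover attachments at the end are assumptions, not a description of how Messages actually behaves.

```python
OBJECT_REPLACEMENT = "\uFFFC"


def interleave(text, attachments):
    """Return ("text", str) / ("attachment", obj) parts in display order."""
    parts = []
    remaining = iter(attachments)
    chunks = text.split(OBJECT_REPLACEMENT)
    for i, chunk in enumerate(chunks):
        if chunk:
            parts.append(("text", chunk))
        if i < len(chunks) - 1:
            # A marker was here: place the next attachment, if any is left.
            att = next(remaining, None)
            if att is not None:
                parts.append(("attachment", att))
    # Attachments without a marker are shown after the text.
    parts.extend(("attachment", att) for att in remaining)
    return parts


record = {"text": "\uFFFC Hi there", "attachments": [{"image": "..."}]}
print(interleave(record["text"], record["attachments"]))
```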
jonasw 18:32:11
remko, that looks like something which grew historically because they don’t have proper markup.
remko 18:32:23
might be
remko 18:32:33
in any case, i would rather not have any markup for images than <img> tags.
jonasw 18:32:47
what’s wrong with <img/> tags or their equivalent in any other markup?
remko 18:33:49
it's a slippery slope to too complex document. It's also ambiguous how text should flow around this stuff etc. If you don't allow it, it's less ambiguous
remko 18:34:16
just render it as a separate image. That's also how people feel IM should work i think.
Zash 18:34:46
remko: Like a separate type of message box?
remko 18:34:53
yes
jonasw 18:34:57
I think there’s a lot of value for that type of rich messages.
Zash 18:35:02
Yeah I think most things I've seen do that
jonasw 18:35:04
possibly not images, but other semantic markup
remko 18:35:18
jonasw: i'm not saying there's no use case in XMPP for rich messages with full document markup. IM is just not one IMO.
jonasw 18:35:25
talking about IM.
jonasw 18:35:33
not necessarily human-generated messages though
remko 18:35:50
those should perhaps be a different thing then.
jonasw 18:35:58
why make it a different thing?
remko 18:36:08
slack also distinguishes between bot messages and human-written messages.
remko 18:36:21
you can't do anything beyond bold, italic, and underline as a person.
jonasw 18:36:29
but why does the transport need to be different?
jonasw 18:36:47
sharing the transport format for whatever markup we’re doing leads to better interoperability
Zash 18:37:47
https://xmpp.org/extensions/inbox/content-types.html
jonasw 18:38:05
oh god
jonasw 18:38:09
type='text/xml'
jonasw 18:38:11
I’m in pain now.
remko 18:38:14
haven't read the XEP, but saw that pass the mailing list. That sounded like pandora's box to me :)
jonasw 18:38:55
that specific implementation feels bad
jonasw 18:39:09
and I’m still not convinced that it’s a good thing to have in any case. This will fragment implementations.
Zash 18:39:23
I think someone (could be me) suggested <body>markdown markup here</body><body-content type='markdown'/>
zinid 18:40:08
Zash: that's exactly Example 1 from the ProtoXEP
zinid 18:40:17
<message from='person1@example.org/34892374' to='person2@example.org/938089023' type='chat'> <body>**Note:** This message is very important.</body> <content type='text/markdown' xmlns='urn:xmpp:content'/> </message>
Zash 18:40:30
zinid: ah, must have scrolled past that
jonasw 18:40:38
zinid, yes, but an empty <content/> has a different meaning than a <content> with text content, which is super awkward.
zinid 18:40:53
jonasw: yep, I don't think we need this crap
Zash 18:41:25
I actually think it's awkward with <body/><xhtml-im/>
zinid 18:42:16
still, it's unclear what should a client render if it doesn't support the content type?
zinid 18:42:30
in the case of markdown it's obvious, but with another formats/
zinid 18:42:31
?
Zash 18:42:38
zinid: if you do this only with formats that are still readable when treated as plain text, it's probably fine
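A sketch of that fallback, using the stanza zinid quoted from the ProtoXEP: the receiver renders the body with the advertised type when it knows it, and as plain text otherwise. The pick_rendering function and the SUPPORTED set are hypothetical, and the namespace is taken from the quoted example only.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "urn:xmpp:content"
SUPPORTED = {"text/markdown"}

STANZA = (
    "<message from='person1@example.org/34892374' "
    "to='person2@example.org/938089023' type='chat'>"
    "<body>**Note:** This message is very important.</body>"
    "<content type='text/markdown' xmlns='urn:xmpp:content'/>"
    "</message>"
)


def pick_rendering(message_xml, supported=SUPPORTED):
    """Return (content_type, body_text), falling back to text/plain."""
    msg = ET.fromstring(message_xml)
    body = msg.findtext("body", default="")
    content = msg.find(f"{{{CONTENT_NS}}}content")
    ctype = content.get("type") if content is not None else None
    if ctype in supported:
        return ctype, body
    # Unknown or missing type: the body is still readable as plain text.
    return "text/plain", body


print(pick_rendering(STANZA))
print(pick_rendering(STANZA, supported=set()))
```

This only degrades gracefully for formats whose source is legible on its own, which is the restriction suggested above.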
zinid 18:42:50
Zash: yes
Zash 18:43:10
You probably also wanna have explicit features for formats you understand
jonasw 18:43:22
feature discovery doesn’t work in modern IM though.
Zash 18:43:24
As in, disco#info features and the whole caps dance
Zash 18:43:43
Oh right, we're moving away from the end-to-end thing :/
jonasw 18:44:06
let the server handle translation to the different markup types understood by the client!
Zash 18:44:16
Ohgod
zinid 18:44:24
jonasw: OMEMO guys will not approve :D
jonasw 18:44:53
zinid, right
zinid 18:46:51
feature discovery won't help in MUCs
jonasw 18:47:05
that, too
jonasw 18:47:09
or MIXes
Zash 18:47:23
or when you switch client and fetch stuff from MAM
Zash 18:47:27
or if you use carbons
Zash 18:47:28
or ...
jonasw 18:47:43
yes, so, that’s not really gonna work.
Zash 18:47:54
Can we even do things at all then?
jonasw 18:48:03
sure
jonasw 18:48:13
like we’ve been doing it with <xhtml-im/>
Zash 18:48:27
multipart/alternative basically
jonasw 18:48:31
yupp
Zash 18:49:43
Do everything at once and let the recipient do what they want :)
zinid 18:52:34
do I understand correctly: the only concern about xhtml-im is security issues?
jonasw 18:52:49
zinid, the "only", yes
jonasw 18:52:52
for me at least.
remko 18:53:29
no, i don't think that's true
Zash 18:53:54
That's why SamWhited wants it dead, no?
remko 18:54:11
that's what initially triggered the discussion, yes. And it's important, because xhtml-im is very hard to sanitize.
Zash 18:54:13
It's too easy to just stick the incoming XML tree into a browser DOM
remko 18:54:30
but i think other people don't want XHTML-IM, because it's very unpredictable to render in a chat log.
zinid 18:54:45
can we provide testing vectors?
Zash 18:54:46
User studies needed
SamWhited 18:54:59
I hadn't actually considered the difference between text style and layout before, it was just security for me at first, but now I agree that we need something that's purely style, not layout.
jonasw 18:55:28
SamWhited, I think we need something for semantics, not for style.
jonasw 18:55:36
(neither for layout by the way)
Zash 18:55:53
I actually think normal people will want style rather than semantics
remko 18:55:56
semantics for things that don't need layout :)
jonasw 18:56:02
emphasis, blockquote, strong emphasis, lists and enumerations, and code at the very least.
remko 18:56:03
yes, zash is right
SamWhited 18:56:04
jonasw: yes, that's fair, I think I agree with that. I can imagine certain clients render "emphasis" as italics and other bold or something similar.
jonasw 18:56:18
SamWhited, that, and also accessibility tools
jonasw 18:56:22
like screen readers
jonasw 18:56:29
they benefit a lot from semantics.
remko 18:56:30
i actually agree with Zash, people don't care about semantics in IM, they care about style.
SamWhited 18:56:37
Yes, I think for the most part you'll find they're the same for anything simple though.
jonasw 18:56:44
Zash, they think they want style, but they actually want semantics.
remko 18:56:49
they want something to be bold, not 'emphasized'
Zash 18:56:57
jonasw: That's probably true.
jonasw 18:57:00
remko, they want something to be bold to emphasize it
jonasw 18:57:11
they don’t think in terms of semantics, but that’s what they want
remko 18:57:17
different people have different interpretations of emphasis.
remko 18:57:31
if i want something emphasized, i don't want it in italic (even though that's the standard way to emphasize things)
jonasw 18:57:46
remko, you’re free to choose strong emphasis then, people make that kind of mistake all the time.
jonasw 18:57:58
it’s still emphasis, and that’s the meaning which is wanted to be conveyed and which is conveyed
SamWhited 18:58:21
Although, if I'm a client author I'm going to put a "Bold" button, not an "Emphasis" button and it would be confusing if on one of my other clients the "Bold" button turns out to be italics.
jonasw 18:58:25
SamWhited, when you’re saying "purely style", I’m afraid that use-cases like enumerations are excluded though.
remko 18:58:44
SamWhited: exactly
jonasw 18:59:08
SamWhited, yes of course you wouldn’t label it emphasis ;-). and it should be made clear which defaults apply for the two kinds of emphasis people have (render weak -> italic, strong -> bold; people will choose strong in 99.9% of the cases, that doesn’t matter)
SamWhited 18:59:10
yah, nevermind, I just changed my mind again. Conveying semantics might be nice in some cases, but all I want is the most dead simple thing we can do.
jonasw 18:59:44
with all due respect, I wonder whether you might maybe want to consider not only what you want ;-)
remko 19:00:03
jonasw: so you're saying that you're offering a bold and italic button, but insist they're using semantics? It has to render the same way on the other side, so i don't think it's semantics anymore.
SamWhited 19:00:11
I just gave you the reason why client developers want it.
SamWhited 19:00:17
(probably)
jonasw 19:01:00
SamWhited, as a client developer, I want to have one markup which covers all things my users are likely to encounter. This includes more complex messages (possibly generated by automated systems, similar to slack integrations or something). And I don’t want to have to support three different tiers of markdown dialects to achieve that.
jonasw 19:01:34
I may not offer my users the tools to create, e.g., a three-level nested enumerated list in the interface, because this ain’t a word processor.
Zash 19:01:48
People can't use a comma?
SamWhited 19:02:21
Wait, what? Who said anything about multiple tiers of markdown dialects?
jonasw 19:02:24
Zash, context?
Zash 19:02:38
jonasw: lists
jonasw 19:02:49
SamWhited, that’s what happens if we go down the route of "we’ll start with some simple text-based markup and oops, then we’ll find out two years later that we also need something to do lists or whatever, so let’s bump the namespace"
jonasw 19:02:59
Zash, bullet points, if you will
zinid 19:03:06
Zash: lists are easier to read I guess
jonasw 19:03:10
much easier
Zash 19:03:16
Do people use this when talking?
zinid 19:03:27
I do sometimes
jonasw 19:03:30
Zash, I do ;-). but also, "possibly generated by automated systems"
SamWhited 19:03:48
I think that: 1. That's probably not a problem 2. It already works fine
zinid 19:03:57
Zash: for example to tell my wife what to buy in a shop ;)
Zash 19:04:06
I have found a 𝐁𝐎𝐋𝐃 solution!
jonasw 19:04:11
it works fine until you have multiple lines in a bullet (e.g. due to word-wrap) and then it is unreadable, SamWhited.
jonasw 19:05:23
I would like to quote a few things from the Zen of Python which have been in my mind during this whole "the new markup for XMPP" discussion: Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. […] In the face of ambiguity, refuse the temptation to guess.
jonasw 19:06:12
and also, especially since people are now suggesting that they’re creating precedents by implementing things in a wide-spread client: Now is better than never. Although never is often better than *right* now.
Zash 19:06:15
dwd: btw, *bold* in markdown isn't bold, but italics :P
jonasw 19:06:19
(that, too)
jonasw 19:07:05
SamWhited, I really, really don’t understand what the issue is with creating an extensible, very simple markup or re-using one instead of restricting us to a very small set of things.
SamWhited 19:07:27
I never said there was a problem with it
jonasw 19:08:18
fair.
jonasw 19:08:26
I somehow felt you did, but that wasn’t true.
jonasw 19:08:41
maybe I mixed up an email somewhere and had that lingering thought somewhere, I apologize.
Zash 19:09:30
SamWhited: Do you think any XML based format will be too easy to do the wrong thing with?
SamWhited 19:10:44
Zash: I'm not sure, I suspect so, but I have no idea.
zinid 19:11:07
Zash: only when you put unescaped cdata into DOM?
SamWhited 19:11:25
XMPP doesn't allow CDATA, no?
Zash 19:11:28
Question is, where's the cutoff where people will prefer the right thing?
jonasw 19:11:36
SamWhited, cdata is any text, actually
zinid 19:11:42
SamWhited: well I mean the content of <body/> for example
SamWhited 19:11:46
*unescaped CDATA
Zash 19:12:04
SamWhited: You are thinking of <![CDATA[ ]]>?
Zash 19:12:12
Whatever that's called
SamWhited 19:12:23
yah, that
Zash 19:12:42
Do we disallow it tho?
zinid 19:12:43
no, I meant a text within tags in general
zinid 19:13:39
I see no other way how to screw up
jonasw 19:13:58
zinid, I have one: let’s say we have a tag <emph/> for emphasis
zinid 19:14:03
we can assume that you can screw up during transforming layout element into DOM, but you can screw up this way with any formats
jonasw 19:14:08
a client may simply do a translation mapping the emph local name to em for XHTML.
Zash 19:14:17
Also wtf unicode has all sorts of 𝗯𝗼𝗹𝗱 𝘁𝗲𝘅𝘁
jonasw 19:14:17
and then fail to remove attributes such as onclick="alert('fnord')"
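The failure mode described here can be sketched as follows. The emph element is the hypothetical tag from the message above; TAG_MAP, naive_translate and safe_translate are illustrative names, and mapping unknown tags to span is an arbitrary choice for this sketch.

```python
import xml.etree.ElementTree as ET

TAG_MAP = {"emph": "em"}  # hypothetical markup -> XHTML local names


def naive_translate(elem):
    """Rename tags in place; every attribute silently survives."""
    for node in elem.iter():
        node.tag = TAG_MAP.get(node.tag, node.tag)
    return elem


def safe_translate(elem):
    """Rebuild the tree from scratch, copying text but no attributes."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(safe_translate(child))
    return out


SRC = '<emph onclick="alert(\'fnord\')">important</emph>'
print(ET.tostring(naive_translate(ET.fromstring(SRC)), encoding="unicode"))
print(ET.tostring(safe_translate(ET.fromstring(SRC)), encoding="unicode"))
```

The naive version is the shorter one, which is the concern: with an XML source format, the unsafe result comes from leaving out a step rather than from adding one.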
zinid 19:14:27
jonasw: yes, but you can do the same with other formats, no?
jonasw 19:14:31
not with non-XML formats
jonasw 19:14:38
you’d have to actively put attributes in the result
Zash 19:15:28
I think non-XML formats may be too easy to just run through a regex and allow HTML through
jonasw 19:15:45
Zash, depends
jonasw 19:16:21
think: [{"text": "some text", "emphasis": "strong"}, {"text": " and now without emphasis"}] -> 'some text and now without emphasis'
jonasw 19:16:31
you can’t regex that in any reasonable way.
jonasw 19:16:47
anything in ["text"] will have to be htmlescaped, but otherwise it should be safe.
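A sketch of rendering such a span list to HTML. The input shape follows the example above; the "weak" value, the EMPHASIS_TAGS mapping and the render function are assumptions. Tags only come from a fixed table and all text is escaped, so nothing in the input can introduce markup.

```python
import html
import json

# Assumed mapping from the "emphasis" value to an HTML tag.
EMPHASIS_TAGS = {"weak": "em", "strong": "strong"}


def render(spans_json):
    """Render a JSON list of {"text", "emphasis"?} spans to an HTML string."""
    out = []
    for span in json.loads(spans_json):
        text = html.escape(span["text"])  # escapes <, >, & and quotes
        tag = EMPHASIS_TAGS.get(span.get("emphasis"))
        out.append(f"<{tag}>{text}</{tag}>" if tag else text)
    return "".join(out)


doc = ('[{"text": "some text", "emphasis": "strong"}, '
       '{"text": " and now without emphasis"}]')
print(render(doc))
print(render('[{"text": "<img onerror=x>", "emphasis": "strong"}]'))
```

An unknown emphasis value falls back to plain escaped text, so a receiver that ignores new values still renders something safe.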
zinid 19:17:03
jonasw: other formats can also possess kinda "attributes" and you can also copy their contents into some evil DOM element blindly, no?
jonasw 19:17:07
(now I officially did it. I proposed a JSON-transport for things.)
jonasw 19:17:18
zinid, not if none of those attributes reasonably map to any DOM element which is evil
jonasw 19:17:32
we shouldn’t be needing any attributes at all, I think
jonasw 19:17:36
except maybe class if we do that palette thing
jonasw 19:17:53
(okay, href too)
Zash 19:17:59
[{"c":[{"c":"bold","t":"Str"}],"t":"Strong"},{"t":"Space"},{"c":"text","t":"Str"}]
jonasw 19:18:02
but attributes should be rather rare
Zash 19:18:20
^ actual JSON "markup" format.
jonasw 19:18:25
Zash, not sure if that’s some serious markup you found somewhere or if you’re trolli... oh dear
jonasw 19:18:29
I don’t know what that does
Zash 19:18:33
jonasw: pandoc -t json
jonasw 19:18:33
but does it work?
jonasw 19:18:45
ah, t is type.
Zash 19:18:47
jonasw: It's a JSON dump of the internal parse tree
jonasw 19:18:52
right
jonasw 19:19:10
probably not a good choice since probably underspecified
jonasw 19:19:14
but yeah, that’s the idea
Zash 19:20:27
<Str>bold</Str><Space/><Str>text</Str>
Zash 19:22:01
~$ pandoc -t native <<< '**bold** text' [Para [Strong [Str "bold"],Space,Str "text"]]
zinid 19:22:49
jonasw: if we can map attributes reasonably, can't we do the same for xhtml-im? and then forbid unknown attributes/elements
jonasw 19:23:10
zinid, the difference is that one requires action to fail, the other requires inaction.
zinid 19:23:22
I don't understand
jonasw 19:23:25
of course, XHTML-IM already defines that some attributes are evil and you need to remove them.
jonasw 19:23:31
that doesn’t magically make developers do that.
SamWhited 19:23:43
Even if they do it, any trivial mistake in the white list logic results in a vulnerability.
zinid 19:25:42
XML schema?
zinid hides 19:25:44
jonasw 19:26:05
zinid, sure, nobody does that.
jonasw 19:26:08
it requires action to do that
jonasw 19:26:20
getting developers to take action is what’s tricky
zinid 19:26:29
yeah, I know
jonasw 19:26:31
(if it was only about trivial mistakes, we could provide an audited reference implementation)
jonasw 19:26:48
(either based on XML schemas or whatever works in JavaScript)
zinid 19:29:59
ok, we refuse xhtml-im, then what?
zinid 19:30:11
there are 100500 formats and we can invent others
zinid 19:30:28
how to choose?
zinid 19:31:05
ah, and we need to make sure there are no potential vulnerabilities, hell yeah
zinid 19:35:50
regarding markdown: if we don't choose it, then developers will blame us even harder that we don't use "modern" technologies like json or markdown, or rest :)
jonasw 19:36:23
zinid, I’m all in for a JSON-based markup
zinid 19:36:37
for example, I heard "xmpp is dead if they don't switch to json" and seems like people tend to agree with that
Zash 22:23:13
I note that the PEP examples don't include the 'http://jabber.org/protocol/pubsub' feature. Is that supposed to be implied by <identity category='pubsub' type='pep'/>?
Zash 22:24:17
And the last version of PEP changed that section from being to=host to to=account and /some/ unnamed implementations still advertise stuff on the host
moparisthebest 23:42:24
Why hasn't anyone taken my bbcode suggestion seriously?
moparisthebest 23:43:09
On an actual serious note, there are markdown standards, we could just pick one
moparisthebest 23:44:11
http://commonmark.org/ is the best I know of