-
Martin
What characters have to be escaped within a message body? Is there a list somewhere? Searching for it, I only find stuff about JID escaping. :)
-
Zash
Only XML rules apply.
-
Martin
So only quot, amp, apos, gt and lt?
-
Zash
Must be valid UTF-8, must not have ASCII NUL. IIRC also ASCII control characters (\0 .. \31 or somesuch)
-
Zash
And yes, when serialized those would be entity-escaped, but I sure hope you don't need to handle that yourself.
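For reference, this is the kind of escaping Zash means a serializer should do for you. A minimal sketch using Python's standard `xml.sax.saxutils.escape`, chosen only for illustration (Profanity and libstrophe are C): `&`, `<` and `>` are always escaped, and the two quote entities are passed in explicitly because they only matter inside attribute values. `escape_text` is a hypothetical helper name. Note that escaping passes a control character such as U+0016 through unchanged.

```python
from xml.sax.saxutils import escape

# Extra entities for the two quote characters. escape() itself handles
# &, < and > (replacing & first, so nothing is double-escaped).
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_text(text: str) -> str:
    """Escape text with the five entities XML predefines (hypothetical helper)."""
    return escape(text, QUOTE_ENTITIES)

print(escape_text("Tom & Jerry say \"<hi>\" and 'bye'"))
# Tom &amp; Jerry say &quot;&lt;hi&gt;&quot; and &apos;bye&apos;
print(repr(escape_text("\x16")))  # escaping leaves control characters as they are
```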
-
Martin
Yep, that's the issue.
-
Martin
\022
-
Martin
active xmlns="http://jabber.org/protocol/chatstates"/><request xmlns="urn:xmpp:receipts"/></message>
-
Martin
Sorry, I'll have to use a pastebin. ^^
-
Martin
https://paste.debian.net/1171434/
-
Ge0rG
that message id makes my eyes bleed.
-
Ge0rG
Martin: but yes, clearly a client (library) bug
-
Zash
Is that the Profanity hmac-signed uuid in base64?
-
Martin
Yep
-
Link Mauve
Martin, your XML library should prevent you from ever being able to serialise that kind of message.
-
Martin
Profanity uses libstrophe. Let's see what jubalh and pasis say.
-
jonas’
Martin, note that there is no way to escape \022
-
jonas’
it is simply not legal in XML 1.0 character data
-
Zash
UNACCEPTABLE
-
jonas’
so if you tried to escape it with  or somesuch, that would still be not-well-formed
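jonas’ point can be checked with any conforming XML 1.0 parser. Below is a small sketch using Python's ElementTree, whose default parser is expat (`parses` is a hypothetical helper). The raw character and numeric character references to it are all rejected, while a reference to an allowed control character such as TAB is accepted. Since `\022` is ambiguous (read as a decimal escape, as in Lua, it is U+0016, the code Ctrl+V produces; read as C-style octal it is U+0012), both are tried. Both fall in the forbidden range.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if ElementTree's expat-based parser accepts doc as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

cases = [
    "<body>hi</body>",      # plain text: accepted
    "<body>\x16</body>",    # raw U+0016: rejected (invalid token)
    "<body>&#22;</body>",   # decimal reference to U+0016: rejected
    "<body>&#x12;</body>",  # hex reference to U+0012: rejected
    "<body>&#9;</body>",    # reference to TAB, which XML 1.0 allows: accepted
]
for doc in cases:
    print(repr(doc), parses(doc))
```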
-
Martin
It's also interesting how it ends up there: https://bugs.debian.org/974205
-
jonas’
hah
-
Martin
> Switch to console, run profanity, and try some escape sequence such as hitting CTRL+V twice,
> then enter. Disconnects from the server again.
This one triggered it for me too.
-
debacle
Martin, IMHO such sequences should already be filtered by the UI (i.e. ncurses), before they ever reach the XML or XMPP library.
-
jubalh
how will one define 'such sequences'?
-
jubalh
list all of them? only allow certain characters? what about unicode then?
-
Zash
https://www.w3.org/TR/2008/REC-xml-20081126/#charsets
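The `Char` production at that link is short enough to transcribe directly. A sketch in Python, where `is_xml_char` and `first_illegal` are hypothetical names. It excludes not only most C0 controls but also the surrogate block and U+FFFE/U+FFFF.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 (Fifth Edition) production [2] Char:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def first_illegal(text: str):
    """Return the index of the first character not allowed by Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return None

print(first_illegal("weiß"))        # None
print(first_illegal("hi\x16\x16"))  # 2
```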
-
jonas’
oh my, where to start with this
-
jonas’
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
-
Link Mauve
:)
-
debacle
jubalh, not sure. Check whether the input is valid UTF-8? I hope either glib, ncurses or expat has a function to check that. In case of invalid input, blame the user and throw away their input.
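A strict UTF-8 decode is the check debacle is asking about (in C, GLib's `g_utf8_validate()` does this job). It is sketched here in Python with a hypothetical `valid_utf8` helper. The last line shows why UTF-8 validation alone does not fix this bug: the byte Ctrl+V produces, 0x16, is perfectly valid single-byte UTF-8 and is still illegal in XML.

```python
def valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")  # strict: rejects truncated, overlong and surrogate sequences
        return True
    except UnicodeDecodeError:
        return False

print(valid_utf8("weiß".encode("utf-8")))  # True
print(valid_utf8(b"\xc3"))                 # False: truncated two-byte sequence
print(valid_utf8(b"\xc0\x80"))             # False: overlong encoding of NUL
print(valid_utf8(b"hi\x16\x16"))           # True: valid UTF-8, yet illegal in XML
```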
-
Martin
> Is the German letter ß a real letter or just a fancy way of writing ss?
Eszett, not SS! OMG…
-
Link Mauve
Martin, uppercasing might not agree with you. :p
-
Martin
Sorry, I don't get it.
-
Link Mauve
uppercase("weiß") might give "WEISS".
-
Link Mauve
I think it depends on the Unicode version.
-
Martin
We have an uppercase Eszett now!
-
Martin
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
-
Link Mauve
Turns out, Unicode is from before 2017.
-
Link Mauve
So it had to support the only existing rule back then.
- Martin goes on the street and demands inclusion of ẞ
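Link Mauve's claim is easy to check. Python's `str.upper()` and `str.lower()` apply Unicode's default full case mappings (SpecialCasing.txt). With those, ß still uppercases to "SS" even though a capital ẞ (U+1E9E) has been encoded since Unicode 5.1, and ẞ lowercases to ß. For case-insensitive comparison, case folding is the tool.

```python
# Python's str.upper()/lower() apply Unicode's default full case mappings.
print("weiß".upper())               # WEISS: ß uppercases to two letters
print("\u1e9e", "\u1e9e".lower())   # ẞ ß: capital sharp s lowercases to ß
print("ß".upper() == "\u1e9e")      # False: ß does not uppercase to ẞ
print("weiß".casefold() == "WEISS".casefold())  # True: both fold to "weiss"
```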
-
jonas’
jubalh, so, easy. On input, you convert everything to unicode (please see the link). You’ll then have to filter out all codepoints between U+0000 and U+001F (incl.) except U+0009, U+000A and U+000D
-
jonas’
then you pass that to the XML library for serialisation as XML
-
jonas’
(the XML library should hit you if you don’t do the filtering; if it doesn’t, fix it)
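jonas’ three steps, sketched in Python under some assumptions: input arrives as UTF-8 bytes, ElementTree stands in for the XML library, and `sanitize_input` and `message_xml` are hypothetical names. The filter is exactly the rule stated above (drop U+0000 to U+001F except TAB, LF and CR). It is narrower than the full XML 1.0 `Char` production, which also excludes e.g. U+FFFE. The last line illustrates the parenthetical: ElementTree does not hit you, it serializes U+0016 as-is, so with it the filter step is not optional.

```python
import xml.etree.ElementTree as ET

# C0 controls that XML 1.0 allows; every other code point below U+0020 is dropped.
ALLOWED_C0 = {"\t", "\n", "\r"}

def sanitize_input(raw: bytes) -> str:
    """Steps 1 and 2: decode to Unicode, then filter C0 control characters."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return "".join(ch for ch in text if ch >= "\x20" or ch in ALLOWED_C0)

def message_xml(body: str) -> str:
    """Step 3: let the XML library build and escape the stanza."""
    msg = ET.Element("message", {"type": "chat"})
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")

typed = b"hi\x16\x16 <there>"  # Ctrl+V twice, plus markup characters
print(message_xml(sanitize_input(typed)))
# <message type="chat"><body>hi &lt;there&gt;</body></message>

# Without the filter, ElementTree emits U+0016 unchanged: not well-formed output.
print(repr(message_xml("\x16")))
```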
-
jubalh
jonas’: will note it down, thanks
-
flow
the problem is already that the "XMPP (or XML) library" allows such codepoints in CDATA. Is there even an XMPP (or XML) library involved?
-
Zash
If you think there isn't, then *YOU* are the XML library!
-
flow
well depends, is printf(SOCKET, "<foo bar='baz'>asdf</foo>") an XML library?
-
flow
*fprintf
-
Ge0rG
flow: you forgot some format strings that get passed attacker-supplied data
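flow's question and Zash's answer in one sketch: once a stanza is built by string formatting, all escaping becomes the caller's job. `naive_body` and `library_body` are hypothetical helpers, and ElementTree stands in for a real XML library. Ge0rG's format-string hazard is specific to C's printf family and is not reproduced here.

```python
import xml.etree.ElementTree as ET

def naive_body(text: str) -> str:
    # "fprintf as XML library": text is pasted in with no escaping at all.
    return "<body>%s</body>" % text

def library_body(text: str) -> str:
    # The library escapes &, < and > in character data.
    el = ET.Element("body")
    el.text = text
    return ET.tostring(el, encoding="unicode")

evil = "hi</body><body>injected"
print(naive_body(evil))    # <body>hi</body><body>injected</body>  (a second element)
print(library_body(evil))  # <body>hi&lt;/body&gt;&lt;body&gt;injected</body>
```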