jdev - 2020-11-11

  1. Martin

    What characters have to be escaped within a message body? Do I find a list somewhere? Searching for it I only find stuff about JID escaping. :)

  2. Zash

    Only XML rules apply.

  3. Martin

    So only quot, amp, apos, gt and lt?

  4. Zash

    Must be valid UTF-8, must not have ASCII NUL. IIRC also ASCII control characters (\0 .. \31 or somesuch)

  5. Zash

    And yes, when serialized those would be entity-escaped, but I sure hope you don't need to handle that yourself.

  6. Martin

    Yep, that's the issue.

  7. Martin


  8. Martin

    active xmlns="http://jabber.org/protocol/chatstates"/><request xmlns="urn:xmpp:receipts"/></message>

  9. Martin

    Sorry, I'll have to use a pastebin. ^^

  10. Martin


  11. Ge0rG

    that message id makes my eyes bleed.

  12. Ge0rG

    Martin: but yes, clearly a client (library) bug

  13. Zash

    Is that the Profanity hmac-signed uuid in base64?

  14. Martin


  15. Link Mauve

    Martin, your XML library should prevent you from ever being able to serialise that kind of message.

  16. Martin

    Profanity uses libstrophe. Let's see what jubalh and pasis say.

  17. jonas’

    Martin, note that there is no way to escape \022

  18. jonas’

    it is simply not legal in XML character data

  19. jonas’

    it is simply not legal in XML 1.0 character data

  20. Zash


  21. jonas’

    so if you tried to escape it with &#x12; or somesuch, that would still be not-well-formed

  22. Martin

    It's also interesting how it ends up there: https://bugs.debian.org/974205

  23. jonas’


  24. Martin

    > Switch to console, run profanity, and try some escape sequence such as hitting CTRL+V twice,
    > then enter. Disconnects from the server again.

    This one triggered it for me too.

  25. debacle

    Martin, IMHO such sequences should be filtered by the UI already, before it ever reaches the XML or XMPP library. I.e. ncurses.

  26. jubalh

    how will one define 'such sequences'?

  27. jubalh

    list all of them? only allow certain characters? what about unicode then?

  28. Zash


  29. jonas’

    oh my, where to start with this

  30. jonas’


  31. Link Mauve


  32. debacle

    jubalh Not sure. Check whether input is valid UTF-8? I hope either glib or ncurses or expat has a function to check that? In case of invalid input, blame the user and throw away their input.

  33. Martin

    >Is the German letter ß a real letter or just a fancy way of writing ss? Eszet not SS! OMG…

  34. Link Mauve

    Martin, uppercasing might not agree with you. :p

  35. Martin

    Sorry, I don't get it.

  36. Link Mauve

    uppercase("weiß") might give "WEISS".

  37. Link Mauve

    I think it depends on the Unicode version.

  38. Martin

    We have an uppercased eszet now!

  39. Martin


  40. Link Mauve

    Turns out, Unicode is from before 2017.

  41. Link Mauve

    So it had to support the only existing rule back then.

  42. Martin goes on the street and demands inclusion of ẞ

  43. jonas’

    jubalh, so, easy. On input, you convert everything to unicode (please see the link). You’ll then have to filter out all codepoints between U+0000 and U+001F (incl.) except U+0009, U+000A and U+000D

  44. jonas’

    then you pass that to the XML library for serialisation as XML

  45. jonas’

    (the XML library should hit you if you don’t do the filtering; if it doesn’t, fix it)

  46. jubalh

    jonas’: will note it down, thanks

  47. flow

    the problem is already that the "XMPP (or XML) library" allows such codepoints in CDATA. Is there even an XMPP (or XML) library involved?

  48. Zash

    If you think there isn't, then *YOU* are the XML library!

  49. flow

    well depends, is printf(SOCKET, "<foo bar='baz'>asdf</foo>") an XML library?

  50. flow


  51. Ge0rG

    flow: you forgot some format strings that get passed attacker-supplied data
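A minimal Python sketch of the escaping rules discussed in this thread (Zash, jonas’), assuming the message body is already a Python `str`. The names `is_xml_char` and `escape_xml_text` are hypothetical, not part of libstrophe or any other library mentioned here. Characters outside the XML 1.0 `Char` production are rejected rather than escaped because, as jonas’ points out, even a character reference such as `&#x12;` is not well-formed. In real code a proper XML library should be doing this for you.

```python
# Hypothetical helpers illustrating XML 1.0 text escaping; not a library API.

def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# The five predefined entities (quot, amp, apos, gt, lt).
_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

def escape_xml_text(text: str) -> str:
    """Escape text for XML character data, rejecting illegal characters."""
    out = []
    for ch in text:
        if not is_xml_char(ord(ch)):
            # No escape exists for these: a reference like &#x12; is
            # itself not well-formed in XML 1.0, so refuse the input.
            raise ValueError(f"U+{ord(ch):04X} is not allowed in XML 1.0")
        out.append(_ENTITIES.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(escape_xml_text("a < b & 'c'"))
    try:
        escape_xml_text("x\x12y")
    except ValueError as err:
        print("rejected:", err)
```

Mapping each character individually avoids the classic bug of double-escaping `&` that sequential `str.replace` calls can introduce.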
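jonas’ recipe for client input (decode to Unicode, drop U+0000 to U+001F except U+0009, U+000A and U+000D, then hand the result to the XML library) could be sketched in Python as follows. The function name `sanitize_input` and the assumption that input arrives as raw UTF-8 bytes are hypothetical. Strict decoding follows debacle's suggestion of rejecting invalid UTF-8 outright. The recipe does not cover every illegal code point (U+FFFE and U+FFFF, for example, still pass), so the XML library's own check remains necessary.

```python
# Hypothetical input filter following the recipe in this thread.

# Control characters that XML 1.0 allows: TAB, LF, CR.
_ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}

def sanitize_input(raw: bytes) -> str:
    """Decode UTF-8 input and drop C0 control characters except TAB/LF/CR."""
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8,
    # including encoded surrogates, so such input is rejected entirely.
    text = raw.decode("utf-8")
    return "".join(
        ch for ch in text
        if ord(ch) > 0x1F or ord(ch) in _ALLOWED_CONTROLS
    )


if __name__ == "__main__":
    # Two stray 0x16 bytes (what a literal Ctrl+V would be) are dropped.
    print(repr(sanitize_input(b"hi\x16\x16 there\n")))
    print(sanitize_input("weiß".encode("utf-8")))
```

Filtering at the input layer keeps bad characters out of history and logs as well, while the serialiser's check still catches anything that arrives by another path.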
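Link Mauve's point about uppercasing can be checked with Python's `str.upper`, which applies Unicode's default full case mappings; this is one implementation's behaviour, offered as an illustration rather than a statement about every Unicode version. The capital sharp s (ẞ, U+1E9E) exists as a character, but the default mapping still turns ß into "SS", so the round trip does not restore the original word.

```python
import unicodedata

word = "weiß"
upper = word.upper()          # default full case mapping: ß -> "SS"
capital_sharp_s = "\u1e9e"    # ẞ, must be used explicitly

print(upper)                              # WEISS
print(unicodedata.name(capital_sharp_s))  # LATIN CAPITAL LETTER SHARP S
print(capital_sharp_s.lower())            # ß
print(upper.lower() == word)              # False: round trip yields "weiss"
```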