XSF Discussion - 2022-01-02


  1. kurisumakise

    How do and how should xmpp clients and servers go about chewing down the weird xml format? Because of the unclosed <stream:stream> you can't use a regular DOM parser. I skimmed through gajim and dino source code and they seem to reinvent their own limited DOM (obviously no xpath etc). Do any clients/servers work with SAX events in a manner different than building a node tree for every stanza?

  2. flow

    kurisumakise, Smack does split the top level stream elements, synthesizes a <stream/> around that, and parses the result

  3. flow

    but I believe that there are also implementations that simply restart the parser on every <stream/> open tag

  4. flow

    as server you want to ensure that there is some accounting regarding the stanza size, but other than that, it should be fine
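
A minimal sketch of the approach discussed above: parse incrementally, treat each child of the never-closed <stream:stream> as one stanza, and keep some accounting of how much data an unfinished stanza has consumed. It uses Python's standard `xml.etree.ElementTree.XMLPullParser`. The names `StreamSplitter`, `StanzaTooLarge` and `MAX_STANZA_BYTES` are hypothetical, and this is not how Smack or any other library mentioned here actually does it.

```python
import xml.etree.ElementTree as ET

MAX_STANZA_BYTES = 65536  # hypothetical limit, not taken from any spec


class StanzaTooLarge(Exception):
    """Raised when an unfinished stanza exceeds the configured budget."""


class StreamSplitter:
    """Feed raw stream data in, get complete top-level stanzas out."""

    def __init__(self, max_bytes=MAX_STANZA_BYTES):
        self._parser = ET.XMLPullParser(events=("start", "end"))
        self._root = None    # the never-closed <stream:stream> element
        self._depth = 0      # 0 = outside the stream, 1 = directly inside it
        self._pending = 0    # data fed while a stanza was still incomplete
        self._max_bytes = max_bytes

    def feed(self, data):
        """Feed one chunk; return the list of stanzas completed by it."""
        self._parser.feed(data)
        self._pending += len(data)
        stanzas = []
        for event, elem in self._parser.read_events():
            if event == "start":
                self._depth += 1
                if self._depth == 1:
                    self._root = elem
            else:  # "end"
                self._depth -= 1
                if self._depth == 1:
                    # A direct child of the stream root just closed.
                    stanzas.append(elem)
                    # Detach it so the root does not grow without bound.
                    self._root.remove(elem)
        if self._depth <= 1:
            # No stanza is open at the end of this chunk.
            self._pending = 0
        elif self._pending > self._max_bytes:
            # Coarse accounting: counts whole chunks, not exact stanza bytes.
            raise StanzaTooLarge(f"more than {self._max_bytes} pending")
        return stanzas
```

The caller would pass each chunk read from the socket to `feed()` and dispatch whatever elements come back. The size check is deliberately rough, and a server would likely want something stricter.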

  5. kurisumakise

    Quite complicated. Leaves me wondering what xmpp creators had in mind when going for xml. I mean, what method of parsing.

  6. Zash

    It hasn't really been as much of a problem as some people seem to think.

  7. lovetox

    kurisumakise, how complicated it is depends on the xml lib you use

  8. lovetox

    for example i rewrote nbxmpp xml parsing a few weeks ago

  9. lovetox

    the whole parser that dispatches elements is about 70 lines

  10. moparisthebest

    XMPP uses such a small subset of XML I tend to think it's inappropriate to use standard off the shelf XML parsers

  11. Sam

    In general, it's something that has led to bugs and issues in every piece of XMPP related software I've ever written. It took me a *long* time to get it right (for some value of "right", but I think I got most of the bugs worked out) in Mellium

  12. Sam

    The small subset of XML is actually one of the big problems I run into. Using off the shelf XML parsers tends to be bad like moparisthebest said because then I run into bugs trying to limit them to what XMPP expects, but obviously writing your own namespace aware XML parser is no trivial task.

  13. Sam

    So as nice as the subset is, I think doing something that's not widely supported actually made things way harder.

  14. lovetox

    i actually did exactly that, we had an old parser which tried to be namespace aware but obviously was very limited and buggy, now i just use lxml, and it's fine

  15. moparisthebest

    If you can stop it from using references and stuff fine

  16. Zash

    The problem seems to be that everyone thinks they must write their own XML parser.

  17. lovetox

    also the problem for me is, why would i not use 2 or 3 decades(?) worth of experience from developers with xml, and start anew instead

  18. moparisthebest

    Because you can't stop it from doing non-xmpp things

  19. moparisthebest

    Otherwise you are right

  20. Holger

    This is about stopping XML libraries from _emitting_ anything that's outside our subset, no? Rather than the parsing being a problem.

  21. Holger

    Well the initial question was about regular DOM parsers. My point is mostly just that decoding/encoding are quite separate problems 🙂

  22. moparisthebest

    You don't want them parsing references etc either right?

  23. lovetox

    yeah true, i probably spent a day to find out how to effectively deactivate everything that xmpp does not support
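
As a rough illustration of what "deactivating" can look like, here is a sketch that makes an off-the-shelf parser reject constructs XMPP does not allow (RFC 6120 forbids comments, processing instructions, DTD subsets and non-predefined entity references). It uses Python's standard `xml.parsers.expat` binding rather than lxml, so it is not the nbxmpp configuration. `ForbiddenXML` and `make_restricted_parser` are hypothetical names.

```python
from xml.parsers import expat


class ForbiddenXML(Exception):
    """Raised when the input uses an XML feature outside the XMPP subset."""


def _forbid(what):
    """Build a handler that aborts parsing as soon as expat calls it."""
    def handler(*args):
        raise ForbiddenXML(f"{what} is not allowed in an XMPP stream")
    return handler


def make_restricted_parser():
    """Return a namespace-aware expat parser that refuses non-XMPP XML."""
    parser = expat.ParserCreate(namespace_separator=" ")
    # Rejecting the DOCTYPE also rules out custom entity declarations.
    parser.StartDoctypeDeclHandler = _forbid("a DTD")
    parser.ProcessingInstructionHandler = _forbid("a processing instruction")
    parser.CommentHandler = _forbid("a comment")
    # Predefined entities such as &amp; still work; without a DTD, expat
    # itself rejects any other entity reference as undefined.
    return parser
```

Exceptions raised inside the handlers propagate out of `parser.Parse(...)`, so the caller can catch `ForbiddenXML` and close the stream. Element and character data handlers would still need to be attached to do anything useful with the input.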

  24. Holger

    Ah.

  25. Sam

    No, both sides are a problem. I remember an old (non XMPP, but similar issues) project where we ran into the <!proc exec cat /etc/passwd> or whatever it was issue. Turns out libraries out of the box were happy to do that, had no way to turn off all proc instances, and you had to know to turn it off.

  26. Sam

    I suspect these days most XML related things have figured out that that specific vulnerability is a thing, but the example still stands. There will be others.

  27. moparisthebest

    See: log4j2 lol

  28. guus.der.kinderen

    The XML parsers that I see have sane defaults or are easy to configure. I don't think that there's a clear benefit in one over the other, so far as 'use existing' vs. 'write your own' goes.