-
kurisumakise
How do, and how should, xmpp clients and servers go about chewing down the weird xml format? Because of the unclosed <stream:stream> you can't use a regular DOM parser. I skimmed through the gajim and dino source code and they seem to reinvent their own limited DOM (obviously no xpath etc). Do any clients/servers work with SAX events in a manner different from building a node tree for every stanza?
-
flow
kurisumakise, Smack does split the top level stream elements, synthesizes a <stream/> around that, and parses the result
-
flow
but I believe that there are also implementations that simply restart the parser on every <stream/> open tag
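
A third option, different from both approaches flow mentions, is to feed the never-closed stream into an incremental parser and hand out each depth-1 child as soon as its end tag arrives. A minimal sketch using Python's standard-library `xml.etree.ElementTree.XMLPullParser`; the `StanzaSplitter` class is a hypothetical name for illustration, not code from Smack, nbxmpp, or any other project mentioned here:

```python
import xml.etree.ElementTree as ET


class StanzaSplitter:
    """Hypothetical helper: yields complete stanzas from an open XMPP stream."""

    def __init__(self):
        # The pull parser accepts partial input, so the missing
        # </stream:stream> is not a problem.
        self._parser = ET.XMLPullParser(events=("start", "end"))
        self._depth = 0
        self._root = None

    def feed(self, data):
        """Feed a chunk of text or bytes; return the stanzas it completed."""
        self._parser.feed(data)
        stanzas = []
        for event, elem in self._parser.read_events():
            if event == "start":
                self._depth += 1
                if self._depth == 1:
                    self._root = elem  # the <stream:stream> element
            else:
                self._depth -= 1
                if self._depth == 1:
                    # A direct child of the stream root just closed.
                    stanzas.append(elem)
                    # Detach it so the root does not grow without bound.
                    self._root.remove(elem)
        return stanzas
```

Chunks may be split anywhere, including in the middle of a tag; `feed()` simply returns an empty list until a stanza is complete. Each returned stanza is still a small node tree, so this only sidesteps the unclosed root, not tree building itself.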
-
flow
as a server you want to ensure that there is some accounting of the stanza size, but other than that, it should be fine
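
The stanza size accounting could look roughly like the sketch below, which uses the standard-library expat bindings and the parser's `CurrentByteIndex` to remember where the current stanza began. The class name, the exception, and the 64-byte limit are made up for illustration (a real server would use a much larger, configurable limit), and an over-long start tag that never completes is not covered here:

```python
import xml.parsers.expat


class StanzaTooLarge(Exception):
    """Raised when a single stanza exceeds the configured byte limit."""


class SizeLimitedStream:
    """Hypothetical helper: tracks how many bytes the current stanza spans."""

    def __init__(self, max_bytes=64):
        self._p = xml.parsers.expat.ParserCreate()
        self._p.StartElementHandler = self._start
        self._p.EndElementHandler = self._end
        self._depth = 0
        self._stanza_start = None  # byte offset where the open stanza began
        self._fed = 0              # total bytes fed so far
        self._max = max_bytes
        self.completed = 0         # number of stanzas fully parsed

    def _start(self, name, attrs):
        self._depth += 1
        if self._depth == 2:       # direct child of <stream:stream>
            self._stanza_start = self._p.CurrentByteIndex

    def _end(self, name):
        self._depth -= 1
        if self._depth == 1:
            # Approximate size: offset of the closing tag minus the start.
            if self._p.CurrentByteIndex - self._stanza_start > self._max:
                raise StanzaTooLarge(name)
            self._stanza_start = None
            self.completed += 1

    def feed(self, data: bytes):
        self._p.Parse(data, False)
        self._fed += len(data)
        # Also reject a stanza that is still open but already too big.
        if (self._stanza_start is not None
                and self._fed - self._stanza_start > self._max):
            raise StanzaTooLarge("unfinished stanza")
```

Checking after every `feed()` as well as at the closing tag means a peer cannot keep a stanza open indefinitely while streaming data into it.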
-
kurisumakise
Quite complicated. Leaves me wondering what the xmpp creators had in mind when going for xml. I mean, what method of parsing.
-
Zash
It hasn't really been as much of a problem as some people seem to think.
-
lovetox
kurisumakise, how complicated it is depends on the xml lib you use
-
lovetox
for example i rewrote the nbxmpp xml parsing a few weeks ago
-
lovetox
the whole parser that dispatches elements is about 70 lines
-
moparisthebest
XMPP uses such a small subset of XML I tend to think it's inappropriate to use standard off the shelf XML parsers
-
Sam
In general, it's something that has led to bugs and issues in every piece of XMPP related software I've ever written. It took me a *long* time to get it right (for some value of "right", but I think I got most of the bugs worked out) in Mellium
-
Sam
The small subset of XML is actually one of the big problems I run into. Using off the shelf XML parsers tends to be bad like moparisthebest said because then I run into bugs trying to limit them to what XMPP expects, but obviously writing your own namespace aware XML parser is no trivial task.
-
Sam
So as nice as the subset is, I think doing something that's not widely supported actually made things way harder.
-
lovetox
i actually did exactly that, we had an old parser which tried to be namespace aware but obviously was very limited and buggy, now i just use lxml, and it's fine
-
moparisthebest
If you can stop it from using references and stuff fine
-
Zash
The problem seems to be that everyone thinks they must write their own XML parser.
-
lovetox
also the problem for me is, why would i not use 2 or 3 decades(?) worth of experience from developers with xml, and start from scratch instead
-
moparisthebest
Because you can't stop it from doing non-xmpp things
-
moparisthebest
Otherwise you are right
-
Holger
This is about stopping XML libraries from _emitting_ anything that's outside our subset, no? Rather than the parsing being a problem.
-
Holger
Well the initial question was about regular DOM parsers. My point is mostly just that decoding/encoding are quite separate problems 🙂
-
moparisthebest
You don't want them parsing references etc either right?
-
lovetox
yeah true, i probably spent a day to find out how to effectively deactivate everything that xmpp does not support
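
For illustration, here is one way such a restriction might look with Python's standard-library expat bindings rather than lxml (this is not the nbxmpp code). It rejects DTDs, comments, and processing instructions by raising from the corresponding handlers; with no DTD allowed, any entity reference other than the predefined ones makes expat itself report an error. `RestrictedXML`, `make_restricted_parser`, and `is_acceptable` are hypothetical names:

```python
import xml.parsers.expat


class RestrictedXML(Exception):
    """Raised when the input uses an XML feature this sketch disallows."""


def make_restricted_parser():
    """Create an expat parser that refuses DTDs, comments and PIs."""
    parser = xml.parsers.expat.ParserCreate()

    def reject(kind):
        def handler(*args):
            raise RestrictedXML(kind)
        return handler

    # Exceptions raised inside handlers propagate out of Parse().
    parser.StartDoctypeDeclHandler = reject("DTD")
    parser.CommentHandler = reject("comment")
    parser.ProcessingInstructionHandler = reject("processing instruction")
    return parser


def is_acceptable(data: bytes) -> bool:
    """Return True if data parses without using a disallowed feature."""
    parser = make_restricted_parser()
    try:
        parser.Parse(data, True)
    except (RestrictedXML, xml.parsers.expat.ExpatError):
        return False
    return True
```

Because the DOCTYPE is refused before any entity declaration is read, custom entities can never be defined, so references such as `&amp;` still work while `&e;` fails.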
-
Holger
Ah.
-
Sam
No, both sides are a problem. I remember an old (non XMPP, but similar issues) project where we ran into the <!proc exec cat /etc/passwd> or whatever it was issue. Turns out libraries out of the box were happy to do that, had no way to turn off all proc instances, and you had to know to turn it off.
-
Sam
I suspect these days most XML related things have figured out that that specific vulnerability is a thing, but the example still stands. There will be others.
-
moparisthebest
See: log4j2 lol
-
guus.der.kinderen
The XML parsers that I see have sane defaults or are easy to configure. I don't think that there's a clear benefit in one over the other, so far as 'use existing' vs. 'write your own' goes.