-
Sam
Who else is a treasurer or treasurer-adjacent and should have access to the Open Collective? I assume board people? All of them or just some? Anyone else?
-
Kev
Just Peter, probably.
-
Sam
I vaguely feel like there should be more than one person with access to reduce bus factor, especially when it comes to things that handle money, but whatever the board wants I suppose.
-
Sam
We should also consider who is allowed to use the XSF as a fiscal host and how we decide. My instinct is "anything XMPP related" and "at the board's discretion", but it would be good to get that confirmed by the board and have the treasurer or someone else bring any new applicants before the board each week. (I am happy to do this if Peter doesn't, since it's just forwarding names along; I just want to make sure the board is okay with all this, since it involves money and I don't want to just make a bunch of stuff up and hope it's fine.)
-
jonas’
something about a CoC
-
Sam
This is a little bit different, but also making people agree to follow the CoC once we have one if they want to use us as a fiscal host seems reasonable. I'll draft some text and send it to the board email for discussion. I think it will be relatively non-controversial and we can always change it at any time.
-
Zash
How do we determine which pieces of software go on the software listings? Probably some overlap with that selection method.
-
Sam
In case anyone wants to brainstorm: https://pad.disroot.org/p/XSF_Fiscal_Host_Rules
-
Sam
huh, TIL: "Jabber Open Source License" https://opensource.org/licenses/jabberpl
-
Sam
I'm assuming that was an early jabberd thing. Glad that got retired.
-
moparisthebest
dwd, flow, if you have a spare moment, could you see if you are less horrified by https://github.com/moparisthebest/xmpp-proxy/blob/master/src/stanzafilter.rs#L224 ? (And thanks for the state machine hint, flow!) Again, the point is NOT to have a full-on XML parser, but simply to reliably split on stanza boundaries so complete stanzas can be passed to a real XML parser later.
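For anyone following along without the link, here is a minimal sketch of that idea in Rust, the language xmpp-proxy is written in. It is not the code behind the link; `next_stanza`, `find`, and `CDATA_OPEN` are hypothetical names, and it assumes the `<stream:stream>` header has already been consumed. It tracks element depth, jumps over CDATA sections, ignores `>` inside quoted attribute values, refuses `<!--`, `<!DOCTYPE` and `<?` outright, and allocates nothing: it only returns byte offsets.

```rust
/// Opening delimiter of a CDATA section.
const CDATA_OPEN: &[u8] = b"<![CDATA[";

/// Byte offset of the first occurrence of `needle` in `hay`, if any.
fn find(hay: &[u8], needle: &[u8]) -> Option<usize> {
    hay.windows(needle.len()).position(|w| w == needle)
}

/// Looks for the first complete top-level element in `buf` (assumed to start
/// after the `<stream:stream>` header). Returns:
/// - `Ok(Some((start, end)))`: `buf[start..end]` is one complete stanza,
/// - `Ok(None)`: no complete stanza yet; read more bytes and call again,
/// - `Err(offset)`: the byte at `offset` cannot appear there.
/// Nothing is allocated; the caller slices `buf` itself.
pub fn next_stanza(buf: &[u8]) -> Result<Option<(usize, usize)>, usize> {
    let (mut depth, mut start, mut i) = (0usize, 0usize, 0usize);
    while i < buf.len() {
        if buf[i] != b'<' {
            // Text inside a stanza is skipped; between stanzas only whitespace is allowed.
            if depth == 0 && !buf[i].is_ascii_whitespace() {
                return Err(i);
            }
            i += 1;
            continue;
        }
        let rest = &buf[i..];
        if rest.starts_with(CDATA_OPEN) {
            if depth == 0 {
                return Err(i); // CDATA outside any stanza
            }
            // '<' and '>' inside the section are plain text, so jump past "]]>".
            match find(&rest[CDATA_OPEN.len()..], b"]]>") {
                Some(end) => {
                    i += CDATA_OPEN.len() + end + 3;
                    continue;
                }
                None => break, // section not finished yet
            }
        }
        if CDATA_OPEN.starts_with(rest) {
            break; // e.g. "<![CD" at the end of the buffer: wait for more bytes
        }
        if rest[1] == b'!' || rest[1] == b'?' {
            return Err(i); // comments, DOCTYPEs and PIs are forbidden in XMPP
        }
        // Find the '>' closing this tag, ignoring any '>' inside quoted attribute values.
        let mut quote: Option<u8> = None;
        let mut close = None;
        for (j, &b) in rest.iter().enumerate().skip(1) {
            match (quote, b) {
                (Some(q), _) if b == q => quote = None,
                (Some(_), _) => {}
                (None, b'"' | b'\'') => quote = Some(b),
                (None, b'>') => {
                    close = Some(j);
                    break;
                }
                (None, _) => {}
            }
        }
        let Some(j) = close else { break }; // tag not finished yet
        if depth == 0 {
            start = i;
        }
        if rest[1] == b'/' {
            if depth == 0 {
                return Err(i); // stray end tag, e.g. </stream:stream>
            }
            depth -= 1;
        } else if rest[j - 1] != b'/' {
            depth += 1; // start tag; an empty-element tag like <x/> leaves depth alone
        }
        i += j + 1;
        if depth == 0 {
            return Ok(Some((start, i)));
        }
    }
    Ok(None)
}
```

A caller would hand `buf[start..end]` to the real XML parser, drop the consumed bytes, and call again; `Ok(None)` only means more input is needed. The sketch never checks that end-tag names match start-tag names, leaving that to the real parser, and a real implementation would treat `</stream:stream>` as the end of the stream rather than an error.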
-
moparisthebest
on a related note, is anyone aware of some comprehensive XMPP XML stream tests anywhere?
-
Sam
moparisthebest: what sort of tests are you looking for? More stuff like the ones you linked for splitting XML, or something that matches XML streams against a big jabber:client schema or something?
-
moparisthebest
I only need to test splitting XML stanzas out of a stream, so strange formatting, CDATA, processing instructions, comments, really anything that might trip such a thing up
-
moparisthebest
in the end, probably need to investigate creating some type of XMPP XML stream fuzzer, but in the short term I was hoping to steal some test cases from existing projects
-
Zash
`<x><![CDATA[ lol</x> ]]></x>`
-
dwd
moparisthebest, I'm wondering if you maybe *do* need an XML parser, but a decent fast one. I used rapidxml (or at least a fork of it) in Metre, which worked really well, and stood up to AFL very well.
-
moparisthebest
Zash, it handles that one fine, thanks
-
moparisthebest
added it to the test
-
moparisthebest
I just want to split on stanza boundaries, I do not want to allocate memory to parse anything
-
dwd
moparisthebest, Sure, but rapidxml doesn't allocate anything either.
-
dwd
moparisthebest, And you're getting achingly close to an XML parser there anyway.
-
moparisthebest
http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1memory_allocation ?
-
dwd
moparisthebest, And yet, <x a='/>'>This is going to be fun.</x>
-
dwd
moparisthebest, Yeah, there's a pool for attributes, but since it's a pool it's a single allocation. If you ported it you could *probably* ditch that for the kind of "chopping out elements" work you're trying to do.
-
Sam
I really need something like this in Go too. I try to keep Mellium relatively fast, but the XML parser is *terrible* and there's not much point to me optimizing things when we're using a parser as slow as the one we're using
-
dwd
moparisthebest, <a a='![CDATA['/> might be fun too.
-
Sam
moparisthebest: you might consider fuzzing this. XML is flexible enough that I don't think you'll come close trying to think up edge cases yourself.
-
Zash
Probably easier to find a generic fuzzer and let it figure out XML syntax anyway
-
dwd
moparisthebest, What Sam says; besides, you'll just be writing the same assumptions into your tests that you've been coding for, like all of us do.
-
dwd
Zash, AFL can do this, indeed.
-
dwd
Zash, Dunno if it'll work with Rust, but ... maybe?
-
Zash
AFL-RIIR is probably a thing already
-
Sam
Is it worth tying fiscal sponsorship to membership (or saying that at least one person in your project must begin seeking membership)? I don't know if it matters, just seems like something organizations do. That way you've already accepted whatever CoC and other rules we come up with.
-
Sam
Maybe not, that seems super limiting.
-
Sam
</thinking-out-loud>
-
Zash
Seems sensible. (needing membership.)
-
Sam
But also, why?
-
Zash
Dunno. Why not? Dunno to that too.
-
dwd
I don't actually know if that's sensible or not.
-
Sam
It would be nice to have a representative from every project, but also if this is a service to the community then maybe we want to make it as easy and open as possible.
-
Sam
Not something we have to decide immediately, I'm just thinking about what a policy write up would look like.
-
moparisthebest
`<a a='![CDATA['/>` works fine, but indeed I hadn't planned for `<x a='/>'>This is going to be fun.</x>`, which will require an "InAttribute" state, thanks dwd
-
jonas’
afl works with anything (but is less efficient and less effective) if you run it in qemu mode :)
-
dwd
moparisthebest, Right - it *feels* like you're basically writing an XML lexer, if not a parser. Though I'll be honest and say this is one of those cases where my lack of a CS degree means I don't really know the difference.
-
moparisthebest
Metre can't proxy c2s right?
-
dwd
moparisthebest, Nope.
-
dwd
moparisthebest, And it only "truly" proxies S2S with the server's consent, as it were.
-
jonas’
moparisthebest, right, what dwd says -- check out parser generators and let one of them build a lexer for you based on the official XML grammar
-
jonas’
that won’t allocate a lot if anything at all, depending on the implementation
-
Kev
I’d probably *not* be inclined to encourage membership for the sake of the sponsorship stuff. On the basis that the XSF doesn’t benefit from having lots of members, only from having members who are sufficiently motivated/able to serve on the few teams that need membership, and otherwise to be on top of things enough to make judgements on Council/Board positions based on people’s interactions with the community. Encouraging people to become members purely to get access to money doesn’t really help with that.
-
Zash
Kev, good point.
-
dwd
Indeed.
-
Sam
*nods* good point
-
moparisthebest
I guess fuzzing won't really do what I need, I need a stream of XMPP XML and to verify I split it at the correct boundaries
-
moparisthebest
no one knows of projects that have tests consisting of anything like that for their parsers?
-
Kev
Fuzzing is what you need in terms of testing you don’t fall apart in the face of bad input, but not in terms of ensuring boundaries are correct, indeed.
-
moparisthebest
yea fuzzing is certainly valuable, just also need other things
-
Sam
Maybe more of a mix of fuzzing and integration testing then. Generate random XML input, pipe it through your splitter and a real parser, when you detect a difference generate a unit test from that.
-
moparisthebest
generating random-but-valid XMPP-subset-of-XML sounds hard
-
Sam
Not really. Elements, random cdata, random attributes.
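A sketch of what Sam describes, assuming a hand-rolled generator: random elements with random attributes, text, and CDATA, built from a few illustrative fragments that are legal XML but look like markup. The fragment lists, `Rng`, `element`, and `random_stanza` are hypothetical; a tiny xorshift PRNG keeps it free of external crates.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    fn pick<'a>(&mut self, items: &[&'a str]) -> &'a str {
        items[(self.next() % items.len() as u64) as usize]
    }
}

// Illustrative fragments: all legal XML, several of them look like markup.
const NAMES: [&str; 4] = ["message", "iq", "body", "x"];
const ATTR_VALUES: [&str; 4] = ["/>", "![CDATA[", "a>b", "&lt;x&gt;"];
const TEXT: [&str; 4] = ["hi", "a > b", "&amp;", " "];
const CDATA: [&str; 3] = [" lol</x> ", "<![CDATA[", "]]"];

/// Appends one random element to `out`, nesting at most `depth` more levels.
fn element(rng: &mut Rng, depth: u32, out: &mut String) {
    let name = rng.pick(&NAMES);
    out.push('<');
    out.push_str(name);
    for i in 0..rng.next() % 3 {
        // Alternate quote styles; none of the values contain a quote character.
        let q = if rng.next() % 2 == 0 { '\'' } else { '"' };
        let value = rng.pick(&ATTR_VALUES);
        out.push_str(&format!(" a{i}={q}{value}{q}"));
    }
    if depth == 0 || rng.next() % 3 == 0 {
        out.push_str("/>");
        return;
    }
    out.push('>');
    for _ in 0..rng.next() % 4 {
        match rng.next() % 3 {
            0 => out.push_str(rng.pick(&TEXT)),
            1 => out.push_str(&format!("<![CDATA[{}]]>", rng.pick(&CDATA))),
            _ => element(rng, depth - 1, out),
        }
    }
    out.push_str(&format!("</{name}>"));
}

/// One random, well-formed element for a given seed.
pub fn random_stanza(seed: u64, max_depth: u32) -> String {
    let mut rng = Rng(seed | 1); // xorshift must never start at zero
    let mut out = String::new();
    element(&mut rng, max_depth, &mut out);
    out
}
```

Each output could be run through both the splitter and a real XML parser, and any seed where they disagree becomes a unit test. As the following messages point out, though, a generator like this only produces the shapes its author already thought of.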
-
moparisthebest
yea, but then we are back to testing only the things I know about
-
Kev
Yeah. It’s easy as long as you don’t want anything that you didn’t already think of and could have generated manually :D
-
moparisthebest
essentially yea :)
-
Kev
I, once upon a time, wrote an XML-aware (and fairly naive) fuzzing layer for Swiften that would modify stanzas on the way out randomly so we could run ‘good’ Sluift scripts against M-Link and have them modified in malicious ways.
-
Kev
That was in the days before AFL. These days you’d run the same scripts to generate a corpus to feed into a branch-aware fuzzer instead, presumably.
-
Sam
No, because you have random attributes and the like
-
moparisthebest
"random", but also following the rules I know about, like attribute values that cannot contain " or ', but those are the rules I know and have already implemented
-
moparisthebest
basically, for each chunk my splitter spits out, a real XML parser fed that chunk should either: 1. parse a complete stanza, or 2. error out because of invalid XML (mismatched tags, etc.); the thing it should never do is: 3. wait for the rest of a partial stanza
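That invariant can be written down as a small test helper. This is a sketch under one assumption: the real XML parser is wrapped in a closure that reports one of the three outcomes above. `ParseOutcome` and `first_violation` are hypothetical names.

```rust
/// What a real XML parser reports when handed one chunk from the splitter.
/// Wiring this up to an actual parser is left as an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParseOutcome {
    Complete,  // 1. parsed exactly one complete stanza
    Invalid,   // 2. rejected the chunk as malformed
    NeedsMore, // 3. is waiting for the rest of a partial stanza
}

/// Index of the first chunk that breaks the invariant, i.e. the first chunk
/// the parser would sit on waiting for more input; `None` if all are fine.
pub fn first_violation<'a, I, F>(chunks: I, mut parse: F) -> Option<usize>
where
    I: IntoIterator<Item = &'a [u8]>,
    F: FnMut(&[u8]) -> ParseOutcome,
{
    chunks
        .into_iter()
        .position(|chunk| parse(chunk) == ParseOutcome::NeedsMore)
}
```

Outcomes 1 and 2 are both acceptable here; only outcome 3 means the splitter cut a stanza short.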
-
flow
moparisthebest, no, jxmpp has a corpus of valid and invalid JIDs, but no corpus of valid and invalid XMPP streams. Wanna team up? :)
-
Kev
FWIW, I would be inclined to use an XML library for this, unless and until you can see that the performance through that is inadequate.
-
moparisthebest
even my super naive and known-wrong initial splitter worked perfectly fine with normal-case XMPP, I ran it for days filtering 100% of XML into my server without any errors, it's the other cases that need work
-
moparisthebest
I'm after zero-memory-allocations rather than performance
-
Kev
TBH, if you’re worried about ‘working’ rather than ‘correct’, a few days of data on an active and well-peered server usually catches most edge cases, in my experience.
-
Sam
I guess I don't get what you're trying to test for then. Running random inputs against a real XML parser and your thing seems like it would identify unknown areas where splitting is broken.
-
moparisthebest
yes, I think it'd be valuable, just not as valuable as test cases people have added for the horrors they've seen in the wild, but if those don't exist...
-
Kev
Run AFL against libxml2, generate a corpus, feed that in?
-
Sam
Yah, I don't know that you'll do well finding specific things from people to test. This is too general for that.
-
moparisthebest
now that's an interesting thought Kev ...
-
Sam
Isn't that what I said except recommending specific tools? I am not understanding something about what's being tested here I guess.
-
Kev
It may be what you said, but not what I read :)
-
Kev
(Which is probably on me)
-
moparisthebest
Sam, mainly, if I write a tool to generate all possible XML as I understand it, I might miss something valid that I don't know about
-
moparisthebest
vs. fuzzing, which in theory should eventually hit all cases?
-
Sam
Fuzzing is literally what I said, but yah, I didn't mean "write your own thing". Anyways, what Kev said is what I was suggesting. Do that, it will be better than asking for samples which will never catch the one weird edge case.
-
Sam
My apologies if I wasn't clear.
-
moparisthebest
no my bad I appreciate it
-
moparisthebest
flow, sure, but maybe this is already a good path forward: convince a fuzzer to generate individually good stanzas, then combine them in random orders to get good streams? :/
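A sketch of the second half of that idea, assuming the individually good stanzas already exist (for example from a fuzzer): shuffle them, put random whitespace keepalives between them, and record where each one starts and ends, so the expected split comes for free. `build_stream` is a hypothetical helper, and the xorshift PRNG is only there to avoid external crates.

```rust
/// Builds one test stream from known-good stanzas: shuffles them (Fisher-Yates
/// driven by a tiny xorshift PRNG), puts random whitespace between them, and
/// records the (start, end) byte range each stanza should be split at.
pub fn build_stream(stanzas: &[&str], seed: u64) -> (String, Vec<(usize, usize)>) {
    let mut state = seed | 1; // xorshift must never start at zero
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    let mut order: Vec<&str> = stanzas.to_vec();
    for i in (1..order.len()).rev() {
        let j = (next() % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }
    let mut stream = String::new();
    let mut bounds = Vec::new();
    for s in order {
        // Zero to two whitespace keepalive characters before each stanza.
        for _ in 0..next() % 3 {
            stream.push(if next() % 2 == 0 { ' ' } else { '\n' });
        }
        bounds.push((stream.len(), stream.len() + s.len()));
        stream.push_str(s);
    }
    (stream, bounds)
}
```

Running many seeds over the same small set of stanzas exercises different orderings and whitespace placements, and the recorded ranges are exactly what a splitter under test should report.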
-
flow
moparisthebest, not saying that it's not, just that a curated corpus would be also nice
-
moparisthebest
flow, I agree, got any thoughts on gathering that together? :)
-
Sam
What is this corpus for specifically?
-
moparisthebest
testing XMPP XML stream parsing?
-
Sam
Just where to split XML tokens?
-
Sam
I'm just wondering how many people actually do their own XML parsing.
-
flow
I was thinking of a corpus of valid and invalid XMPP streams
-
moparisthebest
my thing is only concerned with where to split stanzas out of an XML stream, but such a corpus would be more generally useful
-
Sam
I just don't understand what that tests unless you wrote your own parser
-
flow
entries in the valid corpus would contain the stream and the individual elements that the splitter should identify
-
moparisthebest
and how many wrongly use generic XML parsers and allow comments, processing instructions, etc., Sam?
-
flow
and entries in the invalid corpus should be just rejected
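One way such entries could look in code, as a sketch only: no shared corpus or format exists yet, so `Entry`, `Expect`, and `failures` are hypothetical. A valid entry lists the stanzas a splitter should find, an invalid entry only has to be rejected, and the implementation under test is passed in as a closure.

```rust
/// Hypothetical in-memory form of one corpus entry.
pub enum Expect<'a> {
    /// The stanzas a correct splitter should produce, in order.
    Valid(&'a [&'a str]),
    /// The stream must be rejected somewhere.
    Invalid,
}

pub struct Entry<'a> {
    pub name: &'a str,
    pub stream: &'a str,
    pub expect: Expect<'a>,
}

/// Runs `split` over every entry and returns the names of the entries it got
/// wrong. `split` is whatever implementation is under test: it returns the
/// stanzas it found, or the byte offset where it rejected the stream.
pub fn failures<'a, F>(corpus: &[Entry<'a>], split: F) -> Vec<&'a str>
where
    F: Fn(&str) -> Result<Vec<String>, usize>,
{
    corpus
        .iter()
        .filter(|e| match (&e.expect, split(e.stream)) {
            // Valid entries must come back as exactly the listed stanzas.
            (Expect::Valid(want), Ok(got)) => {
                got.iter().map(String::as_str).ne(want.iter().copied())
            }
            // Invalid entries only need to be rejected.
            (Expect::Invalid, Err(_)) => false,
            // Accepting an invalid stream, or rejecting a valid one, is a failure.
            _ => true,
        })
        .map(|e| e.name)
        .collect()
}
```

Passing the splitter in as a closure keeps the corpus independent of any one implementation, which is the point of sharing it as plain data.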
-
moparisthebest
XMPP only allows a subset of XML
-
Sam
If it's just that you use a parser then you don't really need a corpus except those few things that are forbidden by XMPP, I've got tests for all those things if you want them
-
moparisthebest
I'm sure many projects actually do this, just like many threw XHTML-IM into a DOM
-
moparisthebest
if you caught them all, but yes that would be a good starting point
-
moparisthebest
I think assuming the XML parser you chose actually works well is a mistake
-
moparisthebest
well, I know it's a mistake...
-
flow
to be fair, most XML parsers I've worked with could be easily modified to reject most things XMPP disallows
-
Sam
I disagree. I mean, you should certainly use a proper XML parser but if you're going to write tests for it you should be upstreaming those, not re-testing what's already been tested
-
Sam
(or what's likely to have already been tested; obviously if you pick an XML parser that's untested that's a problem, I'm just saying that I don't see why you'd retest it in the XMPP library instead of just writing tests for the parser itself)
-
moparisthebest
most XMPP things I see test individual stanzas, and not an XMPP-XML-Stream, and that's a mistake
-
Sam
Why? I mean, I get the need for a test ensuring the parser got limited correctly, but then you can test at the parser level that it correctly rejects comments and the like
-
flow
Sam, re your tests, link pls :)
-
Sam
I'll have to go dig them up, I think there's one or two in internal/stream, or I may not have ever published them. They do not test the stream in the way moparisthebest wants though, I have separate tests that make sure the parser actually gets wrapped in the "XMPP valid stuff only" wrapper
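A minimal sketch of such a wrapper, written in Rust rather than Mellium's Go, as an iterator adapter over a tokenizer's output. `Token`, `Restricted`, and `XmppOnly` are hypothetical, and it assumes the tokenizer reports the XML declaration separately from processing instructions. The restricted list follows RFC 6120, section 11.1: comments, processing instructions, DTD subsets, and entity references other than the five predefined ones.

```rust
/// Hypothetical token type standing in for whatever a real XML tokenizer emits.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum Token<'a> {
    StartElement(&'a str),
    EndElement(&'a str),
    Text(&'a str),
    Comment(&'a str),
    ProcInst(&'a str),
    Doctype(&'a str),
    EntityRef(&'a str),
}

/// Why a token was refused.
#[derive(Debug, PartialEq)]
pub enum Restricted {
    Comment,
    ProcInst,
    Doctype,
    Entity(String),
}

/// The five entities XML predefines; these are the only ones XMPP allows.
const PREDEFINED: [&str; 5] = ["lt", "gt", "amp", "apos", "quot"];

/// Wraps any token iterator and turns restricted constructs into errors.
pub struct XmppOnly<I>(pub I);

impl<'a, I: Iterator<Item = Token<'a>>> Iterator for XmppOnly<I> {
    type Item = Result<Token<'a>, Restricted>;

    fn next(&mut self) -> Option<Self::Item> {
        let tok = self.0.next()?;
        Some(match tok {
            Token::Comment(_) => Err(Restricted::Comment),
            Token::ProcInst(_) => Err(Restricted::ProcInst),
            Token::Doctype(_) => Err(Restricted::Doctype),
            Token::EntityRef(name) if !PREDEFINED.contains(&name) => {
                Err(Restricted::Entity(name.to_string()))
            }
            other => Ok(other),
        })
    }
}
```

With the restriction living in one adapter, a test for "comments are rejected" only needs to feed a comment token through it, independently of which parser produces the tokens.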
-
Sam
But I have a meeting starting in a few minutes, I'll see if I can't find them afterwards.
-
flow
no worries. I think that also nicely demonstrates the value of an XMPP stream corpus: being able to point people to a repo where they will find plain text files and telling them: your implementation should be able to parse the valid-stream files, and reject the invalid-stream files
-
flow
whereas what we have right now are probably mostly tests, written in the programming language's native (unit-)test framework, where you have to carefully extract the test vectors if you want to re-use them
-
moparisthebest
yep!
-
moparisthebest
probably want to include the location where the invalid ones become invalid, maybe byte index of the last successfully-parsed stanza or something?
-
mathieui
FWIW slixmpp/sleekxmpp has raw stanzas in the unit test suites, but that’s in part because it allows us to check that our generated objects are valid, and also, mostly, because it allows copy-pasting from XEP examples :p
-
mathieui
(so, not too much value as a parser test)
-
moparisthebest
you certainly want both types of tests
-
flow
moparisthebest, for a start, I'd probably go with a simple test comment stating where the test is expected to "fail"
-
flow
but yes, if your parser provides you with the exact coordinates where something went wrong, it cannot hurt to compare those with the expected values
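A sketch of a plain-text entry that carries the expected failure point. Since XML comments are themselves forbidden in XMPP streams, this hypothetical format keeps the expectation in `key: value` header lines above a `---` separator instead of inside the stream, and `fail-at` is a byte offset into the stream. `CorpusFile`, `parse_corpus_file`, and `agrees_with` are illustrative names, not an existing format.

```rust
/// Parsed form of one hypothetical plain-text corpus file.
#[derive(Debug, PartialEq)]
pub struct CorpusFile<'a> {
    pub valid: bool,
    pub fail_at: Option<usize>, // byte offset into `stream`, invalid entries only
    pub stream: &'a str,
}

/// Parses "key: value" header lines, a "---" line, then the raw stream.
pub fn parse_corpus_file(text: &str) -> Result<CorpusFile<'_>, String> {
    let (header, stream) = text
        .split_once("\n---\n")
        .ok_or("missing --- separator line")?;
    let mut valid = None;
    let mut fail_at = None;
    for line in header.lines() {
        let (key, value) = line
            .split_once(':')
            .ok_or_else(|| format!("bad header line: {line}"))?;
        match (key.trim(), value.trim()) {
            ("expect", "valid") => valid = Some(true),
            ("expect", "invalid") => valid = Some(false),
            ("fail-at", n) => fail_at = Some(n.parse::<usize>().map_err(|e| e.to_string())?),
            (k, v) => return Err(format!("unsupported header {k}: {v}")),
        }
    }
    let valid = valid.ok_or("missing expect: header")?;
    Ok(CorpusFile { valid, fail_at, stream })
}

/// Compares what an implementation reported (Ok, or Err(byte offset)) with the
/// file. Entries without a fail-at header only require that the stream is rejected.
pub fn agrees_with(expected: &CorpusFile, reported: Result<(), usize>) -> bool {
    match (expected.valid, reported) {
        (true, Ok(())) => true,
        (false, Err(at)) => expected.fail_at.map_or(true, |want| want == at),
        _ => false,
    }
}
```

Making `fail-at` optional matches the "start simple" suggestion: an entry can be added with only `expect: invalid` and tightened later once implementations report exact positions.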