XSF Discussion - 2021-04-16

  1. Sam

    Who else is a treasurer or treasurer adjacent that should have access to the Open Collective? I assume board people? All of them or just some? Anyone else?

  2. Kev

    Just Peter, probably.

  3. Sam

    I vaguely feel like there should be more than one person with access to reduce bus factor, especially when it comes to things that handle money, but whatever the board wants I suppose.

  4. Sam

    We should also consider who is allowed to use the XSF as a fiscal host and how we decide. My instinct is "anything XMPP related" and "at the board's discretion", but it would be good to get that confirmed by the board and have the treasurer or someone else bring any new applicants before the board each week. (I am happy to do this if Peter doesn't, since it's just forwarding names along; I just want to make sure the board is okay with all this, since it involves money and I don't want to just make a bunch of stuff up and hope it's fine.)

  5. jonas’

    something about a CoC

  6. Sam

    This is a little bit different, but also making people agree to follow the CoC once we have one if they want to use us as a fiscal host seems reasonable. I'll draft some text and send it to the board email for discussion. I think it will be relatively non-controversial and we can always change it at any time.

  7. Zash

    How do we determine which pieces of software go on the software listings? There's probably some overlap with that selection method.

  8. Sam

    In case anyone wants to brainstorm: https://pad.disroot.org/p/XSF_Fiscal_Host_Rules

  9. Sam

    huh, TIL: "Jabber Open Source License" https://opensource.org/licenses/jabberpl

  10. Sam

    I'm assuming that was an early jabberd thing. Glad that got retired.

  11. moparisthebest

    dwd, flow, if you have a spare moment, could you see if you are less horrified by https://github.com/moparisthebest/xmpp-proxy/blob/master/src/stanzafilter.rs#L224 (and thanks for the state machine hint, flow!). Again, the point is NOT to have a full-on XML parser, but simply to split reliably on stanza boundaries so complete stanzas can be passed to a real XML parser later.
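The kind of splitter described here can be sketched as a small byte-level state machine. The Rust below is a minimal illustration, not the code in the linked stanzafilter.rs: `stanza_end`, `SplitError` and the state names are hypothetical. It assumes the stream header has already been consumed so `buf` begins at a stanza, and it simply rejects comments, processing instructions and DOCTYPEs, which XMPP forbids anyway (RFC 6120, section 11.1). The quoted-attribute state is what keeps `<x a='/>'>` from being cut early, and the CDATA state handles `<x><![CDATA[ lol</x> ]]></x>`.

```rust
/// Why the (hypothetical) splitter refused the input.
#[derive(Debug, PartialEq)]
pub enum SplitError {
    /// `<!--`, `<!DOCTYPE` or `<?` at this offset: forbidden in XMPP.
    Forbidden(usize),
    /// An end tag with no open element, e.g. `</stream:stream>`.
    UnexpectedEndTag(usize),
}

#[derive(Clone, Copy)]
enum State {
    Text,       // character data, or whitespace between stanzas
    TagOpen,    // just consumed '<'
    InTag,      // inside a start/end tag, outside any quoted value
    InAttr(u8), // inside a quoted attribute value; holds the quote byte
    InCdata,    // inside <![CDATA[ ... ]]>
}

const CDATA_OPEN: &[u8] = b"![CDATA[";
const CDATA_CLOSE: &[u8] = b"]]>";

/// Returns `Ok(Some(end))` if `buf[..end]` is one complete top-level element,
/// or `Ok(None)` if more bytes are needed. Only indices and a depth counter
/// are used; nothing is allocated.
pub fn stanza_end(buf: &[u8]) -> Result<Option<usize>, SplitError> {
    let mut state = State::Text;
    let mut depth = 0usize;
    let mut closing = false; // is the current tag an end tag?
    let mut i = 0;
    while i < buf.len() {
        let b = buf[i];
        match state {
            State::Text => {
                if b == b'<' {
                    state = State::TagOpen;
                }
            }
            State::TagOpen => match b {
                b'/' => {
                    closing = true;
                    state = State::InTag;
                }
                b'!' => {
                    let rest = &buf[i..];
                    if rest.starts_with(CDATA_OPEN) {
                        state = State::InCdata;
                        i += CDATA_OPEN.len();
                        continue;
                    }
                    if CDATA_OPEN.starts_with(rest) {
                        return Ok(None); // e.g. "<![CD" so far: wait for more bytes
                    }
                    return Err(SplitError::Forbidden(i - 1));
                }
                b'?' => return Err(SplitError::Forbidden(i - 1)),
                _ => {
                    closing = false;
                    state = State::InTag;
                }
            },
            State::InTag => match b {
                b'"' | b'\'' => state = State::InAttr(b),
                b'>' => {
                    if closing {
                        if depth == 0 {
                            return Err(SplitError::UnexpectedEndTag(i));
                        }
                        depth -= 1;
                    } else if buf[i - 1] != b'/' {
                        depth += 1; // start tag; a self-closing `/>` leaves depth alone
                    }
                    state = State::Text;
                    if depth == 0 {
                        return Ok(Some(i + 1));
                    }
                }
                _ => {}
            },
            State::InAttr(q) => {
                // `>`, `/` and `![CDATA[` inside quotes are ignored until the quote closes.
                if b == q {
                    state = State::InTag;
                }
            }
            State::InCdata => {
                if buf[i..].starts_with(CDATA_CLOSE) {
                    state = State::Text;
                    i += CDATA_CLOSE.len();
                    continue;
                }
            }
        }
        i += 1;
    }
    Ok(None)
}
```

Because this function is stateless, it rescans from the start of the buffer on every call, which becomes quadratic if data trickles in a few bytes at a time; keeping `state`, `depth` and the scan offset between reads would avoid that.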

  12. moparisthebest

    on a related note, is anyone aware of some comprehensive XMPP XML stream tests anywhere?

  13. Sam

    moparisthebest: what sort of tests are you looking for? More stuff like the ones you linked for splitting XML, or something that validates XML streams against a big jabber:client schema or something?

  14. moparisthebest

    I only need to test splitting XML stanzas out of a stream, so strange formatting, CDATA, processing instructions, comments, really anything that might trip such a thing up

  15. moparisthebest

    in the end, probably need to investigate creating some type of XMPP XML stream fuzzer, but in the short term I was hoping to steal some test cases from existing projects

  16. Zash

    `<x><![CDATA[ lol</x> ]]></x>`

  17. dwd

    moparisthebest, I'm wondering if you maybe *do* need an XML parser, but a decent fast one. I used rapidxml (or at least a fork of it) in Metre, which worked really well, and stood up to AFL very well.

  18. moparisthebest

    Zash, it handles that one fine, thanks

  19. moparisthebest

    added it to the test

  20. moparisthebest

    I just want to split on stanza boundaries, I do not want to allocate memory to parse anything

  21. dwd

    moparisthebest, Sure, but rapidxml doesn't allocate anything either.

  22. dwd

    moparisthebest, And you're getting achingly close to an XML parser there anyway.

  23. moparisthebest

    http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1memory_allocation ?

  24. dwd

    moparisthebest, And yet, <x a='/>'>This is going to be fun.</x>

  25. dwd

    moparisthebest, Yeah, there's a pool for attributes, but since it's a pool it's a single allocation. If you ported it you could *probably* ditch that for the kind of "chopping out elements" work you're trying to do.

  26. Sam

    I really need something like this in Go too. I try to keep Mellium relatively fast, but the XML parser is *terrible* and there's not much point to me optimizing things when we're using a parser as slow as the one we're using

  27. dwd

    moparisthebest, <a a='![CDATA['/> might be fun too.

  28. Sam

    moparisthebest: you might consider fuzzing this. XML is flexible enough that I don't think you'll come close trying to think up edge cases yourself.

  29. Zash

    Probably easier to find a generic fuzzer and let it figure out XML syntax anyway

  30. dwd

    moparisthebest, What Sam says, besides you'll just be writing the same assumptions into your tests you've been coding for, like all of us do.

  31. dwd

    Zash, AFL can do this, indeed.

  32. dwd

    Zash, Dunno if it'll work with Rust, but ... maybe?

  33. Zash

    AFL-RIIR is probably a thing already

  34. Sam

    Is it worth tying fiscal sponsorship to membership (or saying that at least one person in your project must begin seeking membership)? I don't know if it matters, just seems like something organizations do. That way you've already accepted whatever CoC and other rules we come up with.

  35. Sam

    Maybe not, that seems super limiting.

  37. Zash

    Seems sensible. (needing membership.)

  38. Sam

    But also, why?

  39. Zash

    Dunno. Why not? Dunno to that too.

  40. dwd

    I don't actually know if that's sensible or not.

  41. Sam

    It would be nice to have a representative from every project, but also if this is a service to the community then maybe we want to make it as easy and open as possible.

  42. Sam

    Not something we have to decide immediately, I'm just thinking about what a policy write up would look like.

  43. moparisthebest

    `<a a='![CDATA['/>` works fine, but indeed I hadn't planned for `<x a='/>'>This is going to be fun.</x>`, which will require an "InAttribute" state, thanks dwd

  44. jonas’

    afl works with anything (but is less efficient and less effective) if you run it in qemu mode :)

  45. dwd

    moparisthebest, Right - it *feels* like you're basically writing an XML lexer, if not a parser. Though I'll be honest and say this is one of those cases where my lack of a CS degree means I don't really know the difference.

  46. moparisthebest

    Metre can't proxy c2s right?

  47. dwd

    moparisthebest, Nope.

  48. dwd

    moparisthebest, And it only "truly" proxies S2S with the server's consent, as it were.

  49. jonas’

    moparisthebest, right, what dwd says -- check out parser generators and let one of them build a lexer for you based on the official XML grammar

  50. jonas’

    that won’t allocate a lot if anything at all, depending on the implementation

  51. Kev

    I’d probably *not* be inclined to encourage membership for the sake of the sponsorship stuff, on the basis that the XSF doesn’t benefit from having lots of members, only from having members who are sufficiently motivated/able to serve on the few teams that need membership, and otherwise to be on top of things enough to make judgements on Council/Board positions based on people’s interactions with the community. Encouraging people to become members purely to get access to money doesn’t really help with that.

  52. Zash

    Kev, good point.

  54. Sam

    *nods* good point

  55. moparisthebest

    I guess fuzzing won't really do what I need, I need a stream of XMPP XML and to verify I split it at the correct boundaries

  56. moparisthebest

    does no one know of projects that have tests consisting of anything like that for their parsers?

  57. Kev

    Fuzzing is what you need in terms of testing you don’t fall apart in the face of bad input, but not in terms of ensuring boundaries are correct, indeed.

  58. moparisthebest

    yea fuzzing is certainly valuable, just also need other things

  59. Sam

    Maybe more of a mix of fuzzing and integration testing then. Generate random XML input, pipe it through your splitter and a real parser, when you detect a difference generate a unit test from that.
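A minimal sketch of that loop in Rust, with one substitution: instead of a real XML parser as the oracle, the generator records where each stanza ends, so the expected split points are known without any external crate. `Rng`, `random_stream` and `first_mismatch` are hypothetical names, the xorshift generator is only there for reproducibility, and the tricky fragments are a hand-picked assumption, so this only covers constructs someone thought to include.

```rust
/// xorshift64: a tiny, reproducible PRNG (not cryptographic, no crates needed).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

// Hand-picked fragments likely to trip a naive splitter (an assumption, not a spec).
const TEXT: &[&str] = &["hi", "&lt;/x&gt;", "<![CDATA[ </x> ]]>", "<![CDATA[]]>", " "];
const ATTR: &[&str] = &["/>", ">", "![CDATA[", "say \"hi\"", ""];

fn pick(rng: &mut Rng, items: &[&'static str]) -> &'static str {
    items[rng.below(items.len() as u64) as usize]
}

/// Appends one random, well-formed element (with random children) to `out`.
fn element(rng: &mut Rng, depth: u32, out: &mut String) {
    let name = pick(rng, &["a", "body", "item"]);
    out.push('<');
    out.push_str(name);
    for k in 0..rng.below(3) {
        // Single-quoted values may contain `"`, `>` and `/`, but never `'` or `<`.
        out.push_str(&format!(" k{}='{}'", k, pick(rng, ATTR)));
    }
    if depth >= 3 || rng.below(4) == 0 {
        out.push_str("/>");
        return;
    }
    out.push('>');
    for _ in 0..rng.below(4) {
        if rng.below(2) == 0 {
            out.push_str(pick(rng, TEXT));
        } else {
            element(rng, depth + 1, out);
        }
    }
    out.push_str("</");
    out.push_str(name);
    out.push('>');
}

/// Builds a run of stanzas plus the byte offset where each one ends,
/// which is exactly what a correct splitter should report.
pub fn random_stream(seed: u64, stanzas: usize) -> (String, Vec<usize>) {
    let mut rng = Rng(seed | 1); // xorshift must never be seeded with 0
    let mut out = String::new();
    let mut ends = Vec::new();
    for _ in 0..stanzas {
        if rng.below(2) == 0 {
            out.push_str("\n  "); // whitespace between stanzas
        }
        element(&mut rng, 0, &mut out);
        ends.push(out.len());
    }
    (out, ends)
}

/// Tries seeds 1..=seeds and returns the first stream the splitter gets wrong.
pub fn first_mismatch(seeds: u64, split: impl Fn(&str) -> Vec<usize>) -> Option<String> {
    (1..=seeds).find_map(|seed| {
        let (stream, expected) = random_stream(seed, 5);
        if split(&stream) != expected { Some(stream) } else { None }
    })
}
```

Any stream returned by `first_mismatch` can be pasted straight into a unit test; also feeding the same stream to a real XML parser would turn this into the differential test suggested here.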

  60. moparisthebest

    generating random-but-valid XMPP-subset-of-XML sounds hard

  61. Sam

    Not really. Elements, random cdata, random attributes.

  62. moparisthebest

    yea, but then we are back to testing only the things I know about

  63. Kev

    Yeah. It’s easy as long as you don’t want anything that you didn’t already think of and could have generated manually :D

  64. moparisthebest

    essentially yea :)

  65. Kev

    I, once upon a time, wrote an XML-aware (and fairly naive) fuzzing layer for Swiften that would modify stanzas on the way out randomly so we could run ‘good’ Sluift scripts against M-Link and have them modified in malicious ways.

  66. Kev

    That was in the days before AFL. These days you’d run the same scripts to generate a corpus to feed into a branch-aware fuzzer instead, presumably.

  67. Sam

    No, because you have random attributes and the like

  68. moparisthebest

    "random" but also following the rules I know about, like cannot contain " or ', but those are the rules I know, and have implemented already

  69. moparisthebest

    basically, for each chunk my splitter spits out, when fed into a real XML parser, it should either: 1. parse a complete stanza, or 2. error out because of invalid XML (mismatched tags etc.). The thing it should never do is: 3. wait for the rest of a partial stanza.
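That three-way rule is easy to express as a check over the splitter's output. In this Rust sketch, `Outcome` and `first_partial` are hypothetical, and the `parse` closure stands in for whatever streaming XML parser is used as the reference; it assumes that parser can report "needs more input" separately from "invalid".

```rust
/// What a streaming XML parser reports after being fed one chunk from the splitter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    /// 1. A complete stanza was parsed.
    Complete,
    /// 2. The chunk is not well-formed (mismatched tags, etc.); rejecting it is fine.
    Invalid,
    /// 3. The parser is still waiting for the rest of a partial stanza.
    NeedMore,
}

/// Returns the index of the first chunk that leaves the parser waiting for more
/// input, i.e. the first place the splitter cut a stanza in the wrong spot.
/// `parse` is assumed to use a fresh parser for each chunk.
pub fn first_partial<'a>(
    chunks: impl IntoIterator<Item = &'a str>,
    mut parse: impl FnMut(&str) -> Outcome,
) -> Option<usize> {
    chunks.into_iter().position(|chunk| parse(chunk) == Outcome::NeedMore)
}
```

Treating `Invalid` as acceptable matches the rule above: the splitter only has to avoid handing the parser half a stanza, not to validate it.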

  70. flow

    moparisthebest, no, jxmpp has a corpus of valid and invalid JIDs, but no corpus of valid and invalid XMPP streams. Wanna team up? :)

  71. Kev

    FWIW, I would be inclined to use an XML library for this, unless and until you can see that the performance through that is inadequate.

  72. moparisthebest

    even my super naive and known-wrong initial splitter worked perfectly fine with normal-case XMPP; I ran it for days filtering 100% of the XML into my server without any errors. It's the other cases that need work

  73. moparisthebest

    I'm after zero-memory-allocations rather than performance

  74. Kev

    TBH, if you’re worried about ‘working’ rather than ‘correct’, a few days of data on an active and well-peered server usually catches most edge cases, in my experience.

  75. Sam

    I guess I don't get what you're trying to test for then. Running random inputs against a real XML parser and your thing seems like it would identify unknown areas where splitting is broken.

  76. moparisthebest

    yes, I think it'd be valuable, just not as valuable as test cases built from the horrors people have seen in the wild, but if those don't exist...

  77. Kev

    Run AFL against libxml2, generate a corpus, feed that in?

  78. Sam

    Yah, I don't know that you'll do well finding specific things from people to test. This is too general for that.

  79. moparisthebest

    now that's an interesting thought Kev ...

  80. Sam

    Isn't that what I said except recommending specific tools? I am not understanding something about what's being tested here I guess.

  81. Kev

    It may be what you said, but not what I read :)

  82. Kev

    (Which is probably on me)

  83. moparisthebest

    Sam, mainly, if I write a tool to generate all possible XML as I understand it, I might miss something valid that I don't know about

  84. moparisthebest

    vs. fuzzing, which in theory should eventually hit all cases?

  85. Sam

    Fuzzing is literally what I said, but yah, I didn't mean "write your own thing". Anyways, what Kev said is what I was suggesting. Do that, it will be better than asking for samples which will never catch the one weird edge case.

  86. Sam

    My apologies if I wasn't clear.

  87. moparisthebest

    no my bad I appreciate it

  88. moparisthebest

    flow, sure, but maybe this is already a good path forward: convince a fuzzer to generate individually good stanzas, then combine them in random orders for good streams? :/

  89. flow

    moparisthebest, not saying that it's not, just that a curated corpus would be also nice

  90. moparisthebest

    flow, I agree, got any thoughts on gathering that together? :)

  91. Sam

    What is this corpus for specifically?

  92. moparisthebest

    testing XMPP XML stream parsing?

  93. Sam

    Just where to split XML tokens?

  94. Sam

    I'm just wondering how many people actually do their own XML parsing.

  95. flow

    I was thinking of a corpus of valid and invalid XMPP streams

  96. moparisthebest

    my thing is only concerned with where to split stanzas out of an XML stream, but such a corpus would be more generally useful

  97. Sam

    I just don't understand what that tests unless you wrote your own parser

  98. flow

    entries in the valid corpus would contain the stream and the individual elements that the splitter should identify

  99. moparisthebest

    and how many wrongly use generic XML parsers and allow comments, processing instructions, etc., Sam?

  100. flow

    and entries in the invalid corpus should be just rejected

  101. moparisthebest

    XMPP only allows a subset of XML

  102. Sam

    If it's just that you use a parser then you don't really need a corpus except those few things that are forbidden by XMPP, I've got tests for all those things if you want them

  103. moparisthebest

    I'm sure many projects actually do this, just like many threw XHTML-IM into a DOM

  104. moparisthebest

    if you caught them all, but yes that would be a good starting point

  105. moparisthebest

    I think assuming the XML parser you chose actually works well is a mistake

  106. moparisthebest

    well, I know it's a mistake...

  107. flow

    to be fair, most XML parsers I worked with could be easily modified to reject most things XMPP disallows

  108. Sam

    I disagree. I mean, you should certainly use a proper XML parser but if you're going to write tests for it you should be upstreaming those, not re-testing what's already been tested

  109. Sam

    (or what's likely to have already been tested; obviously if you pick an XML parser that's untested that's a problem, I'm just saying that I don't see why you'd retest it in the XMPP library instead of just writing tests for the parser itself)

  110. moparisthebest

    most XMPP things I see test individual stanzas, and not an XMPP-XML-Stream, and that's a mistake

  111. Sam

    Why? I mean, I get the need for a test ensuring the parser got limited correctly, but then you can test at the parser level that it correctly rejects comments and the like

  114. flow

    Sam, re your tests, link pls :)

  115. Sam

    I'll have to go dig them up, I think there's one or two in internal/stream, or I may not have ever published them. They do not test the stream in the way moparisthebest wants though, I have separate tests that make sure the parser actually gets wrapped in the "XMPP valid stuff only" wrapper

  116. Sam

    But I have a meeting starting in a few minutes, I'll see if I can't find them afterwards.

  117. flow

    no worries. I think that also nicely demonstrates the value of an XMPP stream corpus: being able to point people to a repo where they will find plain text files and telling them: your implementation should be able to parse the valid-stream files, and reject the invalid-stream files

  119. flow

    whereas what we have right now are probably mostly tests, written in the programming language's native (unit-)test framework, where you have to carefully extract the test vectors if you want to re-use them

  121. moparisthebest

    probably want to include the location where the invalid ones become invalid, maybe byte index of the last successfully-parsed stanza or something?

  122. mathieui

    FWIW slixmpp/sleekxmpp has raw stanzas in the unit test suites, but that’s in part because it allows us to check that our generated objects are valid, and mostly because it allows copy-pasting from XEP examples :p

  123. mathieui

    (so, not too much value as a parser test)

  124. moparisthebest

    you certainly want both types of tests

  125. flow

    moparisthebest, for a start I'd probably go with a simple comment in each test stating where it is expected to "fail"

  126. flow

    but yes, if your parser provides you with the exact coordinates where something went wrong, it cannot hurt to compare those with the expected values
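The corpus flow describes, with the optional error location suggested above, might be modelled like this. Everything here is a hypothetical sketch: `Entry`, `Expect` and `check` are invented names, the splitter under test is assumed to return either the end offset of each stanza or the byte offset of the first error, and a real shared corpus would live in plain-text files rather than Rust literals.

```rust
/// One entry in a hypothetical shared corpus of XMPP streams.
pub struct Entry {
    pub name: &'static str,
    pub expect: Expect,
}

pub enum Expect {
    /// A valid stream, given as the stanzas a splitter should find, in order;
    /// the stream itself is their concatenation.
    Valid(Vec<&'static str>),
    /// A stream that must be rejected. `fail_at` is the byte offset where it
    /// becomes invalid, or `None` if only "rejected somewhere" is recorded.
    Invalid { stream: &'static str, fail_at: Option<usize> },
}

/// Runs one entry against a splitter that returns the end offset of every
/// stanza, or the byte offset of the first error.
pub fn check(
    entry: &Entry,
    split: impl Fn(&str) -> Result<Vec<usize>, usize>,
) -> Result<(), String> {
    match &entry.expect {
        Expect::Valid(stanzas) => {
            let stream = stanzas.concat();
            // Expected boundaries are the running totals of the stanza lengths.
            let mut end = 0;
            let expected: Vec<usize> = stanzas
                .iter()
                .map(|s| {
                    end += s.len();
                    end
                })
                .collect();
            match split(stream.as_str()) {
                Ok(got) if got == expected => Ok(()),
                other => Err(format!("{}: expected Ok({:?}), got {:?}", entry.name, expected, other)),
            }
        }
        Expect::Invalid { stream, fail_at } => match (split(*stream), *fail_at) {
            (Err(_), None) => Ok(()),
            (Err(at), Some(want)) if at == want => Ok(()),
            (other, _) => Err(format!("{}: expected Err at {:?}, got {:?}", entry.name, fail_at, other)),
        },
    }
}
```

Making `fail_at` optional lets entries start with only "this should fail" and gain exact coordinates later, once implementations can report them.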