Is there any testing harness (big word) for JID validation?
MattJ
Nothing universal. Prosody's tests are at https://hg.prosody.im/trunk/file/tip/spec/util_jid_spec.lua and it's been quite a while since anyone found bugs in it...
edhelas
Wondering how we can have test suites across languages
MattJ
Defined input and output formats, not unheard of
edhelas
I'm currently looking to get proper RFC 6122 support in PHP
Zash
pep., MattJ: nothing universal and every time it is brought up someone NIHs a new format
MattJ
Might as well just publish some CSV or whatever with inputs and expected results, something boringly simple and easy to parse - then people can figure out how to get that into whatever framework they use
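A minimal sketch of what such a file could look like and how a test runner might load it. The column names (`input`, `valid`, `local`, `domain`, `resource`) and the sample rows are made up for illustration; nothing here is an agreed format.

```python
import csv
import io

# Hypothetical test vectors; the column layout is an assumption, not an agreed format.
SAMPLE = '''input,valid,local,domain,resource
juliet@example.com/balcony,yes,juliet,example.com,balcony
example.com,yes,,example.com,
"a,b@example.com",yes,"a,b",example.com,
@example.com,no,,,
'''

def load_vectors(text):
    """Parse CSV text into a list of dicts, one per test vector."""
    return list(csv.DictReader(io.StringIO(text)))

vectors = load_vectors(SAMPLE)
```

CSV quoting takes care of the comma in the third row, so a comma inside a JID part does not break the file.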
Zash
MattJ: I might even have started but got bored when yet another custom format was proposed instead
pep.
"," isn't valid in any JID part?
pep.
I'd have two files, so that isn't an issue
Kev
I thought it was valid in resources, but haven't checked.
Link Mauvehas left
Link,Mauve
It is valid in resources.
Link,Mauvehas left
nicolahas left
pep.
You demonstrated at best it's valid in Prosody MUCs :P
edhelas
Movim accepts it as well, as there is no proper validation at all :p
nicoco
What would be super great for me is a "random UTF-8 junk to valid resource part" converter. Some legacy services allow control chars, it seems. Or maybe it's slixmpp that is not permissive enough? Is "×͜× " a valid resource?
Zash
nicoco, resourceprep says "maybe"
Link Mauve
nicoco, it isn’t.
Link Mauve
(I just tried, and poezio showed no error… Will fix.)
Link Mauve
nicoco, you might get some issue with multiple “random UTF-8 junk” mapping to the same resourcepart.
nicoco
my "nickname cleaner" right now is `"".join(x for x in nickname if x in string.printable) + " [renamed by slidge]"`, but for this nickname for instance, it turns it to " [renamed by slidge]", which is not very satisfying
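One possible alternative, as a rough sketch only: instead of keeping just `string.printable` (ASCII), drop only characters in the Unicode "Other" categories (control, format, surrogate, private use, unassigned) and fall back to a short hash when nothing is left. The function name `clean_nickname` and the `user-` prefix are hypothetical. This is not resourceprep, it does not guarantee a valid resourcepart, and two different nicknames can still end up identical.

```python
import hashlib
import unicodedata

def clean_nickname(nickname: str) -> str:
    """Drop characters whose Unicode category starts with 'C'; not a real resourceprep."""
    kept = "".join(ch for ch in nickname if unicodedata.category(ch)[0] != "C")
    kept = kept.strip()
    if kept:
        return kept
    # Nothing usable left: derive a stable placeholder from the original bytes.
    digest = hashlib.sha256(nickname.encode("utf-8", "surrogatepass")).hexdigest()[:8]
    return f"user-{digest}"
```

With this filter the nickname quoted above keeps its three visible characters rather than collapsing to an empty string.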
Link Mauve
For instance, Link Mauve and Link Mauve do map to the same resource after resourceprep.
Beherit
(In the xsf muc I suggested to discuss whether we should consider Google Season of Docs for writing XEPs.
https://developers.google.com/season-of-docs/docs/get-started?hl=en
xmpp:xsf@muc.xmpp.org?join
)
nicoco
and also, yes Link Mauve, you're right, I'm risking collisions
Link Mauve
If the legacy protocol you map to considers those two different users, here be dragons.
nicoco
well, they are 2 different users, but in MUCs, even non anonymous, the nickname is supposedly also a unique identifier
thomaslewishas joined
nicoco
actually I was dropping by to ask whether it made any sense to use "XEP-0421: Anonymous unique occupant identifiers for MUCs" in *non*anonymous mucs.
Zash
nicoco, where everyone sees real JIDs? not much value in it, but I suppose it doesn't hurt for the server to do it anyway
singpolyma
> "," isn't valid in any JID part?
Why wouldn't it be? Almost everything is allowed in localpart
nicoco
in fact, AFAIU, none of the legacy services I map right now even have the concept of anonymous groups. but I suspect some clients are not going to allow retractions without it anyway. I'd rather avoid adding it if it doesn't make sense. more code = more trouble. ^^
Zash
reference to the C in CSV? it has quoting, so not a problem
MattJ
I almost wrote TSV (which I prefer), but it's less of a standard
PapaTutuWawahas joined
MattJ
CSV is a standard because there are so many variants to choose from
sonnyhas joined
nicoco
is that the definition of a standard? something that you have many variants to choose from?
Zash
I think I'd just cry a tear and suggest JSON
MattJ
I almost suggested JSON, but that's not ideal if you want to test various unicode things
Zash
Is that how ... was it flow? ended up with some newline-based custom format?
singpolyma
JSON is required to be utf8, no? So Unicode should be no issue
Zash
singpolyma, no, it's UTF-16
MattJ
No, and the Prosody tests include invalid UTF-8
singpolyma
Zash: I think you're talking about the escape sequences?
MattJ
So anything required to be valid unicode is not suitable, unless you add additional encoding
singpolyma
UTF16 encoded json is not a thing
Zash
Sure, escapes that use surrogate pairs and stuff. Depending on your JSON library.
Zash
Lua is obviously the best format :)
Zash
Binary safe, descendant of an actual data description format :)
singpolyma
MattJ: ah, so the tests assume utf8 decoder but contain raw binary?
singpolyma
What we did for the dhall tests was folders with input in one file and output in another
singpolyma
No format, just use the filesystem
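A sketch of how a runner could walk such a layout, assuming one directory per test case containing files named `input` and `output` (those names and the helper `load_cases` are assumptions for illustration, not taken from the dhall tests). Reading raw bytes means newlines and invalid UTF-8 need no escaping.

```python
from pathlib import Path

def load_cases(root):
    """Yield (name, input_bytes, expected_bytes) for each case directory under root."""
    for case_dir in sorted(Path(root).iterdir()):
        if not case_dir.is_dir():
            continue
        # File names "input" and "output" are assumptions for this sketch.
        raw = (case_dir / "input").read_bytes()
        expected = (case_dir / "output").read_bytes()
        yield case_dir.name, raw, expected
```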
thomaslewishas left
MattJ
Prosody's tests don't assume anything, as Lua isn't unicode-aware so you can put most things literally in strings. That's not always sensible due to editors and stuff, though.
MattJ
So for some things we apply hex or base64
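To illustrate the idea of encoding awkward inputs, here is a small sketch that decodes test fields carrying a `hex:` or `b64:` prefix. The prefixes and the helper names are invented for this example and are not how Prosody's tests mark encoded values.

```python
import base64

def decode_field(field: str) -> bytes:
    """Decode a test field with a hypothetical 'hex:' or 'b64:' prefix; otherwise UTF-8."""
    if field.startswith("hex:"):
        return bytes.fromhex(field[4:])
    if field.startswith("b64:"):
        return base64.b64decode(field[4:])
    return field.encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

This keeps the corpus file itself plain text while still letting it carry byte sequences that are not valid UTF-8.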
Zash
Soooooooooooooo we're doing another round of format bikeshedding?
I'm just here because someone said commas might not be allowed in jid and I freaked out ;)
pep.
"Zash> Is that how ... was it flow? ended up with some newline-based custom format?" yeah fwiw I would do one line per entry, on two separate files. Not much parsing required, very little chance for confusion..
pep.
singpolyma, not what I said, I asked indeed because of CSV
Zash
pep., but how do you test jids with newlines???
pep.
Is that a thing?
Zash
no, but how do you verify that your library rejects it? :)
moparisthebest, I proposed that. And that means you can't test newlines in your jids
moparisthebest
JIDs shouldn't have newlines ;)
moparisthebest
But fine, separate lines with \0
singpolyma
Or just use one file per input one per output and you don't need a format at all. So many options!
MattJ
It's not just "valid" or "invalid" though - Prosody's tests extensively test correct splitting, which many clients/libraries have got wrong in the past
pep.
splitting? the 3 parts?
singpolyma
Rather than standard tests I'd rather see well tested libraries, ideally.
singpolyma
Most libraries right now accept almost any random crap in their jid "parser"
Zash
Isn't the goal here to have common test data and test all the libraries at the same time?
MattJ
Exactly. That's part of the reason libraries aren't sufficiently tested... because without shared test cases, every project just writes their own and inevitably misses some
MattJ
If they write any at all
Zash
Something like what exists for JSON and Markdown libraries ... but of course I couldn't find those sites now
flow
MattJ, fwiw, the valid jids are tested for proper splitting
MattJ
in Smack?
flow
no in jxmpp (which is used by Smack)
dele.olajidehas left
MattJ
Right, sure
MattJ
I'm not commenting on any individual implementation
pep.
I guess flow meant there is no need to test splitting separately?
MattJ
Just saying I agree that a common set of test cases would be beneficial
flow
ack
MattJ
I brought up splitting because having a list of "invalid JIDs" and a list of "valid JIDs" is not sufficient for testing a JID parser
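To make the splitting point concrete, here is a minimal sketch of a splitter (the function name `split_jid` is made up). It only separates the parts and validates nothing: an empty local part, for example, would still have to be rejected elsewhere. The order matters: cut the resource at the first `/` first, then look for `@` only in what remains, since a resource may itself contain `@` or `/`.

```python
def split_jid(jid: str):
    """Split a JID string into (local, domain, resource) without validating the parts.

    The resource is cut at the first '/', and only then is the local part cut at
    the first '@' of what remains, because a resource may itself contain '@' or '/'.
    """
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No '@' in the bare part: everything is the domain.
        local, domain = None, bare
    return local, domain, (resource if slash else None)
```

A case such as `example.com/foo@bar/baz` is where a naive "split on `@` first" parser goes wrong: here the whole of `foo@bar/baz` is the resource and there is no local part.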
pep.
Anyway I'm happy with whatever people have. Maybe label tests so that they can be run separately?
MattJ
and that's one of the solutions that was proposed
flow
ahh ok, the list of valid JIDs in the jxmpp corpus also contains the expected split parts
MattJ
Exactly. In what format? :)
Dele Olajidehas joined
flow
the grammar is defined in https://github.com/igniterealtime/jxmpp/blob/master/jxmpp-strings-testframework/src/main/resources/xmpp-strings/jids/valid/main#L13-L20
singpolymahas left
flow
basically using control chars to separate the parts
singpolymahas joined
pep.
I guess that's easily convertible to another format anyway?
flow
which yields the nice property that it's still a simple text file that can hold the corpus, while you do not need to escape anything
flow
sure, transformations are possible, but I wonder if there is a better format. but I am happy to hear the ideas
flow
the format jxmpp's jid corpus uses is trivially parsable
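As a generic illustration of the "control chars as separators" idea, here is a sketch that is explicitly not jxmpp's grammar (the linked file defines that). It assumes the ASCII unit separator U+001F between fields and the record separator U+001E between entries, with four fields per entry: the full JID, then local, domain and resource. The name `parse_corpus` is made up.

```python
UNIT_SEP = "\x1f"    # separates fields within one entry (assumed choice)
RECORD_SEP = "\x1e"  # separates entries (assumed choice)

def parse_corpus(text: str):
    """Return a list of (jid, local, domain, resource) tuples from separator-delimited text."""
    entries = []
    for record in text.split(RECORD_SEP):
        if not record:
            continue  # tolerate a trailing separator
        fields = record.split(UNIT_SEP)
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}")
        entries.append(tuple(fields))
    return entries
```

Because neither separator is plausible inside a JID under test, commas, quotes and newlines can appear in the fields without any escaping.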
pep.
Is that the corpus?
pep.
Or just an example
flow
the two files are the currently existing corpus
pep.
I see
flow
I have some invalid JIDs scraped from openfire (courtesy of Guus) that I need to add
flow
but since every JID is checked with 4 different "stringprep" implementations, it is a bit of work to add them to the corpus, because you first have to play protocol lawyer and decide if it's a valid JID or not, and then mask the non-conforming implementations
pep.
That doesn't seem too hard to use in xmpp-rs/jid
flow
it shouldn't be, I am sure there is a decent PEG parser for rust
pep.
yeah yeah
pep.
it's twice faster than nom.
pep.
(Sorry, private joke on #rust-fr)
flow
pff, inside jokes :)
pep.
It's because pest, a PEG parser in Rust, had a graph on their web page a while back showing how much better it was than other Rust parsers, and it was totally bonkers. #Benchmarks
pep.
And the nom dev is a regular in #rust-fr so that was the joke
flow
I would be happy if the jid corpus had a size where parsing speed would be a consideration :)
pep.
flow, is this correct?
Corpus → Entry*
Entry → Jid* | CommentLine*
pep.
Shouldn't Entry be Jid | CommentLine ? (without the *)