There seems to be a consensus that predictable resource strings introduce a privacy issue. Do we have any text in any XEP (or on the wiki or the mailing list) explaining the issue?
MattJ: Thanks. This says you "might be able to determine if the client [...] is online" if you know the resource string. There's no explanation how you would determine that, right?
MattJ
Not anything explicit that I know of
Holger
MattJ: I'm aware of a few ways to do that at least with ejabberd users, but each of those is a spec or implementation issue which IMO should be fixed either way (rather than only hidden by making the resource string unpredictable).
Ge0rG
Holger: +1
MattJ
But this is implicitly assumed by various XEPs
Ge0rG
Holger: the only issue I'm aware of is periodic sending of IQs to a full JID, to probe the client availability and network latency
Holger
MattJ: Unpredictable resource strings are assumed?
Ge0rG
It's also a really poor idea to use UUIDs for randomness if we have the full power of unicode (or at least base32 / base64)
Holger
Indeed.
MattJ
I'd wager that yes, numerous XEPs rely on unpredictable resource strings for security/privacy purposes
Ge0rG
YouTube, the world's most used video platform, manages to identify videos by less than a dozen characters. Why do we need to use 36 characters to identify two clients on an account?
Kev
If there's a standard for how to do something equivalent to UUID (global uniqueness without fingerprinting) in fewer bytes, using that instead of UUID seems entirely sensible.
Ge0rG
Are we protecting from enumeration attacks? From birthday attacks? What is the actual amount of randomness required in a resource to prevent those?
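One rough way to put numbers on the question above is the standard birthday approximation. This is only a sketch: it assumes resource strings are drawn uniformly at random, and the function names `collision_probability` and `expected_guesses` are made up for illustration. It says nothing about which of the two attacks actually matters for resource strings.

```python
import math


def collision_probability(bits: int, n: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit identifiers
    contain at least one duplicate (birthday approximation)."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny values.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def expected_guesses(bits: int) -> float:
    """Average number of guesses needed to hit one specific identifier
    when enumerating a space of 2**bits values."""
    return 2.0 ** bits / 2


if __name__ == "__main__":
    for bits in (32, 64, 96, 122):
        print(bits, collision_probability(bits, 10), expected_guesses(bits))
```

With only a handful of clients per account, accidental collisions are negligible even at modest sizes, so under these assumptions the guessing cost is the number that drives the choice of length.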
Holger
Kev: Why do we need a standard for that? I mean there's no interop requirement?
jonasw
w
jonasw
^
Kev
Holger: Well, we'd at least want it to be consistently implemented across clients, else you can fingerprint (which probably isn't the end of the world, but seems like something we'd want to prevent).
Ge0rG
Kev: sticking to default implementations is what causes user-visible URLs like https://xmpp.yaxim.org:5281/upload/54f59abf-de9b-4fb9-a1e4-f0b5b78a0a9d/a5f532df-0fef-47c4-81a9-2aff690419fc.jpg
intosi
Holger: libraries have the benefit of people not writing their own random identifier generator.
intosi
* standard methods present in libraries, I mean.
jonasw
getentropy(nbits=32) | base64 | strip('=') should be possible with every standard library.
jonasw
we are requiring PRECIS which has *much less* support than that
intosi
Which base64?
intosi
But yeah, another well-defined method could work equally well.
When unicode comes into play, I'm less inclined to believe that every client gets it right, though. Unicode is hard.
jonasw
base64 is luckily only ascii
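The one-liner jonasw sketches above (random bits, base64-encoded, padding stripped) could look roughly like this in Python. The function name `random_resource`, the default of 12 bytes, and the choice of the URL-safe base64 alphabet are assumptions made for this example; the discussion leaves "which base64" open.

```python
import base64
import secrets


def random_resource(nbytes: int = 12) -> str:
    """Return nbytes of OS-provided randomness as unpadded URL-safe base64."""
    raw = secrets.token_bytes(nbytes)  # cryptographically strong random bytes
    encoded = base64.urlsafe_b64encode(raw)  # alphabet: A-Z a-z 0-9 - _
    return encoded.rstrip(b"=").decode("ascii")  # drop '=' padding, if any


if __name__ == "__main__":
    print(random_resource())   # 16 ASCII characters carrying 96 random bits
    print(random_resource(4))  # 6 ASCII characters carrying 32 random bits
```

The output is pure ASCII, so it sidesteps the Unicode handling concerns raised above.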
Ge0rG
what we are looking for is a number of random bits packed into as few characters as possible for a given element
Kev
Well, not quite.
jonasw
base-emoji
Zash
Base 85
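To compare the encodings being thrown around here, this small sketch computes how many characters are needed to carry a given number of random bits in a given alphabet. The helper name `chars_needed` is hypothetical; the figure of 122 random bits is what a version 4 UUID carries once its fixed version and variant bits are excluded.

```python
import math


def chars_needed(bits: int, alphabet_size: int) -> int:
    """Minimum number of characters from an alphabet of the given size
    required to represent `bits` bits of randomness."""
    return math.ceil(bits / math.log2(alphabet_size))


if __name__ == "__main__":
    for name, size in (("hex", 16), ("base32", 32), ("base64", 64), ("base85", 85)):
        print(name, chars_needed(122, size))
```

For comparison, the textual form of a UUID spends 36 characters (32 hex digits plus 4 hyphens) on those 122 random bits.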
Kev
We're also looking for it to be typable by server admins grepping for logs, etc.
Kev
And visibly distinguishable.
Kev
Simply picking random bits forced into UTF-8 clearly wouldn't work, for example.
Ge0rG
Kev: uuids are not very visibly distinguishable.
Kev
If you give me two UUIDs, I'm fairly sure I can tell you if they're the same.
jonasw
Kev: agreed (with random bits in utf-8), not only because of distinguishability, but because it also needs to pass resourceprep and/or precis unmodified. that’s nontrivial.
Ge0rG
Kev: what if you have a complex scenario with N clients in a MUC, a set of M reflected messages with rewritten UUID ids? For which values of N and M can you keep N+2*M UUIDs in your short-term memory?
Zash
Disco identity ?
jonasw
what about dictionary-based strings?
jonasw
just sample from /usr/share/dict/british-english-insane (it’s a thing!)
Holger
Zash: There's no disco identity when staring at XEP examples ...
Ge0rG
base64-strings have better visible distinguishability due to more uppercase/lowercase.
Kev
Ge0rG: In what scenario are you imagining someone debugging such a thing and using the resources as the mental identifiers?
Kev
As I said earlier, if we can more tightly pack than UUID, while still maintaining human readability, great.
Kev
(And the other desirable properties of UUIDs)
intosi
Basically still gibberish. I'm not convinced it would make a difference, I would suck at remembering many of either UUID, base64, or picks-from-a-wordlist-with-sufficient-entropy.
intosi
For all of them, I would just use the first four or five chars anyway.
Ge0rG
Kev: which identifiers would you use if all you have at hand are debug logs?
Holger
Kev: I'm still quite baffled you'd list 'readability' as one of the properties of UUIDs :-)
Kev
Nicks in a MUC, in the usual case of them being pretty static.
Holger
(Then again these XMPP people are used to reading XML ...)
Kev
Holger: They're not *nice*, but it is possible to straightforwardly read them.
Holger
Kev: But what makes them any more readable than e.g. Base64?
Ge0rG
Kev: but I want it *nice*.
Kev
Compared with random UTF-8, which is impossible to read because you've no idea which of the many matching characters you're looking at.
Kev
Holger: Nothing.
Holger
Oh. Well yeah there's always something worse.
jonasw
Kev: nobody was seriously talking about using random utf-8
jonasw
(I hope.)
Kev
jonasw: You might not be taking Ge0rG seriously, but I was still trying to address his point that we should be using UTF.
jonasw
Kev: I’m pretty sure Ge0rG was only making a point that using only hex is a waste of bytes and we should use base64 or something.
Ge0rG
https://gist.github.com/windytan/7910910/
Kev
jonasw: Given he explicitly said we should use the power of UTF-8, I don't think so :)
jonasw
"the power of unicode", right…
Kev
You're right, he said unicode, not UTF-8.
Ge0rG
Kev: actually, I should have written "a number of random bits packed into as few bytes as possible" - that should rule out non-ascii
Kev
Why?
Kev
You get denser packing with UTF-8 than with ASCII encoded as UTF-8.
intosi
Well, that emoji-encoder uses four bytes per byte ;)
jonasw
Kev: I’m not sure about that. with ascii, you have a constant 7 bits / byte, with UTF-8 you lose bits for codepoints above 127 for signalling
Ge0rG
intuitively, I'm with jonasw here.
Kev
I might be wrong, conceivably. It doesn't match my mental mapping of UTF-8, but I know that's a poor mapping.
jonasw
in any case, that’s not the point of the discussion.
Kev
I think you get to use the extra bit as encoding for at least the first byte.
Kev
Sure.
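The density question can be checked with a little arithmetic. The sketch below computes an upper bound on the random bits per byte for each UTF-8 sequence length, assuming (unrealistically) that every code point in the range is usable; unassigned code points and the resourceprep/PRECIS mappings mentioned above would only lower the multi-byte figures. The names `UTF8_RANGES` and `bits_per_byte` are made up for this example.

```python
import math

# Code point ranges covered by each UTF-8 sequence length (in bytes).
UTF8_RANGES = {
    1: (0x00, 0x7F),
    2: (0x80, 0x7FF),
    3: (0x800, 0xFFFF),
    4: (0x10000, 0x10FFFF),
}
# Surrogates U+D800..U+DFFF fall in the 3-byte range but are not encodable.
SURROGATES = 0xDFFF - 0xD800 + 1


def bits_per_byte(length: int) -> float:
    """Upper bound on random bits carried per byte by UTF-8 sequences
    of the given length, if every code point in the range were usable."""
    lo, hi = UTF8_RANGES[length]
    count = hi - lo + 1
    if length == 3:
        count -= SURROGATES
    return math.log2(count) / length


if __name__ == "__main__":
    for length in UTF8_RANGES:
        print(length, round(bits_per_byte(length), 2))
```

Under these assumptions single-byte (ASCII) sequences top out at 7 bits per byte, and every multi-byte length comes in below the 6 bits per byte that base64 achieves, which matches jonasw's intuition.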
Ge0rG
nobody has said yet what attack we are trying to prevent, and how many bits of randomness are required to guard against it
jonasw
(using any fancy unicode would be very hard, as I said, with the mappings done by resourceprep et al anyways)
Kev
The important thing isn't that these are UUIDs, it's that they have the properties of UUIDs and are consistently implemented.
Ge0rG
in a world where clients leak their presence like crazy, having 256 bits of randomness in the resource might be a solution to the wrong problem.
Kev goes back to work.
Zash
XEP-0198 doesn't define what should happen when a session times out, or am I missing something?
Zash
Ge0rG: Btw, for maximum entropy per byte with base64, get a multiple of 3 bytes. You get a multiple of 4 bytes out and no padding.
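Zash's point about multiples of 3 can be seen directly with the standard library. This is a minimal sketch; the helper name `encoded_length` is hypothetical, and all-zero input is used only because the length and padding do not depend on the byte values.

```python
import base64


def encoded_length(nbytes: int) -> tuple:
    """Return (total length, number of '=' padding characters) of the
    standard base64 encoding of nbytes bytes."""
    encoded = base64.b64encode(bytes(nbytes))  # bytes(n) is n zero bytes
    padding = len(encoded) - len(encoded.rstrip(b"="))
    return len(encoded), padding


if __name__ == "__main__":
    for n in (4, 9, 12, 16):
        print(n, encoded_length(n))
```

Inputs of 9 or 12 bytes encode to 12 or 16 characters with no padding, while 4 or 16 bytes leave two `=` characters that carry no entropy.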