HolgerMattJ: Thanks. This says you "might be able to determine if the client [...] is online" if you know the resource string. There's no explanation how you would determine that, right?
MattJNot anything explicit that I know of
HolgerMattJ: I'm aware of a few ways to do that at least with ejabberd users, but each of those are spec or implementation issues which IMO should be fixed either way (rather than only hidden by making the resource string unpredictable).
MattJBut this is implicitly assumed by various XEPs
Ge0rGHolger: the only issue I'm aware of is periodic sending of IQs to a full JID, to probe the client availability and network latency
HolgerMattJ: Unpredictable resource strings are assumed?
Ge0rGIt's also a really poor idea to use UUIDs for randomness if we have the full power of unicode (or at least base32 / base64)
MattJI'd wager that yes, numerous XEPs rely on unpredictable resource strings for security/privacy purposes
Ge0rGYouTube, the world's most used video platform, manages to identify videos by fewer than a dozen characters. Why do we need to use 36 characters to identify two clients on an account?
KevIf there's a standard for how to do something equivalent to UUID (global uniqueness without fingerprinting) in fewer bytes, using that instead of UUID seems entirely sensible.
Ge0rGAre we protecting from enumeration attacks? From birthday attacks? What is the actual amount of randomness required in a resource to prevent those?
HolgerKev: Why do we need a standard for that? I mean there's no interop requirement?
KevHolger: Well, we'd at least want it to be consistently implemented across clients, else you can fingerprint (which probably isn't the end of the world, but seems like something we'd want to prevent).
Ge0rGKev: sticking to default implementations is what causes user-visible URLs like https://xmpp.yaxim.org:5281/upload/54f59abf-de9b-4fb9-a1e4-f0b5b78a0a9d/a5f532df-0fef-47c4-81a9-2aff690419fc.jpg
intosiHolger: libraries have the benefit of people not writing their own random identifier generator.
intosi* standard methods present in libraries, I mean.
jonaswgetentropy(nbits=32) | base64 | strip('=') should be possible with every standard library.
jonaswwe are requiring PRECIS which has *much less* support than that
intosiBut yeah, another well-defined method could work equally well.
intosiWhen unicode comes into play, I'm less inclined to believe that every client gets it right, though. Unicode is hard.
jonaswbase64 is luckily only ascii
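(Editor's sketch, not part of the conversation: jonasw's pipeline "getentropy | base64 | strip('=')" could look roughly like this in Python. The function name `random_resource` and the choice of the URL-safe base64 alphabet are assumptions made for illustration; nobody in the chat specified them.)

```python
import base64
import secrets


def random_resource(nbits: int = 32) -> str:
    """Return nbits of OS randomness (rounded up to whole bytes) as unpadded base64."""
    nbytes = (nbits + 7) // 8
    raw = secrets.token_bytes(nbytes)  # cryptographically strong random bytes
    # Assumption: the URL-safe alphabet (A-Z, a-z, 0-9, '-', '_') is used so the
    # result stays plain ASCII and avoids '/' and '+'.
    encoded = base64.urlsafe_b64encode(raw)
    return encoded.rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    print(random_resource())    # 32 bits  -> 6 characters
    print(random_resource(96))  # 96 bits  -> 16 characters
```

The output is ASCII only, which is the property jonasw points out; whether 32 bits is enough depends on the attack being defended against, which the discussion leaves open.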
Ge0rGwhat we are looking for is a number of random bits packed into as few characters as possible for a given element
KevWell, not quite.
KevWe're also looking for it to be typable by server admins grepping for logs, etc.
KevAnd visibly distinguishable.
KevSimply picking random bits forced into UTF-8 clearly wouldn't work, for example.
Ge0rGKev: uuids are not very visibly distinguishable.
KevIf you give me two UUIDs, I'm fairly sure I can tell you if they're the same.
jonaswKev: agreed (with random bits in utf-8), not only because of distinguishability, but because it also needs to pass resourceprep and/or precis unmodified. that’s nontrivial.
Ge0rGKev: what if you have a complex scenario with N clients in a MUC, a set of M reflected messages with rewritten UUID ids? For which values of N and M can you keep N+2*M UUIDs in your short-term memory?
ZashDisco identity ?
jonaswwhat about dictionary-based strings?
jonaswjust sample from /usr/share/dict/british-english-insane (it’s a thing!)
HolgerZash: There's no disco identity when staring at XEP examples ...
Ge0rGbase64-strings have better visible distinguishability due to more uppercase/lowercase.
KevGe0rG: In what scenario are you imagining someone debugging such a thing and using the resources as the mental identifiers?
KevAs I said earlier, if we can more tightly pack than UUID, while still maintaining human readability, great.
Kev(And the other desirable properties of UUIDs)
intosiBasically still gibberish. I'm not convinced it would make a difference, I would suck at remembering many of either UUID, base64, or picks-from-a-wordlist-with-sufficient-entropy.
intosiFor all of them, I would just use the first four or five chars anyway.
Ge0rGKev: which identifiers would you use if all you have at hand are debug logs?
HolgerKev: I'm still quite baffled you'd list 'readability' as one of the properties of UUIDs :-)
KevNicks in a MUC, in the usual case of them being pretty static.
Holger(Then again these XMPP people are used to reading XML ...)
KevHolger: They're not *nice*, but it is possible to straightforwardly read them.
HolgerKev: But what makes them any more readable than e.g. Base64?
Ge0rGKev: but I want it *nice*.
KevCompared with random UTF-8, which is impossible to read because you've no idea which of the many matching characters you're looking at.
HolgerOh. Well yeah there's always something worse.
jonaswKev: nobody was seriously talking about using random utf-8
Kevjonasw: You might not be taking Ge0rG seriously, but I was still trying to address his point that we should be using UTF.
jonaswKev: I’m pretty sure Ge0rG was only making a point that using only hex is a waste of bytes and we should use base64 or something.
Kevjonasw: Given he explicitly said we should use the power of UTF-8, I don't think so :)
jonasw"the power of unicode", right…
KevYou're right, he said unicode, not UTF-8.
Ge0rGKev: actually, I should have written "a number of random bits packed into as few bytes as possible" - that should rule out non-ascii
KevYou get denser packing with UTF-8 than with ASCII encoded as UTF-8.
intosiWell, that emoji-encoder uses four bytes per byte ;)
jonaswKev: I’m not sure about that. with ascii, you have a constant 7 bits / byte, with UTF-8 you lose bits for codepoints above 127 for signalling
Ge0rGintuitively, I'm with jonasw here.
KevI might be wrong, conceivably. It doesn't match my mental mapping of UTF-8, but I know that's a poor mapping.
jonaswin any case, that’s not the point of the discussion.
KevI think you get to use the extra bit as encoding for at least the first byte.
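(Editor's sketch, not part of the conversation: the density question can be checked with a little arithmetic. The snippet below computes an upper bound on information per byte for each UTF-8 sequence length, counting every code point in the range except surrogates. The names `RANGES` and `bits_per_byte` are made up for this example, and real-world figures would be lower once unassigned code points and resourceprep/PRECIS restrictions are taken into account.)

```python
import math

# Code point ranges covered by each UTF-8 sequence length (in bytes).
RANGES = {
    1: (0x00, 0x7F),
    2: (0x80, 0x7FF),
    3: (0x800, 0xFFFF),
    4: (0x10000, 0x10FFFF),
}


def bits_per_byte(length: int) -> float:
    """Upper bound on bits of information per byte for sequences of this length."""
    lo, hi = RANGES[length]
    count = hi - lo + 1
    if length == 3:
        count -= 0x800  # surrogates U+D800..U+DFFF cannot be encoded in UTF-8
    return math.log2(count) / length


if __name__ == "__main__":
    for length in RANGES:
        print(length, round(bits_per_byte(length), 2))
```

Under these assumptions the single-byte (ASCII) range carries 7 bits per byte, and every multi-byte length carries less, because the leading and continuation bytes spend bits on signalling.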
Ge0rGnobody has still said what attack we are trying to prevent, and how many bits of randomness are required to guard against it
jonasw(using any fancy unicode would be very hard, as I said, with the mappings done by resourceprep et al anyways)
KevThe important thing isn't that these are UUIDs, it's that they have the properties of UUIDs and are consistently implemented.
Ge0rGin a world where clients leak their presence like crazy, having 256 bits of randomness in the resource might be a solution to the wrong problem.
Kevgoes back to work.
ZashXEP-0198 doesn't define what should happen when a session times out, or am I missing something?
ZashGe0rG: Btw, for maximum entropy per byte with base64, get a multiple of 3 bytes. You get a multiple of 4 bytes out and no padding.
Ge0rGZash: or you just strip away the padding
ZashGe0rG: There may still be encoded 0 bits
Ge0rGZash: but how does that relate to my original statement?
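(Editor's sketch, not part of the conversation: Zash's point about input lengths can be illustrated as follows. Each base64 character holds 6 bits, so an input that is not a multiple of 3 bytes leaves some bits of the last character fixed at zero even after the '=' padding is stripped. The helper name `encoded_stats` is hypothetical.)

```python
import base64


def encoded_stats(nbytes: int) -> tuple[int, int, int]:
    """Return (characters after stripping '=', padding characters, unused zero bits)."""
    encoded = base64.b64encode(bytes(nbytes))  # content is irrelevant, only length matters
    stripped = encoded.rstrip(b"=")
    padding = len(encoded) - len(stripped)
    unused_bits = len(stripped) * 6 - nbytes * 8
    return len(stripped), padding, unused_bits


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, encoded_stats(n))
```

For example, 4 random bytes (32 bits) become 6 characters with room for 36 bits, so 4 bits of the last character are always zero, while 3 or 6 bytes fill every character completely.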