-
Link Mauve
In a JID we have that 1023 bytes limit for each component, but is that before or after normalisation?
-
Link Mauve
Say I have a JID of the form ™™™™™™™…@ (with between 342 and 511 ‘™’ characters), should it be accepted by other entities or rejected on the basis that it is too long?
-
Link Mauve
™ gets normalised to tm, and the former is 3 bytes long in UTF-8, the latter 2 bytes long.
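
To make the arithmetic concrete, here is a small sketch of the byte counts involved. It only takes the ™ → tm mapping stated above as given; the names `LIMIT` and `ambiguous` are made up for illustration.

```python
LIMIT = 1023  # per-component byte limit discussed above

before = len("™".encode("utf-8"))  # bytes per ™ before normalisation
after = len("tm".encode("utf-8"))  # bytes per ™ once mapped to "tm"

# Largest repeat counts that still fit within the limit
max_before = LIMIT // before
max_after = LIMIT // after

# Counts that are too long before normalisation but fit afterwards
ambiguous = range(max_before + 1, max_after + 1)
print(before, after, ambiguous.start, ambiguous[-1])
```

So the counts where the two interpretations disagree run from 342 to 511 inclusive.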
-
larma
As always: be strictest on data you create and lax on what others did: do not allow signing in or registering with >341 ™ in a component, but accept <512 ™ when received.
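
A rough sketch of that policy, assuming Python's standard `stringprep` and `unicodedata` modules. The `prep` helper only does the mapping and NFKC steps (prohibited-character and bidi checks are left out), and `ok_to_create` / `ok_to_receive` are hypothetical names, not any implementation's API.

```python
import stringprep
import unicodedata

LIMIT = 1023  # per-component byte limit


def prep(component: str) -> str:
    """Simplified stringprep: table B.1/B.2 mapping, then NFKC (Unicode 3.2).

    Prohibited-character and bidi checks are deliberately omitted.
    """
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in component
        if not stringprep.in_table_b1(ch)  # B.1: mapped to nothing
    )
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


def fits(s: str) -> bool:
    return len(s.encode("utf-8")) <= LIMIT


def ok_to_create(component: str) -> bool:
    # Strict: must fit both as typed and after normalisation
    return fits(component) and fits(prep(component))


def ok_to_receive(component: str) -> bool:
    # Lax: only the normalised form has to fit
    return fits(prep(component))
```

With this, `ok_to_create("™" * 342)` is false while `ok_to_receive("™" * 511)` is true.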
-
Link Mauve
Yeah, that makes sense.
-
Link Mauve
Are there codepoints which go the other way, grow in byte length when normalised?
-
Zash
Prosody uses char[1024] (including NUL trailer) buffers for both input and output, so would reject both cases of oversized JID component.
-
Zash
There's only 0x10FFFF code points, so you could check all of them fairly quickly ;)
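
A brute-force check along those lines could look like the sketch below. It assumes Python's `stringprep` module as the source of the mapping tables and uses a simplified `prep` (mapping plus NFKC only, no prohibited-character checks); `growers` is a made-up helper name.

```python
import stringprep
import unicodedata


def prep(s: str) -> str:
    """Simplified stringprep: B.1/B.2 mapping, then NFKC (Unicode 3.2)."""
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in s
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


def growers(start: int = 0, stop: int = 0x110000):
    """Yield (code point, bytes before, bytes after) for code points
    whose UTF-8 length grows under the simplified prep above."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        if stringprep.in_table_a1(ch):
            continue  # unassigned in Unicode 3.2
        before = len(ch.encode("utf-8"))
        after = len(prep(ch).encode("utf-8"))
        if after > before:
            yield cp, before, after


if __name__ == "__main__":
    for cp, before, after in growers():
        print(f"U+{cp:04X}: {before} -> {after} bytes")
```

One example it reports is U+0149 (ŉ), which goes from 2 bytes to 3.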
-
Link Mauve
Indeed.
-
larma
Link Mauve, I think chars that can be expressed as composed chars will typically increase in size when normalized
-
larma
Like á (0xC3A1) turns into á (0x61CC81)
-
larma
are we talking about NFC or NFD though?
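
The difference between the two forms can be checked with Python's `unicodedata` module. This sketch only covers plain NFC/NFD for the á example, not stringprep.

```python
import unicodedata

composed = "\u00e1"  # á as a single code point
decomposed = unicodedata.normalize("NFD", composed)    # a + combining acute
recomposed = unicodedata.normalize("NFC", decomposed)  # back to one code point

for label, s in (("NFC", recomposed), ("NFD", decomposed)):
    print(label, s.encode("utf-8").hex(), len(s.encode("utf-8")), "bytes")
```

NFD grows á from 2 bytes (c3a1) to 3 bytes (61cc81), while NFC shrinks it back.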
-
Zash
"yes"
-
Link Mauve
Ah no, I was talking about stringprep.
-
larma
Ah, stringprep only increases IIRC
-
Link Mauve
Only decreases right?
-
Link Mauve
Like in that ™ → tm example.
-
larma
uhm, I was thinking of number of codepoints. Not sure about utf-8 encoding then
-
Link Mauve
Oh.
-
larma
maybe i'm mixing things up though
-
larma
Try with ǰ (\u01F0 = 0xC7B0), I think it should become ǰ (\u006A\u030C = 0x6ACC8C)
-
Zash
I observe that it recomposes 0061 0301 → 00E1 (a+´ → á)
-
larma
The issue with stringprep is that it does case mapping, and some chars have different codepoint requirements in different cases
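
To see the case-mapping step in isolation, the sketch below uses `stringprep.map_table_b2` from Python's standard library and then applies NFKC on the Unicode 3.2 data. `trace` is a hypothetical helper, and U+0390 and U+01F0 are just sample inputs.

```python
import stringprep
import unicodedata

ucd = unicodedata.ucd_3_2_0  # stringprep is defined against Unicode 3.2


def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))


def trace(ch: str):
    """Return (after B.2 case mapping, after a following NFKC pass)."""
    mapped = stringprep.map_table_b2(ch)
    return mapped, ucd.normalize("NFKC", mapped)


for ch in ("\u0390", "\u01f0"):
    mapped, renormalised = trace(ch)
    print(
        f"U+{ord(ch):04X}: {utf8_len(ch)} bytes, "
        f"mapped: {utf8_len(mapped)} bytes, "
        f"after NFKC: {utf8_len(renormalised)} bytes"
    )
```

The mapping step alone expands these characters, but the NFKC pass that follows can recompose the sequence, so the final size depends on both steps.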
-
larma
ΐ \u0390 is case mapped into ΐ \u03B9\u0308\u0301
-
flow
Link Mauve, would it make sense if it was before normalization? Then a JID would be sometimes valid and sometimes not, a situation that doesn't appear very appealing
-
flow
fwiw, jxmpp enforces the 1023 bytes limit on the resulting normalized string encoded in utf-8
-
Zash
but can you pass a >1023 octet string into it?
-
rom1dep
hello there, do you happen to know if there's a ranking of clients by popularity? Something that a scraper like muclumbus could aggregate and report about. I'm being told that my perception about a certain client being close to extinct is wrong, and it not supporting recent XEPs is a proof that XMPP is full of obsolete clients
-
Link Mauve
rom1dep, hi, it’s for a single server but you can find our stats here: https://stats.jabberfr.org/
-
Link Mauve
Check Client identities there.
-
rom1dep
Link Mauve: super interesting, thanks!
-
Link Mauve
rom1dep, note also that it is a percentage of the total number of current sessions, some people use multiple clients so the total percentage should probably be higher than 100%.
-
Link Mauve
But currently it’s only counting sessions.
-
Zash
So no clients like Siskin that only stay connected while they're in the foreground...
-
Link Mauve
Right.
-
flow
Zash, not sure if I understand the question? Strings that are longer than 1023 bytes/octets after normalization when encoded in UTF-8 will be rejected
-
Zash
flow, but before?
-
flow
Zash, yes, I can't rule out that the string before is longer than 1023 bytes
-
flow
but that is true in general with xmpp string validation: the input string may not be valid
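
The two behaviours described in this thread (limit checked only on the normalised output, versus limit on both input and output) can be contrasted with a sketch like this. It is not jxmpp's or Prosody's actual code: `validate` and its `check_input` flag are hypothetical, and `prep` is a simplified stringprep without prohibited-character checks.

```python
import stringprep
import unicodedata

LIMIT = 1023  # per-component byte limit


def prep(s: str) -> str:
    """Simplified stringprep: B.1/B.2 mapping, then NFKC (Unicode 3.2)."""
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in s
        if not stringprep.in_table_b1(ch)
    )
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)


def validate(component: str, check_input: bool = False) -> str:
    """Return the normalised component or raise ValueError.

    check_input=False: only the normalised output is length-checked.
    check_input=True: the raw input must fit as well.
    """
    if check_input and len(component.encode("utf-8")) > LIMIT:
        raise ValueError("input longer than 1023 bytes")
    result = prep(component)
    if len(result.encode("utf-8")) > LIMIT:
        raise ValueError("normalised form longer than 1023 bytes")
    return result
```

For example, `validate("™" * 400)` returns a 800-byte string, while `validate("™" * 400, check_input=True)` raises because the 1200-byte input is already over the limit.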