jdev - 2022-09-17


  1. Link Mauve

    In a JID we have a 1023-byte limit for each component, but does it apply before or after normalisation?

  2. Link Mauve

    Say I have a JID of the form ™™™™™™™…@ (with more than 341 but fewer than 512 ‘™’), should other entities accept it or reject it on the grounds that it is too long?

  3. Link Mauve

    ™ gets normalised to tm, and the former is 3 bytes long in UTF-8, the latter 2 bytes long.

  4. larma

    As always: be strict with data you create and lenient with what others send. Don't allow signing in or registering with more than 341 ™ in a component, but accept up to 511 ™ when received.
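
larma's rule of thumb amounts to a length check with two modes. A minimal Python sketch, assuming the caller already has both the raw component and its stringprep-normalised form; `component_fits` and `MAX_COMPONENT_BYTES` are hypothetical names, not taken from any XMPP library:

```python
MAX_COMPONENT_BYTES = 1023  # per-component JID limit discussed above


def utf8_len(s: str) -> int:
    """Length of s in bytes once encoded as UTF-8."""
    return len(s.encode("utf-8"))


def component_fits(raw: str, normalised: str, *, creating: bool) -> bool:
    """Hypothetical length check following larma's rule of thumb.

    creating=True  (registration, sign-in): be strict, so both the raw input
                   and its normalised form must fit within the limit.
    creating=False (a JID received from another entity): be lenient, so only
                   the normalised form has to fit.
    """
    if utf8_len(normalised) > MAX_COMPONENT_BYTES:
        return False
    return not creating or utf8_len(raw) <= MAX_COMPONENT_BYTES


# 400 x '™' is 1200 bytes raw, but only 800 bytes once normalised to 'tm' x 400.
raw, norm = "\u2122" * 400, "tm" * 400
print(component_fits(raw, norm, creating=True))   # False: refuse to register it
print(component_fits(raw, norm, creating=False))  # True: accept it when received
```

With `creating=True`, anything above 341 ‘™’ is refused because the raw input exceeds 1023 bytes; with `creating=False`, up to 511 ‘™’ (1022 bytes as ‘tm’) are accepted.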

  5. Link Mauve

    Yeah, that makes sense.

  6. Link Mauve

    Are there codepoints which go the other way, grow in byte length when normalised?

  7. Zash

    Prosody uses char[1024] buffers (including the NUL terminator) for both input and output, so it would reject both cases of oversized JID component.

  8. Zash

    There are only code points up to U+10FFFF, so you could check all of them fairly quickly ;)
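
Zash's brute-force suggestion can be sketched with Python's standard library alone, assuming its `stringprep` tables and `unicodedata.ucd_3_2_0` (stringprep is pinned to Unicode 3.2). `nodeprep_map` and `find_growers` are hypothetical helpers: they apply only the mapping (tables B.1 and B.2) and NFKC steps of Nodeprep (RFC 3920/6122, Appendix A) and skip the prohibited-character and bidi checks, so some code points reported as growing would actually be rejected by a real Nodeprep implementation. Python's `map_table_b2` also falls back to the current `str.lower()` for part of its mapping, so treat the results as an approximation.

```python
import stringprep
import unicodedata

# stringprep (RFC 3454) is pinned to Unicode 3.2, which the standard library
# exposes as unicodedata.ucd_3_2_0.
UCD32 = unicodedata.ucd_3_2_0


def nodeprep_map(s: str) -> str:
    """Hypothetical helper: only the mapping and NFKC steps of Nodeprep.

    Table B.1 code points are dropped, table B.2 case folding is applied,
    then the whole string is NFKC-normalised. Prohibited-output and bidi
    checks are skipped, so a real Nodeprep may still reject the result.
    """
    mapped = "".join(
        "" if stringprep.in_table_b1(ch) else stringprep.map_table_b2(ch)
        for ch in s
    )
    return UCD32.normalize("NFKC", mapped)


def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))


def find_growers(stop: int = 0x110000) -> list:
    """List (code point, bytes before, bytes after) for every single code
    point below `stop` whose UTF-8 length grows under nodeprep_map()."""
    growers = []
    for cp in range(stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        if stringprep.in_table_a1(ch):
            continue  # unassigned in Unicode 3.2 (table A.1); skipped here
        before, after = utf8_len(ch), utf8_len(nodeprep_map(ch))
        if after > before:
            growers.append((cp, before, after))
    return growers


if __name__ == "__main__":
    print(nodeprep_map("\u2122"))  # TRADE MARK SIGN (3 bytes) -> tm (2 bytes)
    growers = find_growers()  # scans U+0000..U+10FFFF; may take a few seconds
    worst = max(growers, key=lambda g: g[2] - g[1])
    print(len(growers), "code points grow; largest growth:", hex(worst[0]), worst[1:])
```

For example, U+3300 SQUARE APAATO is a single 3-byte code point that NFKC expands into four katakana (12 bytes), so a component can indeed get longer after stringprep.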

  9. Link Mauve

    Indeed.

  11. larma

    Link Mauve, I think chars that can be expressed as composed chars will typically increase in size when normalized

  12. larma

    Like á (0xC3A1) turns into á (0x61CC81)

  13. larma

    are we talking about NFC or NFD though?

  14. Zash

    "yes"

  15. Link Mauve

    Ah no, I was talking about stringprep.

  16. larma

    Ah, stringprep only increases IIRC

  17. Link Mauve

    Only decreases right?

  18. Link Mauve

    Like in that ™ → tm example.

  19. larma

    uhm, I was thinking of number of codepoints. Not sure about utf-8 encoding then

  20. Link Mauve

    Oh.

  21. larma

    maybe i'm mixing things up though

  22. larma

    Try with ǰ (\u01F0 = 0xC7B0), I think it should become ǰ (\u006A\u030C = 0x6ACC8C)

  23. Zash

    I observe that it recomposes 0061 0301 → 00E1 (a+´ → á)

  24. larma

    The issue with stringprep is that it does case mapping, and some chars need a different number of code points in different cases

  25. larma

    ΐ \u0390 is case mapped into ΐ \u03B9\u0308\u0301
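
The byte counts above, and the case-folding point, can be checked with Python's `unicodedata` module. This sketch is an approximation: it uses `str.casefold()` (full case folding in whatever Unicode version the interpreter ships) as a stand-in for stringprep's table B.2, and current-Unicode NFKC rather than the Unicode 3.2 tables stringprep requires; `utf8_hex` is a hypothetical helper.

```python
import unicodedata


def utf8_hex(s: str) -> str:
    """Hex dump of the UTF-8 encoding of s, e.g. 'c3a1' for U+00E1."""
    return s.encode("utf-8").hex()


# Decomposition (NFD) grows precomposed letters: one 2-byte code point becomes
# a 1-byte base letter plus a 2-byte combining mark (c3a1 -> 61cc81, c7b0 -> 6acc8c).
for ch in ("\u00e1", "\u01f0"):  # á, ǰ
    print(f"U+{ord(ch):04X}", utf8_hex(ch), "-> NFD", utf8_hex(unicodedata.normalize("NFD", ch)))

# Composition (NFC/NFKC) goes the other way, as Zash observed: a + U+0301 -> á.
print(utf8_hex(unicodedata.normalize("NFC", "a\u0301")))

# Case folding alone can add code points, but the NFKC pass that stringprep
# runs afterwards may recompose them into the original character.
for ch in ("\u0390", "\u01f0"):  # ΐ, ǰ
    folded = ch.casefold()
    renormalised = unicodedata.normalize("NFKC", folded)
    print(f"U+{ord(ch):04X}", utf8_hex(ch), "-> casefold", utf8_hex(folded),
          "-> NFKC", utf8_hex(renormalised))
```

So for ΐ and ǰ, the expansion from case mapping is undone by normalisation, which matches Zash's observation rather than a net growth.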

  26. flow

    Link Mauve, would it make sense if it were before normalization? Then a JID would sometimes be valid and sometimes not, which doesn't seem very appealing

  28. flow

    fwiw, jxmpp enforces the 1023 bytes limit on the resulting normalized string encoded in utf-8

  29. Zash

    but can you pass a >1023 octet string into it?

  30. rom1dep

    hello there, do you happen to know if there's a ranking of clients by popularity? Something that a scraper like muclumbus could aggregate and report on. I'm being told that my perception that a certain client is close to extinct is wrong, and that its lack of support for recent XEPs proves XMPP is full of obsolete clients

  31. Link Mauve

    rom1dep, hi, it’s for a single server but you can find our stats here: https://stats.jabberfr.org/

  32. Link Mauve

    Check Client identities there.

  33. rom1dep

    Link Mauve: super interesting, thanks!

  34. Link Mauve

    rom1dep, note also that it is a percentage of the total number of current sessions, some people use multiple clients so the total percentage should probably be higher than 100%.

  35. Link Mauve

    But currently it’s only counting sessions.

  36. Zash

    So no clients like Siskin, which only stays connected while it's in the foreground...

  37. Link Mauve

    Right.

  38. flow

    Zash, not sure I understand the question? Strings that are longer than 1023 bytes/octets after normalization, when encoded in UTF-8, will be rejected

  39. Zash

    flow, but before?

  40. flow

    Zash, yes, I can't rule out that the string before is longer than 1023 bytes

  41. flow

    but that is true in general with xmpp string validation: the input string may not be valid