XSF Discussion - 2019-09-13


  1. jonas’

    flow, I don’t see how your solution suggested on xmpp@ietf is practical. 1. "let’s change all the existing unicode interface implementations!" does not seem practical to begin with 2. it still leaves the issue of software not being on the same version all the time across the internet, and updating the unicode database is a *breaking* update [1], so it won’t happen e.g. on debian [1]: https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html

  2. flow

    jonas’, I did not suggest changing all existing unicode interface implementations, nor doing it system-wide. The point is, I could take CSH's precis library, add a feature which allows it to (dynamically) load a new Unicode Character Database (UCD) and be done. Yes, probably not everyone will always be on the same Unicode version, but there is not much you can do about it, besides reducing the number of services which are not on the same unicode version. And increasing agility regarding the supported unicode standard helps here

  3. Daniel

    yes to my understanding this is the benefit of precis. you can just ship a new unicode database with a minor update of the app. you don’t have to write new code

  4. Daniel

    that should be as easy as increasing a build dependency somewhere (or something)

  5. jonas’

    let’s load the unicode database from the xmpp server

  6. Daniel

    so one could hope that proliferation of new databases goes rather quickly

  7. jonas’

    well, at least the only implementation of PRECIS I know of for Python just uses the db shipped with Python

  8. Daniel

    well if we identify that early as a potential source of problems we could hand out implementation notes that say please make it pluggable

  9. jonas’

    true

  10. flow

    jonas’, i actually pondered putting the unicode database into dns as a PoC. It may not be suited for clients, due to privacy reasons, but maybe for servers. And it's fun ;)

  11. dwd

    flow, I agree that your proposal would be an improvement, but like others, I see it as a mitigation given the circumstances rather than an outright solution.

  12. flow

    dwd, point of view i guess, I don't think there is a solution which does not involve agility of the supported unicode standard, so I see agility as a solution

  13. flow

    but happy to be proven wrong

  14. dwd

    flow, I think we have to be aiming for what you're suggesting. I do not think it'll solve things, just lessen their occurrence.

  15. jonas’

    https://github.com/byllyfish/precis_i18n/issues/8

  16. jonas’

    let’s see how they react :)

  17. dwd

    jonas’, I suspect it's a matter of having unicodedata able to dynamically load.

  18. jonas’

    dwd, that’s what I’m asking for, ain’t I?

  19. dwd

    jonas’, I mean, in the standard Python library, not so much in precis_i18n.

  20. jonas’

    dwd, python uses it for parsing, so I doubt that’s happening

  21. jonas’

    (I think)

  22. jonas’

    having unicodedata load dynamically in a separate object (like you can instantiate separate `random` objects) would be a measure of course
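
A minimal sketch of the point about the database shipped with Python, assuming only the standard library. `unicodedata.unidata_version` reports the version compiled into the interpreter, and `unicodedata.ucd_3_2_0` is the one frozen snapshot the standard library exposes as a separate object. The helper name `category_in_both` and the choice of U+1E9E (added after Unicode 3.2) as a probe are illustrative assumptions, not something from the discussion.

```python
import unicodedata

# Version of the Unicode Character Database compiled into this Python build.
builtin_version = unicodedata.unidata_version

# The standard library also ships one frozen snapshot (Unicode 3.2.0),
# exposed as a separate object with the same lookup methods as the module.
legacy = unicodedata.ucd_3_2_0


def category_in_both(ch):
    """Return the general category of ch in the built-in and the 3.2.0 database."""
    return unicodedata.category(ch), legacy.category(ch)


if __name__ == "__main__":
    print("built-in UCD:", builtin_version)
    print("legacy UCD:  ", legacy.unidata_version)
    # U+1E9E LATIN CAPITAL LETTER SHARP S did not exist in Unicode 3.2,
    # so the two databases disagree about it.
    print(category_in_both("\u1e9e"))
```

There is no standard-library call to load an arbitrary newer database at runtime; the two objects above are fixed at build time, which is why a pluggable database would have to live in the PRECIS library itself.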

  23. jonas’

    Daniel, you have a working prototype for MIX? Is it using :core:0 or :core:1 for joining? MIX-PAM uses :core:0, but MIX-CORE uses :core:1 :(

  24. Daniel

    haven’t noticed that. i seem to be using core:0 and pam:0

  25. jonas’

    nice, it’s pam:1 in the spec though

  26. Daniel

    and it was working with the one and only server implementation

  27. Daniel

    it has been 3/4 of a year though since i tested things

  28. ralphm

    dwd: I'm curious if we could do something with service discovery and/or stream features.

  29. dwd

    ralphm, Possibly. But - taking "schloß@example.org" for a moment - if we assume that the canonical form is decided by "example.org", then a local user might type "schloss@example.org", and at some point we ask example.org what the correct version is. At that point, do we need heavyweight canonicalisation on other servers? (We might do - I'm just exploring).

  30. Zash

    Dear lazyxmpp, I wish for a PRECIS implementation suitable for use in C

  31. ralphm

    Did the normalization of ß change between Unicode versions?
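
As a rough illustration of why "schloß" and "schloss" can end up as the same or different localparts, here is a sketch using only Python's built-in string operations. It does not answer whether the mapping changed between Unicode versions, and it is not a PRECIS implementation; the helper name `forms` is hypothetical.

```python
import unicodedata


def forms(localpart):
    """Show what three common operations do to a localpart."""
    return {
        "lower": localpart.lower(),
        "casefold": localpart.casefold(),
        "nfkc": unicodedata.normalize("NFKC", localpart),
    }


if __name__ == "__main__":
    for name, value in forms("schloß").items():
        print(f"{name:9} {value}")
```

Lowercasing and NFKC leave ß alone, while full case folding expands it to "ss", so two servers that pick different operations will disagree on whether the two addresses are the same.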

  32. U+061C

    that response on the ietf mailing list, i believe, misses the point. this is a security issue of denial-of-service, not just compatibility

  33. ralphm

    Indeed Nameless RTL Person.

  34. Zash

    Should we adopt U+061C instead of the speech bubbles? 🙂

  35. Zash

    Or maybe U+00DF

  36. Ge0rG

    Zash: U+FFFD was also suggested.

  37. U+061C

    (btw, U+061C was the first character that I could find to have an empty nickname. some other ones were already blocked when i tried)

  38. U+061C

    so it's not that bad ^^

  39. Zash

    Space characters that are in Unicode 3.2 (or whatever stringprep uses) would be normalized to U+20 and then the nickname is forbidden if that's all there is

  40. Zash

    Space-ish codepoints outside of that tho
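
A small sketch of that gap, assuming Python's standard `stringprep` module (whose tables are based on Unicode 3.2) as a stand-in for whatever a server actually runs. The helper name `is_stringprep_space` is hypothetical; `in_table_c12` and `in_table_a1` are the module's lookups for non-ASCII space characters and for code points unassigned in Unicode 3.2.

```python
import stringprep
import unicodedata


def is_stringprep_space(ch):
    """True if ch is ASCII space or a non-ASCII space per stringprep table C.1.2."""
    return ch == " " or stringprep.in_table_c12(ch)


if __name__ == "__main__":
    for ch in ("\u00a0", "\u2003", "\u061c"):
        print(
            f"U+{ord(ch):04X}",
            "space:", is_stringprep_space(ch),
            "unassigned in 3.2:", stringprep.in_table_a1(ch),
            "current category:", unicodedata.category(ch),
        )
```

U+00A0 and U+2003 are recognised as spaces, while U+061C is simply unassigned from the point of view of the Unicode 3.2 tables, so a space check built on them does not catch it.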

  41. ralphm

    Well, sure. That's why PRECIS came about

  42. U+061C

    well, a nickname that looks empty

  43. Ge0rG

    At least it doesn't crash iPhones

  44. moparisthebest

    Nice one jonas’ : > it is required that users of PRECIS have precise control over the unicode database version used.

  45. Seve

    precisely

  46. flow

    what a precious comment

  47. jonas’

    oh my god, I didn’t even notice

  48. Seve

    suuure :D

  49. ralphm

    Nice one