XSF Discussion - 2020-09-20


  1. Ge0rG

    Also something about backward compatibility and changing unicode versions

  2. lovetox

    but is precis not better equipped to deal with changing unicode versions?

  3. lovetox

    i read precis defines what code points are valid, while stringprep defines what code points are illegal

  4. lovetox

    so whats legal changes with each unicode version in stringprep

  5. jonas’

    lovetox, no, stringprep is pinned to unicode 3.2, so that’s no big deal

  6. lovetox

    ah ok

  7. lovetox

    didnt know that

  8. jonas’

    lovetox, PRECIS on the other hand is not pinned to any unicode version, has no logic to deal with differing versions, and thus PRECIS on Unicode X.Y may easily produce/validate strings which are not allowed with PRECIS on Unicode X'.Y' for X' != X

  9. jonas’

    stringprep being pinned to 3.2 is why we cannot have robotface ;)

  10. lovetox

    sure? when precis defines whats legal instead of whats illegal

  11. lovetox

    it does not matter what unicode version

  12. jonas’

    it defines it in terms of unicode categories

  13. lovetox

    if we assume that new unicode versions only add codepoints

  14. jonas’

    yes

  15. jonas’

    if you run PRECIS on 3.2, it will reject strings PRECIS on 9.0 allows
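To make the version dependence concrete, here is a minimal Python sketch. It assumes CPython's standard `unicodedata` module, which ships a frozen Unicode 3.2 database (`unicodedata.ucd_3_2_0`) next to the interpreter's current one. The helper name `category_in_both` is made up for illustration, and the code only looks up general categories; it is not a PRECIS or stringprep implementation.

```python
import unicodedata

ROBOT_FACE = "\U0001F916"  # the "robotface" emoji, added in Unicode 8.0


def category_in_both(ch):
    """Return (general category under Unicode 3.2, category under the interpreter's Unicode)."""
    old = unicodedata.ucd_3_2_0.category(ch)  # frozen 3.2 database
    new = unicodedata.category(ch)            # whatever this Python ships
    return old, new


if __name__ == "__main__":
    # Under 3.2 the code point is unassigned ("Cn"); newer databases assign it a category.
    print(category_in_both(ROBOT_FACE))
```

Any rule phrased in terms of categories therefore gives different answers for the same string depending on which database it runs against.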

  16. jonas’

    and updating unicode versions in an app is even harder than updating TLS, I’m afraid

  17. flow

    could be as easy as "apt-get install unicode-data"

  19. lovetox

    as i understand the problem is not some unicode library, or a dependency

  20. lovetox

    its simply that in the standard sometimes things change

  21. flow

    the standard changes because with newer unicode versions codepoints that were previously unassigned become assigned

  22. lovetox

    yes, but thats not the only way it changes, and thats not the problem with precis

  23. flow

    I am not aware of other ways

  24. lovetox

    Changes to the properties of Unicode code points can occur as the Unicode Standard is modified from time to time. For example, three code points underwent changes in their GeneralCategory between Unicode 5.2 (current at the time IDNA2008 was originally published) and Unicode 6.0, as described in [RFC6452]

  25. lovetox

    new unicode versions are not just adding stuff on top, sometimes existing stuff changes

  26. lovetox

    or at least there is no guarantee that existing stuff does not change by the unicode consortium or whoever decides the stuff

  27. flow

    true, and that could mean that a string that was previously valid as JID part becomes invalid

  28. flow

    I am not sure how frequent that is

  29. flow

    What usually happens is that a string that was previously invalid as JID part becomes valid

  30. lovetox

    yes, also im not sure if this is really something that should hold us back

  31. lovetox

    we are not designing something for eternity here

  33. flow

    I'd expect that Unicode tries to prevent re-assigning codepoints whenever possible, for obvious reasons, and only does so if it is decided that the advantages of the re-assignment outweigh the disadvantages

  34. lovetox

    also i think adding new codepoints is not a problem for precis, as its already defined on classes

  35. lovetox

    and there will likely be no new classes

  36. lovetox

    what is a problem are for example mapping rules

  37. mdosch👁🗨

    > Future version of Prosody won't allow 👁🗨 or robot face in nicknames, thus solving that problem.
    Seems it's not yet in trunk.

  38. lovetox

    which can’t be set in stone without knowing all future codepoints

  39. mdosch

    /can't change his nick back.

  40. mdosch can't change his nick back.

  41. flow

    right, but it means that apps should use the system unicode database, e.g. via Python's unicodedata library that IIRC uses /usr/share/unicode as source
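A quick way to see which Unicode version a given Python interpreter actually validates against is `unicodedata.unidata_version`. As far as I know, CPython compiles its Unicode tables into the interpreter rather than reading a system directory, so the version follows the Python release; treat that as an assumption to verify on your platform. The function name `unicode_versions` is made up for this sketch.

```python
import unicodedata


def unicode_versions():
    """Report the Unicode versions of the interpreter's two built-in databases."""
    return {
        "current": unicodedata.unidata_version,           # version used by str methods and unicodedata
        "frozen": unicodedata.ucd_3_2_0.unidata_version,  # the pinned database stringprep relies on
    }


if __name__ == "__main__":
    print(unicode_versions())
```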

  42. mdosch

    Ah, it worked.

  43. lovetox

    i think its very unlikely with precis that something that was valid becomes invalid

  44. lovetox

    hence i dont see why not use precis

  45. lovetox

    yes, servers and applications need a recent unicode version

  46. flow

    yeah, it also does not concern me much, and there is nothing you can do about it

  47. lovetox

    but they also need X other recent libraries

  48. Zash

    mdosch: It's the MUC that enforces it, and this channel isn't on trunk.

  49. Ge0rG

    The real problem isn't which version to check against but who's responsible for the check at which border. If the check is only performed by the server owning any given JID, no problems will arise.

  50. Ge0rG

    If you enforce precis on another server's JIDs, you'll end up kicking people because of mdosch👁🗨 in your MUC

  51. Zash

    Hence being strict on creation of users, chat rooms, MUC nicknames.

  52. Ge0rG

    We could solve the problem by requiring a baseline set that's forward compatible, like no " in JIDs, and leave everything extended unicode to the authoritative server

  53. Ge0rG

    But I want 🤖

  54. Zash

    AFAIK the problem is mostly about unassigned codepoints, where you don't know if it's valid or forbidden.

  55. Zash

    So someone on Unicode 3.2 doesn't know whether 🤖 is valid or not, so it'll end up allowing it in JIDs received by others, while forbidding local things from using it.

  56. Ge0rG

    Zash: allowed from remote servers, forbidden on yours

  57. Zash

    Exactly

  58. Ge0rG

    And then your admin can install unicode 11

  59. Ge0rG

    And allow fancy new names

  60. Zash

    I wonder if Someone™ should write an Informational XEP on this whole mess.

  61. Ge0rG

    Zash: you should

  62. lovetox

    why do we need to write a XEP

  63. lovetox

    either all should use precis or all should use stringprep

  64. Zash

    That's not what reality looks like

  65. lovetox

    finished. the detail that one codepoint in a million changed from unicode version X to Y

  66. lovetox

    really, that's only a problem in some people's minds

  67. Zash

    I'm not that worried about Unicode redefining characters between versions.

  68. Zash

    But versions add new characters, which moves code points from Undefined to either Allowed or Forbidden (for each JID part)

  69. lovetox

    Zash i dont think thats how it works

  70. Zash

    Ok.

  71. Zash

    I revert to my earlier statement of not wanting to discuss this.

  72. lovetox

    but even if, whats the problem with that

  73. lovetox

    say an unassigned codepoint is moved to valid

  74. lovetox

    your server simply does not accept it because you are on an older unicode version

  75. lovetox

    there is no problem there

  76. lovetox

    it's like jabber.org not allowing a connection with another server because it is weird, runs outdated software or is misconfigured

  77. lovetox

    the solution is not, to find a standard where this can never ever happen

  78. lovetox

    it's that servers need to upgrade from time to time

  79. lovetox

    and it happens that we do this already

  80. Zash

    Oh look, it's been a year https://mailarchive.ietf.org/arch/msg/xmpp/a-WhzOTyOq168GujQHgzQ1-DURI/

  81. lovetox

    yeah i really dont know what the problem here is

  82. lovetox

    its like, a server comes along that supports only TLS 1.3, but the other server does not yet support TLS 1.3

  83. lovetox

    and the answer to the question is probably b)

  84. lovetox

    server should validate jids

  85. lovetox

    this means sending errors if the validation fails

  86. Zash

    That's what jabber.org does, which was the problem highlighted earlier.

  87. Zash

    ^ happens

  88. Zash

    For those who don't have joins & parts shown:
    ---> jabberdotorguser joined the room
    <--- jabberdotorguser has left the room due to an error (Kicked: jid malformed)

  89. Ge0rG

    lovetox: enforcing validation on entities outside of the user's control is going to cause pain. This is what it's all about

  90. Ge0rG

    Which is why "allowed" and "forbidden" are too few decision choices

  91. lovetox

    about what pain are you talking?

  92. lovetox

    informing the user he cant join this channel because the jid is not valid

  93. lovetox

    is not pain in my book

  94. lovetox

    its a 5 second thing to change the nick to something else

  95. Zash

    They can't change *someone else's nickname*

  96. lovetox

    ?! the user cant change his nickname?!

  97. lovetox

    and yes also the MUC can change his nick, its in 0045

  98. lovetox

    simply remove the offending chars

  99. Zash

    You still misunderstand.

  100. Zash

    This is not about the one that is joining a MUC

  101. Zash

    This is about someone else that is already a participant.

  102. Zash

    When the MUC sends the participant list, their server rejects that stanza and the MUC responds to that error by kicking YOU.

  103. lovetox

    yeah, and? as a client i validate JIDs, and of course simply drop all invalid ones

  104. lovetox

    you can fill a whole MUC with invalid participants, not a problem in my book

  105. lovetox

    but even that should not happen

  106. Zash

    But it does.

  107. lovetox

    So i cant connect to servers, if mine is outdated and uses old unicode data

  108. Zash

    Get a jabber.org account, join this MUC, get kicked the instant the presence of the participant with "👁🗨" in their nick is sent.

  109. lovetox

    it's the same, right now i can't even connect to most mucs because my cert is expired

  110. Ge0rG

    lovetox: but you can't change a remote server and which level of unicode that accepts.

  111. Ge0rG

    Also the unicode level supported by a server is neither indicated nor negotiated

  112. Ge0rG

    Instead your connection gets terminated later on due to somebody else sending presence

  113. Ge0rG

    And just moving on with the latest and greatest unicode will break your interop

  114. Ge0rG

    In all sorts of non obvious ways

  115. Ge0rG

    it's like showing an annoying popup every time you receive something from an "invalid" JID :D

  116. eevvoor

    > Get a jabber.org account, join this MUC, get kicked the instant the presence of the participant with "👁🗨" in their nick is sent.
    What a joy.

  117. lovetox

    Ge0rG, i still dont see the "pain", all that stuff is dependent on how often this happens

  118. lovetox

    and i would say it probably happens about as often as you want to use a muc on a server like jabber.org

  119. lovetox

    you try it, ok server doesnt work, is outdated, whatever, then you simply dont use it anymore

  120. Ge0rG

    lovetox: some implementations don't switch from stringprep to precis because of the sort of issues that it would cause.

  121. Ge0rG

    yeah, let's just abandon large parts of our ecosystem

  122. larma notes how this would be solved by not using the resource part of the JID for the nickname and instead use something like 172 and random resource for joining (like Jitsi Meet does)

  123. lovetox

    but stringprep causes the same issues

  124. Ge0rG

    larma: how do you prevent everyone from using the same nickname with 0172?

  125. lovetox

    stringprep is obsoleted, no new client would implement it, there is no note that says: Hold up please implement stringprep

  126. lovetox

    if a client uses precis, and the server validates for stringprep

  127. lovetox

    you have the same issue already, now

  128. larma

    Ge0rG, either not allow it server side (filter stanzas that do try to mimic another user) or use 0421 to spot the different users

  129. Ge0rG

    lovetox: yes, and I bet most clients won't even tell the user what the problem is

  130. eevvoor

    yeay, just like my problem persists with trashserver <-> jabber.fr

  131. lovetox

    so you acknowledge that the problem is already here right now, and *not* changing to precis

  132. lovetox

    does not make anything better

  133. Ge0rG

    lovetox: the problem is there because some implementations changed to precis, yes.

  134. Ge0rG

    lovetox: what you ask for is called a "forklift upgrade" and is not going to work.

  135. lovetox

    it already works

  136. lovetox

    users use precis day in and out

  137. lovetox

    this is a drop in the bucket of s2s problems out there

  138. lovetox

    you make it seem like the whole xmpp ecosystem breaks down, because people cant join mucs anymore

  139. Ge0rG

    it's also about contacts with JIDs according to a different spec

  140. larma

    IMO clients should never try to join using a unicode resource, but servers still need to handle it. Yet every client that allows to do it should be named as the main issue

  141. larma

    (which means about every client nowadays is to blame)

  142. lovetox

    ok larma interesting take, no client should allow a user to use a valid JID as per RFC.

  143. larma

    the weird thing is that resource is meant to be something "technical" yet it's also used as a display name

  144. lovetox

    i guess you wont win that one

  145. Ge0rG

    larma: what about that clients should only warn the user when they try to set a nickname that is outside of the client's supported PRECIS, but the servers have ultimate authority?

  146. Ge0rG

    congratulations for finding out that MUC is a set of dirty hacks.

  147. 🅶🅴0🆁🅶

    hi!

  148. larma

    so let's add some other dirty hack to solve that problem? Clients somehow encode unicode chars using ascii as a resource and add a 0172 nick. Clients that see that the 0172 nick matches the ascii encoding will display and use the 0172 nick instead

  149. larma

    As long as the ascii encoding is somewhat human readable, this would be sufficiently backwards compatible

  150. larma

    then we would need MUCs to not allow joining with non-ascii resources and issue is mostly solved

  151. mdosch

    Forbidding non ASCII is bad for Russians, Arabs, Vietnamese, Thai…

  152. 🅶🅴0🆁🅶

    also for emoji

  153. larma

    mdosch, it's not forbidden, you can still read it in the 0172 nick field

  154. larma

    just like domains don't forbid non ascii, you just need to encode using punycode

  155. 🅶🅴0🆁🅶

    what about encoding punycode nicknames in the resource?
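A rough sketch of that idea, using Python's built-in `punycode` codec. The function names `encode_nick` and `decode_nick` are hypothetical, and nothing here is specified by any XEP; it only shows that an arbitrary Unicode nickname can be round-tripped through an ASCII-only resource.

```python
def encode_nick(nick):
    """Encode a Unicode nickname as an ASCII-only string (raw Punycode, no 'xn--' prefix)."""
    return nick.encode("punycode").decode("ascii")


def decode_nick(resource):
    """Recover the original nickname from the ASCII form produced by encode_nick()."""
    return resource.encode("ascii").decode("punycode")


if __name__ == "__main__":
    original = "mdosch\U0001F916"
    ascii_form = encode_nick(original)
    print(ascii_form, decode_nick(ascii_form) == original)
```

The ASCII letters of the nickname stay readable at the front of the encoded form, which is roughly the "somewhat human readable" fallback asked for above; whether a different encoding would degrade more gracefully is left open here.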

  156. larma

    also fine with me, but I believe there could be better legacy fallbacks than punycode

  157. 🅶🅴0🆁🅶

    like... PRECIS?

  158. Zash

    Went for a walk. TL;DR let's solve the problem of not everyone upgrading at the exact same time with "just upgrade at the same time"? :)

  159. lovetox

    its not a matter of upgrading at a certain point in my opinion

  160. lovetox

    right now prosody does not do jid validation at all or?

  161. lovetox

    that means it already is an upgraded, precis-like server, it does exactly what Ge0rG fears, it sends precis muc resources to other servers that don't understand them

  162. Zash

    It does, but it allows unassigned characters.

  163. Zash

    Mostly because this is the library default

  164. lovetox

    yeah, so nobody cared, the ecosystem did not break down

  165. Zash

    No PRECIS

  166. 🅶🅴0🆁🅶

    Somebody needs to care about the small things as well

  167. lovetox

    you allowed resources that were not stringprep valid

  168. lovetox

    and send them to other servers and clients

  169. Zash

    This goes under "historical reasons" now

  170. Zash

    The plan is to change it to be strict about things created locally.

  171. lovetox

    and in all the years i never saw one issue, of clients or server operators

  172. 🅶🅴0🆁🅶

    yeah, let's just pin XMPP to stringprep and carve that in stone

  173. lovetox

    that complained that users cant join your MUCs

  174. 🅶🅴0🆁🅶

    Zash: strict according to what spec?

  175. Zash

    This /did/ happen, for years, if you compiled your Prosody differently

  176. Zash

    🅶🅴0🆁🅶: Ancient STRINGPREP

  177. 🅶🅴0🆁🅶

    Zash: did I get this right, you want to make prosody strict according to stringprep?

  178. Zash

    Yes

  179. 🅶🅴0🆁🅶

    Insanity!

  180. Zash

    No

  181. 🅶🅴0🆁🅶

    But why?

  182. Zash

    I'm running a version with this enabled and I'm having no problems.

  183. Zash

    You just can't use 🤖 as nickname on my local MUC instance

  184. 🅶🅴0🆁🅶

    do you even have a MUC domain?

  185. Zash

    It's strict about *local entities*

  186. 🅶🅴0🆁🅶

    I want my robot face back!

  187. Zash

    Local users, local MUC JIDs, local MUC participant nicknames.

  188. Zash

    Anything coming from a remote server and isn't known to be invalid is accepted.
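The policy described here (strict for local entities, lenient for remote ones) amounts to a three-state check rather than a plain allowed/forbidden one. Below is a minimal sketch under loud assumptions: the frozen Unicode 3.2 database from Python's `unicodedata` stands in for "the Unicode version this server knows", and the forbidden set is a made-up, heavily simplified rule, not the real stringprep or PRECIS tables. `classify` and `accept` are hypothetical names, not Prosody's API.

```python
import unicodedata

UCD = unicodedata.ucd_3_2_0  # stand-in for the server's Unicode version

# Hypothetical simplification: control, format, surrogate and private-use
# code points are treated as forbidden. Real profiles are much richer.
FORBIDDEN_CATEGORIES = {"Cc", "Cf", "Cs", "Co"}


def classify(part):
    """Return 'forbidden', 'unknown' or 'valid' for a JID part."""
    categories = [UCD.category(ch) for ch in part]
    if any(cat in FORBIDDEN_CATEGORIES for cat in categories):
        return "forbidden"
    if any(cat == "Cn" for cat in categories):
        return "unknown"  # unassigned in the Unicode version we know
    return "valid"


def accept(part, local):
    """Strict for locally created names, lenient for names from remote servers."""
    verdict = classify(part)
    if local:
        return verdict == "valid"
    return verdict != "forbidden"
```

With this split, a nickname containing a code point that is unassigned in the server's database is refused when created locally but tolerated when it arrives from a remote server, so nobody gets kicked over another participant's nick.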

  189. 🗨

    Zash: yes, but then you release it and everybody goes back into 2002

  190. 🗨

    I know there are people who wish for xmpp to be like it was in 2002.

  191. Zash

    ITYM 2006*

  192. lovetox

    seems ejabberd does validate strictly for stringprep

  193. lovetox

    cant join any muc with robotface

  194. 🗨

    Zash: https://tools.ietf.org/html/rfc3454 "December 2002"

  195. lovetox

    although it does with a weird error

  196. Zash

    🤷

  197. lovetox

    bad-request

  198. lovetox

    instead of jid-malformed

  199. lovetox

    maybe XMPP needs its own PRECIS profile

  200. lovetox

    that is in some way a better upgrade path

  201. lovetox

    for mucs the discovery problem is easily solved

  202. lovetox

    just add a feature into disco info

  203. Zash

    This topic causes me endless pain, I'll be under my desk, crying, for the rest of the weekend.

  204. lovetox

    only in our heads, i dont think there are actual users having problems with that

  205. mdosch is watching 🏒 rolls some beers under Zаshs desk.

  206. lovetox

    i guess there is not even someone out there that gets the idea that weird emojis are allowed in jids

  207. Ge0rG

    I do

  208. Ge0rG

    and other people as well

  209. mdosch is watching 🏒

    Ask Rixon 👁🗨:

  210. Ge0rG

    as you can see on the occupant list. Unless your client filters out "invalid" JIDs from MUCs

  211. mdosch is watching 🏒

    I wonder why Rixon 👁🗨 never participated in this discussion although being highlighted frequently in the last days.

  212. Ge0rG

    mdosch: maybe their client fails to highlight on complex unicode? 😁

  213. mdosch is watching 🏒

    😂

  214. Neustradamus

    I will do a little test for MattJ

  215. Neustradamus

    ^ connection and disconnection in less than 1s

  216. sss

    Hey anyone here

  217. sss

    heyy

  218. sss

    hey sony

  219. sss

    andrey.g andrey.g