XSF Discussion - 2017-09-25

  1. Ge0rG

    Is it possible to test XEP-0368 Direct TLS without actually creating the DNS records? You can't put SRV into hosts files :(

  2. jonasw

    Ge0rG, test the implementation at the client or at the server?

  3. Ge0rG

    jonasw: test a server deployment

  4. Kev

    Ge0rG: dnsmasq running locally.

  5. jonasw

    most clients allow you to specify a port

  6. jonasw

    and a host to connect to explicitly

  7. jonasw

    so you’d use that

  8. Ge0rG

    Yeah, I once enabled it in Gajim with a dozen clicks or so.
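A minimal sketch of the "specify host and port explicitly" approach for testing a Direct TLS deployment without DNS records. It connects straight to a chosen host/port instead of looking up `_xmpps-client._tcp.<domain>`, but still sends SNI and validates the certificate for the XMPP domain. The function names are hypothetical, and the ALPN ID `xmpp-client` is assumed from XEP-0368; this is not how any particular client implements it.

```python
import socket
import ssl


def make_direct_tls_context() -> ssl.SSLContext:
    """Client-side TLS context for an XEP-0368 Direct TLS connection."""
    # create_default_context() loads the system CA store and enables
    # certificate and hostname verification.
    ctx = ssl.create_default_context()
    # ALPN protocol ID for XMPP client-to-server Direct TLS (assumed).
    ctx.set_alpn_protocols(["xmpp-client"])
    return ctx


def connect_direct_tls(domain: str, host: str, port: int,
                       timeout: float = 10.0) -> ssl.SSLSocket:
    """Open Direct TLS to an explicitly chosen host/port, bypassing the
    SRV lookup. SNI and certificate validation still use the XMPP
    domain, so the test exercises the real certificate setup."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return make_direct_tls_context().wrap_socket(raw, server_hostname=domain)
    except Exception:
        raw.close()
        raise
```

For example, `connect_direct_tls("example.org", "203.0.113.5", 443)` (placeholder domain and address) would test a server on port 443 before any `_xmpps-client` SRV record exists; `selected_alpn_protocol()` on the returned socket shows whether the server accepted the ALPN ID.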

  9. zinid

    guys, what string should be put in SNI in the case of an IDN domain? original or punycoded?

  10. jonasw

    zinid, interesting question, I’d like to know that too :-)

  11. zinid

    ah, found in RFC6066

  12. zinid

    "HostName" contains the fully qualified DNS hostname of the server, as understood by the client. The hostname is represented as a byte string using ASCII encoding without a trailing dot. This allows the support of internationalized domain names through the use of A-labels defined in [RFC5890].

  13. zinid

    so it should be punycoded
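A small sketch of turning an internationalized XMPP domain into the SNI HostName form quoted from RFC 6066 (A-labels, ASCII, no trailing dot). The helper name is hypothetical. Note that Python's built-in `idna` codec implements IDNA 2003, not the IDNA 2008 rules of RFC 5890, so results can differ for some edge-case characters.

```python
def sni_hostname(domain: str) -> str:
    """Return the SNI HostName for a possibly internationalized domain:
    ASCII A-labels (punycode) without a trailing dot."""
    domain = domain.rstrip(".")
    # Built-in codec: IDNA 2003 (nameprep + punycode), per label.
    return domain.encode("idna").decode("ascii")
```

So `sni_hostname("bücher.example.")` yields `xn--bcher-kva.example`, while plain ASCII domains pass through unchanged.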

  14. jonasw


  15. Ge0rG

    I wonder if I should move the yax.im A records to point to the XMPP server instead of the web server, so that clients that fail SRV will still properly connect.

  16. Ge0rG

    It looks like 15% of client connections ignore SRV for yax.im

  17. edhelas

    Hi, I'd like to know the state of this PR? https://github.com/xsf/xeps/pull/500

  18. jonasw

    edhelas, I think some council votes are pending, I haven’t processed the last council meeting’s minutes yet, sorry

  19. jonasw

    edhelas, it will be most efficient if you ping me on the PR; I’ll take care of it when I get home

  20. edhelas


  21. edhelas

    I'm also planning to do another PR on 0060, maybe today

  22. edhelas

    jonasw what is your github account nickname ?

  23. jonasw

    edhelas, @horazont

  24. edhelas


  25. jonasw

    de rien

  26. edhelas

    I'm also planning to do another PR on 0060, but I'd like to get some feedback here first

  27. edhelas

    I'd like to expose the access_model of the nodes in their metadata

  28. edhelas

    I'm wondering if this could bring issues

  29. edhelas

    basically adding pubsub#access_model there https://xmpp.org/extensions/xep-0060.html#entity-metadata

  30. daniel

    > It looks like 15% of client connections ignore SRV for yax.im
    It would be interesting to know which clients those are and whether or not they are using Tor

  31. daniel

    conversations.im's numbers are equally high. Maybe even closer to 20%

  32. Ge0rG

    I'll do some version logging for the next days.

  33. jonasw

    Ge0rG, how are you going to track that?

  34. jonasw

    also, I’ve seen clients fall back to A/AAAA if they try to connect before DNS is up

  35. Ge0rG

    jonasw: modified mod_query_client_ver in prosody. Non-SRV connections to yax.im all come through a NAT on the web server

  36. jonasw

    the SRV lookup fails (and they cannot necessarily distinguish the reason), so they go on with A/AAAA, which may then pass :/

  37. Ge0rG

    I had "Connection refused" errors from my own yaxim instance very often for a week or so, and then I realized the NAT rule had been reset.

  38. jonasw

    I think that the A/AAAA fallback may be doing more harm than good

  39. jonasw

    I’ve had very confusing certificate errors for weeks until I realized that A/AAAA pointed to a test instance which wasn’t supposed to be live and where the certificates had expired. I don’t even want to know what happened *before* the certificates expired ...

  40. Ge0rG

    jonasw: it's clearly a bug, the question is just _where_.

  41. jonasw

    the root cause is probably that applications cannot (do not?) distinguish between "network errors" and "records don’t exist"

  42. Ge0rG


  43. jonasw

    with validating resolvers, you’ll also always see a generic validation error rather than an NXDOMAIN if the backend experienced network errors

  44. jonasw

    so that isn’t going to go away

  45. jonasw

    well, okay, that actually improves things.

  46. jonasw

    if the API exposes the difference
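A sketch of the decision jonasw describes: only fall back to A/AAAA when the lookup API can report that the SRV records provably do not exist, and treat every other failure as "retry later". The result types and function are hypothetical (no real resolver library is assumed); port 5222 is the RFC 6120 default for the client fallback.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class SrvOutcome(Enum):
    RECORDS = auto()     # SRV records were returned
    NO_RECORDS = auto()  # authoritative "does not exist" (NXDOMAIN / NODATA)
    FAILURE = auto()     # timeout, SERVFAIL, validation failure, network error


@dataclass
class SrvResult:
    outcome: SrvOutcome
    targets: List[Tuple[str, int]]  # (host, port), already in SRV order


class RetryLater(Exception):
    """Resolution failed for a reason that may be transient."""


def connection_targets(domain: str, srv: SrvResult) -> List[Tuple[str, int]]:
    """Decide where to connect based on how the SRV lookup ended."""
    if srv.outcome is SrvOutcome.RECORDS:
        return srv.targets
    if srv.outcome is SrvOutcome.NO_RECORDS:
        # Records provably absent: the A/AAAA fallback applies.
        return [(domain, 5222)]
    # Possibly transient: do not silently try A/AAAA, which may point to
    # an unrelated web server or a stale test instance.
    raise RetryLater(f"SRV lookup for {domain} failed, retry later")
```

The catch Flow raises later still applies: a resolver that cannot perform SRV lookups at all would land in `FAILURE` here and never connect, so this only helps if the API reports that case distinctly.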

  47. Ge0rG

    It looks like most Non-SRV connections are from yaxim, followed by Conversations. And then some Pidgin and Cackle.

  48. Ge0rG

    However, the stats are skewed because I query on new connections, and those happen far more often on mobile

  49. jonasw

    and your userbase is probably also skewed

  50. jonasw

    towards yaxim

  51. Ge0rG

    No way! I'm a neutral server operator!

  52. jonasw

    that may be, but are you also a neutral app developer? ;-)

  53. daniel

    cackle is just a Conversations fork though

  54. daniel

    or theme

  55. Ge0rG

    There also was one MAXS.

  56. jonasw

    MAXS <3

  57. MattJ

    MAXS <3

  58. Flow

  59. Ge0rG

    daniel: did you change the DNS records for conference.siacs.eu around noon on Saturday? My prosody wasn't able to resolve the server in the morning, then came up with an old(?) IP around 11:15, and then failed to resolve again.

  60. daniel

    Ge0rG, we switched over on friday at ~23:45

  61. daniel

    i don't think i've touched the records since

  62. Ge0rG

    daniel: I'm sure this was a weirdness in prosody's DNS code, but I wanted to be 100% sure about that.

  63. Ge0rG

    daniel: and was the old IP?

  64. daniel

    sounds about right

  65. Ge0rG

    daniel: I'll quote you on https://prosody.im/issues/issue/1001 if that's ok.

  66. daniel

    i just created srv records. but that doesn't seem to help

  67. daniel

    or maybe it did and just takes some time to propagate https://status.conversations.im/reverse/conference.siacs.eu/

  68. daniel

    let's wait and see what happens

  69. Ge0rG

    prosody has some strange bugs in handling CNAMEs.

  70. daniel

    creating the srv record did in fact fix it

  71. Ge0rG

    daniel: ...worked around ;)

  72. daniel


  73. Ge0rG

    The XSF is 90% about semantics.

  74. dwd

    CNAMEs are really odd. They shouldn't work (but might) in combination with SRV records, for a start.

  75. Ge0rG

    Yeah, but they don't even work without SRV.

  76. dwd

    Ge0rG, Arguably they shouldn't - RFC 6120 § 3.2.2 only says A or AAAA. That probably implies CNAME (and DNAME), though.

  77. jonasw

    DNAME is entirely DNS-server-side anyways, isn’t it?

  78. dwd

    Ge0rG, You *can* - in principle - use CNAMEs for, say, _xmpp-server._tcp.example.org. Just not for whatever the hostname it looks up to is.

  79. dwd

    jonasw, There's a fallback to do that, but I think there's an EDNS0 flag for handling them client-side.

  80. Ge0rG

    dwd: but the service name is a CNAME, and it doesn't resolve

  81. dwd

    Ge0rG, What do you mean by the service name?

  82. Ge0rG

    dwd:
    conference.siacs.eu. 300 IN CNAME xmpp-hosting.conversations.im.
    xmpp-hosting.conversations.im. 300 IN A

  83. dwd

    So that only works if the process looking up decides it'll use gethostbyname/getaddrinfo, or else does DNS directly but follows CNAMEs.
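A sketch of dwd's second option, "do DNS directly but follow CNAMEs", over an in-memory record table standing in for real queries. The table layout and function name are hypothetical, and the names and the 192.0.2.10 address are documentation placeholders, not the records discussed above. A hop limit guards against CNAME loops.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (rrtype, rdata)


def resolve_a(name: str, zone: Dict[str, List[Record]],
              max_hops: int = 8) -> List[str]:
    """Return A-record addresses for name, following CNAMEs the way a
    stub resolver would. zone maps owner names to their records."""
    for _ in range(max_hops):
        records = zone.get(name.lower(), [])
        addrs = [rdata for rrtype, rdata in records if rrtype == "A"]
        if addrs:
            return addrs
        cnames = [rdata for rrtype, rdata in records if rrtype == "CNAME"]
        if not cnames:
            return []  # no A and no alias: the name has no address
        name = cnames[0]  # an owner name carries at most one CNAME
    raise RuntimeError(f"CNAME chain too long at {name}")
```

With `{"conference.example.org.": [("CNAME", "hosting.example.net.")], "hosting.example.net.": [("A", "192.0.2.10")]}`, looking up `conference.example.org.` returns `["192.0.2.10"]`; code that only asks for A records on the original name would find nothing.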

  84. dwd

    Neither is spelt out in RFC 6120 § 3.2.2.

  85. Ge0rG

    I'm not sure RFC6120 is the right place to define how DNS should work.

  86. Ge0rG

    However, with the wording you referenced, I could blame daniel for not following the RFC, instead of blaming prosody for having a broken CNAME lookup mechanism.

  87. jonasw

    given that we have SRV, I don’t see the reason for CNAMEs in any case.

  88. jonasw

    (as mentioned earlier, I think the A/AAAA fallback does more harm than good)

  89. Ge0rG

    jonasw: SRVs happen to be black magic from the future for many DNS providers.

  90. jonasw


  91. Holger

    So the Prosody people broke their CNAME caching in order to strictly follow RFC 6120?

  92. zinid


  93. Ge0rG

    jonasw: with some DNS operators, you can't add SRV entries.

  94. MattJ

    Holger, just for the record... no :)

  95. jonasw

    Ge0rG, I understood, but I am horrified

  96. zinid

    so these providers don't follow RFC6120?

  97. zinid

    we need to notify them

  98. Ge0rG

    zinid: oh, they do.

  99. Ge0rG

    it's the others that don't.

  100. Holger

    jonasw: You might have the CNAME record for other services anyway. Apart from that you might want to maintain the SRV targets in a single record and have multiple CNAMEs pointing to that.

  101. Holger

    dwd: I agree with Ge0rG that 6120 sounds like the wrong place to specify such things. But if it's the right place, then missing CNAME support sounds like an obvious 6120 bug to me.

  102. dwd

    Holger, I don't think it is specifying how DNS works. I do think it ought not to be quite so precise in the lookups involved.

  103. Holger

    Just sounds wrong to me that each and every protocol that uses DNS names should specify "yes we also resolve CNAMEs like everyone else".

  104. Holger

    As opposed to just specifying the parts that are *specific* to this protocol.

  105. jonasw

    I bet there’s some wording in the document defining CNAME that resolvers (including stub resolvers) MUST follow CNAMEs transparently or so

  106. Flow

    > [14:07:11] jonasw: (as mentioned earlier, I think the A/AAAA fallback does more harm than good)
    Given the number of DNS implementations not supporting SRV RRs, I doubt that this is true

  107. Flow

    what Holger said

  108. zinid

    just let's use NAPTR to break things completely :)

  109. jonasw

    Flow, is there a list of such popular services?

  110. jonasw

    and IM services hosted there? they should apply some pressure.

  111. Flow

    jonasw: I'm not only talking about services, more about all things DNS

  112. jonasw

    the issue with the fallback is that it forces services using SRV records to also have valid A/AAAA records, or at least it constrains what you can do with the A/AAAA of the domain.

  113. Flow

    jonasw: It doesn't force them

  114. Flow

    but yes, for maximum connectivity you want to have your XMPP domain also resolve A/AAAA

  115. jonasw

    Flow, no, if there are intermittent issues which make the client believe that the SRV records don’t exist, they fall back to A/AAAA

  116. jonasw

    and that’s an issue

  117. Flow

    it's an issue if there are no A/AAAA records

  118. jonasw

    or if the records point to something which isn’t the XMPP service you wanted to connect to

  119. Flow

    but how would not having the A/AAAA fallback improve the situation

  120. jonasw

    if there are no A/AAAA records, it is more or less obvious to clients that they should re-try later because it’s most likely network

  121. jonasw

    (or a configuration error)

  122. jonasw

    but if you end up in the fallback (e.g. on a transparent stream-management reconnect) and the fallback is not the XMPP service you’re looking for, a lot of funny stuff can happen, from certificate errors over stream errors to authentication failures

  123. jonasw

    all of which will probably nuke the client’s state

  124. jonasw

    that’s what I mean by "harm"

  125. jonasw

    (I had that once with an unfortunately configured A/AAAA record which pointed to another XMPP service)

  126. jonasw

    (took me weeks to figure out what the reason for those errors was)

  127. Flow

    jonasw: I see, but without the fallback you wouldn't even be able to connect as soon as SRV breaks for some reason

  128. jonasw

    Flow: yes, and treating it as a network error would do the right thing (retry soon)

  129. Flow

    jonasw: Not if it's your resolver lib not being able to perform SRV lookups

  130. Flow

    or your home router's resolver

  131. jonasw

    but you can't distinguish a wrong A/AAAA you should never have seen from incorrect credentials or something

  132. Flow

    incorrect credentials should return a well defined error, no?

  133. Flow

    but, yes, the situation is not ideal

  134. jonasw

    Flow: sure it does, but you can get such an error when connecting to the wrong xmpp service due to A/AAAA lookup

  135. Flow

    i see

  136. Ge0rG

    When I send a MUC join and lose my connection, so that it will be closed by a 0198 timeout, prosody will send error responses to all queued stanzas, including individual MUC participants. Is that good / bad / ugly / all of the above?

  137. jonasw

    Ge0rG, I think MUCs won’t route error messages back. sending back error presences is the right thing.

  138. Ge0rG

    Except that some funny MUC implementation will also kick all my MSNs

  139. Ge0rG

    or is that NMSs?

  140. jonasw

    sure, but those are broken MUC implementations then

  141. jonasw

    not sending unavailable presence would be disastrous

  142. Ge0rG

    jonasw: it's okay to send presence-unavailable to my own nickname, but to all the participants?

  143. jonasw


  144. Ge0rG

    or rather, presence-error.

  145. jonasw

    to the participants doesn’t seem right to me

  146. dwd

    Ge0rG, I'm not sure I understand what the problem you're describing is.

  147. Ge0rG

    it's right from the 0198 session destruction context, though.

  148. Holger

    What's the downside with just dropping all presence stanzas on 0198 timeout?

  149. Holger

    How does the error stanza help anyone?

  150. MattJ

    If you send presence to someone, do you expect an error if they don't ever receive it?

  151. dwd

    Ge0rG, So you have an existing local session connected to a local MUC, in a 198-detached state, and then this times out?

  152. Holger

    MattJ: I don't.

  153. Holger

    MattJ: Because how would I handle that error?

  154. dwd

    Holger, Giant modal dialog box of course.

  155. Holger


  156. dwd

    Holger, I'm surprised you had to ask.

  157. Ge0rG

    dwd: I'll try again:
    1. I send a join presence to a MUC
    2. I disappear into the void
    3. The MUC sends everything that's sent on join to my 0198 cache
    4. my 0198 session gets destroyed, so my server sends an error response for each individual stanza in the cache, including all the participant presences.

  158. dwd

    Ge0rG, Ah, OK. And what's wrong with that?

  159. Ge0rG

    dwd: the flood of presence errors to MUC participants.

  160. dwd

    Ge0rG, Ah, OK. And what's wrong with *that*?

  161. Ge0rG

    dwd: that was the point of my question. Is it wrong or just ugly.

  162. Holger

    Being useless?

  163. dwd

    Holger, Useless is OK, or at least it's nothing bad, surely?

  164. Holger

    It's nothing bad.

  165. jonasw

    I’m not sure

  166. jonasw

    sending presences to other MUC participants is at least weird

  167. dwd

    Ge0rG, I think it's right. Although I don't think the MUC should be broadcasting presence errors - it should just error you out of the MUC and broadcast that.

  168. jonasw

    because that’s normally how you join/change nicknames

  169. Ge0rG

    dwd: I don't know what the MUC does with the flood, to be honest

  170. dwd

    Ge0rG, If it just absorbs it, that's fine. I think.

  171. Ge0rG

    dwd: sounds reasonable to me.

  172. dwd

    Ge0rG, The problem is that to stop it, we'd need to track not just the stanzas, but the semantics of those stanzas.

  173. dwd

    And that's really the MUC entity's job, I think.

  174. jonasw

    shouldn’t all MUC presences have an <x/> in them which makes it easy to find?

  175. dwd

    (At least, wherever possible)

  176. Holger


  177. Ge0rG

    jonasw: and <x/> specific code in 0198 as well, now?

  178. jonasw

    Ge0rG, *shrug*

  179. Holger

    My question was: Why not just silently drop *all* presence stanzas on 0198 timeout?

  180. Ge0rG

    Let's fix 0045 first.

  181. Holger

    No matter whether MUC-related or not?

  182. Holger

    Is there a single use case where the originator of the presence would handle the error message in any other way than ignoring it?
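A sketch contrasting the two positions: dwd's (bounce whatever a normal session close would bounce) and Holger's (silently drop queued presence). The function and flag names are hypothetical, tags are assumed to be unqualified for brevity, and the `recipient-unavailable` condition is an assumption, not necessarily what Prosody sends. The one firm rule encoded is from RFC 6120: an error stanza is never answered with another error.

```python
import xml.etree.ElementTree as ET
from typing import List

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def bounce_unacked(queue: List[ET.Element],
                   drop_presence: bool) -> List[ET.Element]:
    """Build error replies for stanzas still unacknowledged when a
    XEP-0198 session times out."""
    bounces = []
    for stanza in queue:
        if stanza.get("type") == "error":
            continue  # never answer an error with another error
        if drop_presence and stanza.tag == "presence":
            continue  # Holger's proposal: drop presence silently
        # Reply to the original sender, from the detached session's JID.
        reply = ET.Element(stanza.tag, {"type": "error"})
        for src, dst in (("to", "from"), ("from", "to"), ("id", "id")):
            if stanza.get(src):
                reply.set(dst, stanza.get(src))
        error = ET.SubElement(reply, "error", {"type": "cancel"})
        ET.SubElement(error, f"{{{STANZAS_NS}}}recipient-unavailable")
        bounces.append(reply)
    return bounces
```

In Ge0rG's scenario the queue holds one presence per MUC participant, so `drop_presence=False` produces the flood of presence errors described above, while `drop_presence=True` leaves only the non-presence bounces.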

  183. dwd

    Holger, I don't think that's needed, or desirable. If an error would be generated immediately on session close, then it should be generated on 198-closure.

  184. Holger

    It would not be generated without 0198, no?

  185. jonasw

    does one get presence-errors when sending a presence to an unavailable entity?

  186. Holger

    This is a 0198 (mis)feature.

  187. Holger


  188. dwd

    jonasw, Ah, that's a "sort of". You get a presence error if your sending the presence causes an error to be detected.

  189. dwd

    Speaking of 198, what are people setting the timeout to these days?

  190. jonasw

    I have it set at 5 or 10 minutes I think

  191. jonasw

    I have it set at 10 minutes.

  192. dwd

    jonasw, Any statistics on hit/miss of resume attempts?

  193. Ge0rG

    On my personal server I set it to 2h, because when on bad mobile my data connection might get interrupted for so long due to a phone call

  194. jonasw

    dwd, I don’t think I have logs with enough detail, also my userbase is approximately 10.

  195. SamWhited

    Ge0rG: Is that a CDMA thing that your data gets cut off when you're on a call?

  196. Ge0rG

    SamWhited: no, it's a 2G/LTE thing.

  197. Holger

    When I looked some years ago, my impression was that most resumptions happen within 5 minutes. Which seems to be a common default.

  198. dwd thinks hit/miss statistics would be amazingly interesting.

  199. Ge0rG

    SamWhited: 3G can route voice and data at the same time, the others can't

  200. Holger

    I.e. increasing the timeout significantly won't increase the resumption rate significantly.
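A sketch of the hit/miss statistic dwd asks for, which would let Holger's claim be checked against real logs: given, for each detached session, the seconds until the client attempted to resume, compute the fraction a given timeout would have saved. The input format and function name are hypothetical; no server is known to log exactly this.

```python
from typing import Iterable, Optional


def resume_hit_rate(gaps_s: Iterable[Optional[float]],
                    timeout_s: float) -> float:
    """Fraction of detached sessions that would be resumed with the
    given XEP-0198 timeout. Each gap is the number of seconds from
    disconnect to the resume attempt, or None if none was made."""
    gaps = list(gaps_s)
    if not gaps:
        return 0.0
    hits = sum(1 for g in gaps if g is not None and g <= timeout_s)
    return hits / len(gaps)
```

With the invented sample `[30, 90, 200, 400, 7200, None]`, a 300 s timeout gives 0.5 and 7200 s gives 5/6. As Ge0rG notes, clients that restart (e.g. after OOM) never attempt to resume; they show up as `None` and count as misses that no timeout can fix.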

  201. Ge0rG

    dwd: there might be false negatives due to client restarts (e.g. OOM conditions)

  202. SamWhited

    Ge0rG: I'm reasonably sure LTE can, no? Maybe my phone is using both or something to get around that restriction. I should look into this.

  203. dwd

    Ge0rG, That wouldn't give a resume attempt, no?

  204. Ge0rG

    SamWhited: only if you have VoLTE

  205. Ge0rG

    SamWhited: otherwise, your phone will fall back to 3G or 2G, whatever's there.

  206. SamWhited

    Oh! Right, forgot that was a thing.

  207. dwd

    Ge0rG, I mean, it would give a resource conflict and killing the original detached session.

  208. Ge0rG

    dwd: right

  209. jonasw

    (for now).

  210. Ge0rG

    until people start using random resource IDs.

  211. dwd has a 198 resumption patch for Openfire, but it's not timing out yet - like, at all.

  212. SamWhited

    Or rather, I forgot CSFB was a thing. Ge0rG: You're in the U.S. no? Do some providers not support SVLTE or VoLTE?

  213. Ge0rG

    SamWhited: I'm in Germany. VoLTE support is rather spotty here, and you need a manually selected "compatible" phone.

  214. Holger

    dwd: That's how Cisco Jabber did it initially.

  215. SamWhited

    Ge0rG: Good to know; thanks. I wrote about this stuff a bit in the mobile considerations XEP, but obviously don't actually know what I'm talking about

  216. Ge0rG

    unbounded 0198 sessions guarantee awesome UX

  217. dwd

    Ge0rG, But quite high memory usage, I suspect.

  218. Holger

    (And when the client didn't resume for some reason and tried to open a new session with the same resource, the new session was rejected.)

  219. Ge0rG

    dwd: memory consumption is something usually not seen by your users. An "online" buddy that doesn't react for days, and where all the messages vanish, does.

  220. Ge0rG

    Holger: yeah, that's awesome!

  221. SamWhited

    Depends who your users are.

  222. jonasw

    I wonder whether unbounded sessions are indeed possible with some tricks

  223. SamWhited

    If you make an appliance that someone else runs, your users notice high memory usage.

  224. Ge0rG

    jonasw: possible - maybe. practical - nope.

  225. jonasw

    like: instead of storing messages in some memory buffer, refer to MAM. apply CSI rules to drop messages.

  226. jonasw

    presence is trickier, IQs too

  227. dwd

    jonasw, The problem is what other users see.

  228. jonasw

    dwd, is it? is presence even a relevant thing anymore?

  229. dwd

    jonasw, Although we could do some magic there, even, by triggering unavailable presence but leaving the session open. MUC dies, of course, but MIX would stay live.

  230. dwd

    jonasw, Yeah, it's relevant. Conversations notwithstanding, there's lots of IM applications where presence is as vital as it always used to be.

  231. jonasw

    dwd, while I have you here: a friendly reminder that there are still missing votes from you on the last council meeting :)

  232. dwd

    jonasw, Oh, yeah. Weird bug hit me, so I was in the room but not seeing anything. I need to track that one down.

  233. Ge0rG

    dwd: maybe you weren't in the room at all then?

  234. Ge0rG

    0045 has a nice set of desync issues.

  235. dwd

    Ge0rG, Oh, I was. Got the presence, too. Just not the messages. I've half a feeling I've cocked something up somewhere. I've literally no idea what build I've been running.

  236. SamWhited

    Brand new web client, first field I tried was an XSS and naturally I can't find a security contact.

  237. jonasw

    SamWhited, excellent!

  238. SamWhited

    I give up. I should just go blackhat, it would be way easier.

  239. waqas

    SamWhited: But it's shiny!

  240. waqas

    Honestly, I've given up reviewing JS/HTML XMPP clients, and will fail to trust any unless I write one myself

  241. waqas

    I suppose that's not limited to XMPP clients...

  242. SamWhited

    For the sake of my sanity I should do the same.

  243. waqas

    And the shinier and fancier they are, typically the worse the lack of even slight thought put into security hardening

  244. jonasw

    waqas, that’s my impression, too

  245. jonasw

    and I haven’t even tried to pentest anything :)

  246. SamWhited

    On the plus side, 3 seconds (if that) from login to XSS might be a new record. I am not happy about this record, but I guess it's nice to have a new personal record?

  247. waqas

    SamWhited: I admit, I haven't broken one in 3 seconds yet :)

  248. jonasw

    SamWhited, is it free or open source software? post to oss-security ;-)

  249. jonasw

    SamWhited, congrats, too

  250. SamWhited

    waqas: I literally logged in, pasted a stupid simple XSS into the first field I saw, and sure enough it worked.

  251. jonasw

    how can you even have such things if you do XML

  252. jonasw

    that sounds as if you could also paste raw XML into the XML stream

  253. waqas

    jonasw: Interpret it as HTML, obviously

  254. SamWhited

    jonasw: Probably. In this case that's not what was happening (it was a roster group name being decoded and inserted into the DOM as HTML)

  255. waqas

    SamWhited: I'd bet it's string concat. blah.innerHTML += "<div>" + text + "</div>";

  256. jonasw

    SamWhited, sure, but ... but ... I can’t even. so they used innerHTML?!

  257. jonasw

    the world is bad
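The bug SamWhited describes is untrusted text (a roster group name) concatenated into markup. In the browser, assigning to `textContent` instead of `innerHTML` avoids HTML parsing entirely; where markup must be built as a string, the text has to be escaped first. A Python sketch of the escaping variant, with a hypothetical helper name:

```python
import html


def roster_group_html(group_name: str) -> str:
    """Render a roster group name as an HTML fragment. The name is
    untrusted input, so it is escaped before concatenation."""
    return "<div>" + html.escape(group_name) + "</div>"
```

A payload such as `<img src=x onerror=alert(1)>` then comes out as inert text (`&lt;img src=x onerror=alert(1)&gt;`) instead of an element.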

  258. Ge0rG

    SamWhited: but roster groups are only visible to yourself!1!

  259. SamWhited

    Ge0rG: Yah, that particular one might not be the worst attack vector since they'd have to have access to your client anyways I guess. Either way, it probably means there are others.

  260. Ge0rG

    Yeah. That's probably true. Sad, but true.

  261. jonasw

    isn’t there a way to share roster items? ;-)

  262. Zash

    jonasw: roster item exchange?