jdev - 2022-03-12


  1. lovetox

    im interested in the size of the tcp read buffer people use

  2. lovetox

    currently i use 8192 bytes

  3. lovetox

    i wonder if this is too high, seems like a lot of stanzas can fit into this

  4. lovetox

    which means on every read i would have to process potentially a lot of UI updates

  5. Ge0rG

    that sounds like the opposite of a problem

  6. lovetox

    why?

  7. Ge0rG

    lovetox: you'll probably only hit this limit if you get flooded by some entity, like right after authenticating to the server or when joining a huge MUC

  8. lovetox

    i always get flooded from servers

  9. Ge0rG

    aggregating actions is much better than doing one thing at a time in the event loop

  10. lovetox

    thats the default if you join mucs

  11. Ge0rG

    if you have trouble coping with that in the UI, optimize the UI code ;)

  12. Ge0rG

    maybe you have some O(n²) algorithm hidden in there?

  13. jonas’

    what does the tcp read buffer have to do with anything?

  14. jonas’

    isn't that more of a choice whether you process the actions caused by stanzas synchronously or asynchronously?

  15. lovetox

    thats not much of a choice

  16. jonas’

    if you say so

  17. lovetox

    the buffer defines the max amount you can receive with a read()

  18. lovetox

    hence limits the UI operations done afterwards

  20. lovetox

    the situation is, you can pull data from the network faster than you have capacity to process in the UI

  21. Ge0rG

    lovetox: have you thought about not updating the ui in small increments, and instead perform all changes, then doing a single redraw() when the read buffers are empty?

  22. lovetox

    sounds insanely complicated

  23. lovetox

    like i receive a presence, and then wait an amount X maybe i receive an updated presence, and then draw only the last?

  24. Ge0rG

    depends on your UI framework and on how you interact with it, I suppose

  25. Ge0rG

    lovetox: well, some programs will just delay the redraw by .2s, others will only issue the redraw after the read buffer is empty
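
The "apply all changes, then redraw once" idea can be sketched as follows. This is a minimal illustration, not taken from any client: `CoalescingView`, `apply`, `flush` and `process_batch` are hypothetical names, and the redraw is only counted rather than performed.

```python
class CoalescingView:
    """Collects state changes and redraws once per batch (hypothetical UI layer)."""

    def __init__(self):
        self.state = {}
        self.dirty = False
        self.redraw_count = 0

    def apply(self, key, value):
        # Cheap step: update the model only, do not touch the screen.
        self.state[key] = value
        self.dirty = True

    def flush(self):
        # Expensive step: run once when the reader has nothing more to hand over.
        if self.dirty:
            self.redraw_count += 1  # stand-in for a real redraw() call
            self.dirty = False


def process_batch(view, updates):
    """Apply every update from one read, then redraw a single time."""
    for key, value in updates:
        view.apply(key, value)
    view.flush()


view = CoalescingView()
process_batch(view, [("alice", "away"), ("bob", "online"), ("alice", "online")])
```

Three updates cause one redraw here, and only the latest presence per key survives, which is also roughly what pooling presence per room would give.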

  26. lovetox

    im not sure what you mean by empty, i call read(8192)

  27. Ge0rG

    is it a blocking or a non-blocking call?

  28. lovetox

    non blocking

  29. Ge0rG

    so it will return 0 if you are done, and you can issue a redraw then

  30. lovetox

    but it will not return 0 for like a minute

  31. lovetox

    if you join 100 mucs on connect

  32. lovetox

    thats what i mean this is essentially a ddos scenario

  33. Ge0rG

    well, you could have two threads, one doing the stanza processing, the other doing UI redraws as fast as it can

  34. lovetox

    the server sends me a potential endless stream of data

  35. lovetox

    Ge0rG, so you build up an unbounded queue of UI draw events, because the one thread pulls data in faster than the other thread can process

  36. lovetox

    thats what i say, async does not help here

  37. lovetox

    you need to limit the input

  38. lovetox

    to a level that your computer can process

  39. lovetox

    not stuff it into a queue which you process sometime later, maybe

  40. lovetox

    i had 2 ideas, one i limit the read size, meaning i need more time to pull in the data, meaning the ui has more time to update

  41. lovetox

    the second was to just limit the priority of when i call read(), meaning if there are any UI events outstanding, i simply do not call read() again
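
The second idea (do not call read() while UI work is outstanding) amounts to backpressure. A minimal sketch, assuming a bounded queue sits between the reader and the UI; `pump`, `read_chunk` and `pending` are hypothetical names, and the lambda below stands in for a non-blocking socket read.

```python
import queue


def pump(read_chunk, pending, max_reads):
    """Read while the bounded queue has room; stop as soon as it is full."""
    reads = 0
    while reads < max_reads and not pending.full():
        chunk = read_chunk()
        if not chunk:
            break  # nothing available right now
        pending.put_nowait(chunk)
        reads += 1
    return reads


chunks = iter([b"a", b"b", b"c", b"d", b"e"])
pending = queue.Queue(maxsize=3)  # at most 3 chunks of unprocessed UI work

first = pump(lambda: next(chunks, b""), pending, max_reads=10)
pending.get_nowait()  # the UI finishes one chunk
second = pump(lambda: next(chunks, b""), pending, max_reads=10)
```

Data that is not read stays in the kernel's socket buffer, and once that fills up TCP flow control slows the sender down, so the limit propagates back to the server instead of growing a queue in the client.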

  42. Ge0rG

    lovetox: if you don't do redraws every 8KB, you won't do redraws every 4KB or 2KB

  43. lovetox

    pulling 4x 2 kb from the network takes more time than pulling 8kb once

  44. Ge0rG

    some microseconds, yes

  45. Ge0rG

    it's already there in your network stack, waiting for you to fetch it

  46. lovetox

    yes nevertheless, its an async call, meaning if i call read(), i will not receive anything (even if its already there) in the mainloop iteration where i issued the call

  47. lovetox

    meaning thats a full iteration where other code can do other things

  48. lovetox

    but actually i wanted to know what other people use

  49. lovetox

    so Ge0rG what do you use?

  50. Ge0rG

    lovetox: I have a dedicated thread doing data processing

  51. Ge0rG

    and I send events to the UI to redraw.

  52. lovetox

    so when your UI freezes because it needs to process a lot of events, you pull in just more and more data in the background

  53. Ge0rG

    no, I always process data on a different thread, not blocking the UI

  54. lovetox

    i am really having trouble understanding how that technically works, so you receive 200 stanzas, and then you issue 200 events to the ui thread, but it can only process say 10 events / second, so where do the other 190 events go? into some kind of UI event queue? and while that queue builds up you just pull more data in the other thread?

  55. Ge0rG

    I think I'm pooling things like all presence from a room

  56. lovetox

    presence was an example, great if you are pooling them, but my question was about how that would work if you are getting spammed by a server

  57. Ge0rG

    lovetox: it's complicated ;) my backend service is storing data in an SQLite DB, and the content provider notifies the UI then

  58. lovetox

    one problem i often come across is, when i need to export data from a jid to hard disk, and i need to name the file like the jid

  59. lovetox

    is there some safe "jid to allowed filename chars" convert thingy

  60. Ge0rG

    on linux, you can have anything but / and NUL, so bare JIDs go.

  61. Ge0rG

    on windows, you are in hell

  62. lovetox

    really anything? because full jids also allow stuff like emoji codepoints

  63. Ge0rG

    lovetox: depends on the fs of course, but on ext4 everything goes.

  64. Ge0rG

    you don't need to have valid utf8 or somesuch

  65. lovetox

    hm so i replace / with "-" or something, and look when that breaks

  66. qy

    > lovetox wrote:
    > presence was an example, great if you are pooling them, but my question was about how that would work if you are getting spammed by a server

    How frequent or realistic a scenario is it that you'd be spammed by a server so fast that the UI could not keep up? (admittedly in weechat I use the same model, but only begrudgingly)

  67. lovetox

    pretty high, because in xmpp you can pretty easily amplify the amount of data you get, i can send 100 join presences, resulting in 100.000 presence stanzas

  68. lovetox

    this all depends of course on how fast your server is, how fast your connection is, how fast your computer is

  69. qy

    But surely by that logic, when i applied MR677, my gajim should have become unusable and constantly be playing catchup?

  70. lovetox

    Not sure what you are talking about right now, but that is not the Gajim chat and i was not talking about an issue you had

  71. qy

    Huh

  72. qy

    Still relevant to me, cause i am technically halfway through rewriting my client to use an unbounded queue instead of direct processing, and best i can tell, even with all my mucs open, there's no real risk of not being able to keep up except at initial connect, so unless the UI redraw takes a very long time, i don't quite follow where the risk is, since the alternative is just freezing anyway

  73. lovetox

    that just does not sound like a sound architecture

  74. lovetox

    introducing an unbounded queue

  76. lovetox

    like instead of ui freezing (one problem) you probably now have many more problems

  77. lovetox

    data loss, memory management problems etc

  78. Ge0rG

    lovetox: what about joining fewer MUCs in parallel? ;)

  79. lovetox

    also a solution, it falls into the category "reducing input"

  80. lovetox

    in my experience its very easy to write inefficient UI code

  82. jonas’

    lovetox, re JIDs on the FS: on linux, the problem is not what's allowed, the problem is that a JID may be much longer than what a filename may be on ext4 (255 bytes) or even a full pathname may be on linux (it depends™)

  83. lovetox

    jonas’, i know there are multiple problems all not pretty, i just have no better idea, if i offer the user an export of all conversations, and want to put them in plain text, then one file per conversation makes sense

  84. jonas’

    only way to stay sane is sha256 | base32 on the JID if you need to truly key it on the jid and hope that that's hard enough to collide.
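
The "sha256 | base32" suggestion could look like this. A minimal sketch using only the standard library; `jid_to_key` is a hypothetical name, and stripping the padding and lowercasing are choices made here, not part of the suggestion.

```python
import base64
import hashlib


def jid_to_key(jid: str) -> str:
    """Map a JID of any length to a fixed-length, filesystem-safe name."""
    digest = hashlib.sha256(jid.encode("utf-8")).digest()
    # Base32 uses only A-Z and 2-7, so the result is also safe on
    # case-insensitive file systems; "=" padding is dropped for readability.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()
```

A 32-byte digest encodes to 52 base32 characters, well below the 255-byte file name limit mentioned above.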

  85. lovetox

    is there a better solution to this?

  86. lovetox

    jonas’, but its plaintext export, people should see the filename and know about what it is

  87. lovetox

    not some sha256 string

  88. jonas’

    alternatively you could truncate the individual JID parts (thankfully, DNS also only allows 255 chars), so you could e.g. localpart[:128] + '@' + domainpart[:100] + '/' + resourcepart[:10] + sha256sum(jid).to_hex()[:5] or so
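
A runnable variant of that truncation idea, as a sketch under a few assumptions: limits are counted in UTF-8 bytes rather than characters (the 255 limit is in bytes), "_" replaces "/" because "/" cannot appear in a file name, and `truncate_utf8` and `jid_filename` are hypothetical helpers.

```python
import hashlib


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def jid_filename(local: str, domain: str, resource: str = "") -> str:
    jid = f"{local}@{domain}" + (f"/{resource}" if resource else "")
    # A short hash of the full JID keeps truncated names apart in most cases.
    tag = hashlib.sha256(jid.encode("utf-8")).hexdigest()[:5]
    name = truncate_utf8(local, 128) + "@" + truncate_utf8(domain, 100)
    if resource:
        name += "_" + truncate_utf8(resource, 10)
    # Worst case: 128 + 1 + 100 + 1 + 10 + 1 + 5 = 246 bytes.
    return (name + "-" + tag).replace("/", "_")
```

Five hex digits only give about a million distinct tags, so collisions between long JIDs with identical prefixes remain possible; a longer tag trades readability for safety.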

  89. jonas’

    or you might separate domainpart, localpart, resourcepart into a directory structure, which may also be much more navigable anyway

  90. lovetox

    hm i like that idea on first look

  91. jonas’

    (that still requires truncating on long local- and resourceparts tho)

  92. lovetox

    im fine with a 99% solution here

  93. lovetox

    thanks, i think i will try that with the folders

  94. lovetox

    uh resource can also contain /
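
One way the directory layout could cope with "/" in the resource is to escape each component before joining. A sketch only: the percent-style escaping and the names `escape_component` and `jid_export_path` are assumptions made for illustration, and truncation of long parts is left out.

```python
from pathlib import PurePosixPath


def escape_component(part: str) -> str:
    """Make one JID part usable as a single path component."""
    # Escape "%" first so the mapping stays reversible, then the separator.
    part = part.replace("%", "%25").replace("/", "%2F")
    if part in (".", ".."):
        # Avoid components that would refer to the current or parent directory.
        part = part.replace(".", "%2E")
    return part


def jid_export_path(root, local, domain, resource=""):
    parts = [domain, local] + ([resource] if resource else [])
    return PurePosixPath(root, *(escape_component(p) for p in parts))
```

For example, `jid_export_path("export", "romeo", "example.org", "a/b")` yields `export/example.org/romeo/a%2Fb`, so all conversations of one domain end up under the same folder.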

  95. pulkomandy

    hello, do you have any hints, comments, things to avoid, etc about nickname tab completion? for example:
    - any interesting algorithm to detect what substring of the input to try to complete? normally I would split the string on spaces, but xmpp nicknames can have spaces in them
    - how do you select which completion to use first when there are multiple matches? depending on who spoke last in the chat? depending on which tab completion was used previously for the same input? any other things to take into account?
    - really any feedback on how you did it and what makes you happy or unhappy about your current implementation

  96. Ge0rG

    pulkomandy: splitting on whitespace is generally fine, you'd only end up in conflict if you have multiple users with the same first word

  97. Ge0rG

    pulkomandy: it makes sense to order tab-completion by last-spoke _and_ last-mentioned

  98. Ge0rG

    poezio only does last-spoke, and I'm frustrated every time when messaging the same person multiple times back-to-back

  99. Ge0rG

    not sure if last-mentioned-by-you or -by-anybody though
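
Ordering by last-spoke and last-mentioned can be done with a plain linear scan. A minimal sketch, assuming the client keeps two dicts of timestamps per nickname; `complete`, `last_spoke` and `last_mentioned` are hypothetical names, and whether "mentioned" means by you or by anybody is left to whoever fills the dict.

```python
def complete(prefix, nicks, last_spoke, last_mentioned):
    """Return matching nicknames, most recently active first."""
    folded = prefix.casefold()
    matches = [n for n in nicks if n.casefold().startswith(folded)]

    def recency(nick):
        # Whichever happened later counts: speaking or being mentioned.
        return max(last_spoke.get(nick, 0), last_mentioned.get(nick, 0))

    # Ties fall back to alphabetical order so cycling is predictable.
    return sorted(matches, key=lambda n: (-recency(n), n.casefold()))


nicks = ["Ge0rG", "georgina", "pep."]
last_spoke = {"georgina": 50, "Ge0rG": 10}
last_mentioned = {"Ge0rG": 60}
candidates = complete("ge", nicks, last_spoke, last_mentioned)
```

Here `Ge0rG` comes first because the mention at time 60 is more recent than `georgina` speaking at time 50. An empty prefix matches every nickname, which gives the "cycle through everyone" behaviour.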

  100. pep.

    Someone(tm) should push for a wire format to be used for this.

  101. pep.

    Converse is already using 372, dunno if that's the one, but something is badly needed

  102. Ge0rG

    pep.: to attach an occupant reference to a message?

  103. Ge0rG

    that's quite orthogonal

  104. pep.

    I think that's pretty much the deal. To me it's the same issue as 0071/0393. Randomly matching stuff in body

  105. pep.

    And not allowing the sender to express intentions

  106. pulkomandy

    yes, 372 can do that, I will add it to the message when tab completion is used to complete a nickname

  107. moparisthebest

    Last spoke is super annoying, if you've ever spent significant time in IRC you'll notice people constantly addressing the wrong person because that person spoke right before tab+enter

  108. Ge0rG

    moparisthebest: only if you have nothing to say

  109. Ge0rG

    I always do tab, then type a message, then press enter

  110. pulkomandy

    also, do you use any fancy data structure / algorithms for this (possibly a trie or something like that), or do you consider that the number of people in a room is low enough that it isn't worth it, and just compare strings one by one to find all the matches?

  111. pulkomandy

    also, case sensitive or not?

  112. Ge0rG

    case insensitive. good luck getting that right with unicode and each user's locales.

  113. pulkomandy

    well I don't care about other user's locale, I can just convert everything to lowercase for comparisons locally
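
For the local comparison, Python's `str.casefold()` is more aggressive than `lower()`, and normalizing first handles compatibility forms. A sketch of one locale-independent approach; `fold` is a hypothetical helper, and this is not claimed to be a full Unicode caseless match.

```python
import unicodedata


def fold(nick: str) -> str:
    """Normalize and case-fold a nickname for comparison only."""
    return unicodedata.normalize("NFKC", nick).casefold()


same = fold("Straße") == fold("STRASSE")      # casefold maps ß to "ss"
lower_same = "Straße".lower() == "strasse"    # lower() keeps ß as it is
fullwidth = fold("Ａlice")                     # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
```

This does not help with look-alike letters from other scripts (for instance a Cyrillic А), since those are distinct characters under both normalization and case folding.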

  114. pulkomandy

    ah yes and there's also the case where someone has a nickname starting with [ or ` or some other silly char and I may want to ignore it for completion (or maybe I don't)

  116. pep.

    In poezio@ there's a user named ☭ sometimes. Looks like they've changed to another symbol today :)

  117. nephele

    You don't need tab completion if you have only one char in your name :)

  118. nephele

    The only real vector I've had problems with is people naming themselves /command something in group chats, depending on the input that can be an attack vector

  119. pep.

    Depends, let's say symbols are not often mapped on keyboards, and not everybody is into clicking

  120. pep.

    And being able to match that the nick has been mentioned is also necessary

  121. nephele

    It's not a completion if you have nothing to base it on, do you do tab completions with an empty buffer?

  122. pulkomandy

    probably won't have a special case to tab-complete cccp to ☭ , at least at first :p (I think that's how compose keys on Linux allow to use this character?)

  123. pep.

    nephele, sure

  124. pep.

    As Ge0rG said above, last-spoke and/or last-mentioned

  125. pulkomandy

    yes, an empty buffer should allow cycling through all nicknames. Preferably the ones that are not reachable on the local keyboard first?

  126. pulkomandy

    and yes, that too

  127. nephele

    Compose key on linux (on X11) is based on a big list somewhere deep in X11 configs, I rewrote that config for myself once to make bindings that make sense

  128. moparisthebest

    Someone in another MUC changed their name to start with something that looked like an A but was not, so only people using conversations could address them, no one using tab-complete clients could

  131. mathieui

    pep., to be fair that user is usually called "ux" and gets annoyed anytime people mention them