XSF Discussion - 2018-10-03


  1. jonas’

    :D

  2. mightyBroccoli

    I would like to know if there are limits on how often, and in general how, it is allowed to crawl the extensions website? I built a bot which is able to grab the info header from each XEP. But there are no thresholds on how often one could issue a request. Or maybe there's a neat API I could use :)

  3. Zash

    robots.txt ?

  4. intosi

    Why don't you just use a local git clone for that?

  5. Zash

    or https://xmpp.org/extensions/xeplist.xml

  6. jonas’

    mightyBroccoli, if there’s anything more you need which isn’t in xeplist.xml, let me know

  7. mightyBroccoli

    I wasn't aware of the XML :) that's everything I need :) caching will do the rest :) that's even better than my bs4 garbage :D jonas’, it seems that the XML lists even the deferred XEPs, are there some not listed in there?

  8. jonas’

    mightyBroccoli, the XML lists all the things

  9. jonas’

    even inbox

  10. mightyBroccoli

    ahh ok, that's nice. I guess the accepted tag could be used to filter the inbox XEPs out of the results

  11. jonas’

    exactly
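
A minimal sketch of the filtering idea discussed above, using only the Python standard library. The element and attribute names are assumptions about the layout of xeplist.xml (an `accepted` attribute on each `<xep>` element, with `<number>` and `<title>` children), and the second entry in the sample data is made up for illustration; check the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed xeplist.xml layout; the inbox entry is invented.
SAMPLE = """<xep-infos>
  <xep accepted="true"><number>225</number><title>Component Connections</title></xep>
  <xep accepted="false"><number>inbox-example</number><title>Some Inbox Proposal</title></xep>
</xep-infos>"""


def accepted_xeps(xml_text):
    """Return (number, title) pairs for entries marked accepted="true"."""
    root = ET.fromstring(xml_text)
    return [
        (xep.findtext("number"), xep.findtext("title"))
        for xep in root.iter("xep")
        if xep.get("accepted") == "true"  # assumed attribute name and value
    ]


print(accepted_xeps(SAMPLE))
```

If `accepted` turns out to be a child element rather than an attribute, `xep.findtext("accepted")` would be the equivalent check.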

  12. jonas’

    mightyBroccoli, use If-Modified-Since

  13. jonas’

    the webserver supports that

  14. Zash

    Why not ETag/If-None-Match ?

  15. jonas’

    or that

  16. jonas’

    firefox does both and gets 304

  17. Zash

    Both work exactly the same anyways

  18. jonas’

    probably

  19. jonas’

    depends on how the etag is generated

  20. Zash

    Which means you can use them as supercookies

  21. Zash

    I made a thing once that used the timestamp of the last request in If-Modified-Since. It doesn't work then.

  22. Zash

    It has to be exactly what the server sent in whatever header it was.
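
The point about sending back exactly what the server sent can be sketched roughly as follows, using `urllib` from the standard library. The function names and the `validators` dictionary are hypothetical helpers for illustration, not part of any existing tool; the sketch stores the `ETag` and `Last-Modified` response headers verbatim and echoes them in `If-None-Match` and `If-Modified-Since` on the next request.

```python
import urllib.error
import urllib.request


def conditional_headers(validators):
    """Build request headers from validators stored verbatim from a previous response."""
    headers = {}
    if validators.get("etag"):
        headers["If-None-Match"] = validators["etag"]
    if validators.get("last_modified"):
        # Must be the server's own Last-Modified string, not a locally generated timestamp.
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers


def fetch_if_changed(url, validators):
    """Return (body, new_validators), or (None, validators) if the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(validators))
    try:
        with urllib.request.urlopen(request) as response:
            new_validators = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), new_validators
    except urllib.error.HTTPError as error:
        if error.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return None, validators
        raise
```

A caller would start with an empty dictionary, keep the returned validators alongside the cached body, and pass them back in on the next poll.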

  23. mightyBroccoli

    I will try both ideas and use whatever is easiest and working :)

  24. vanitasvitae

    Hi! Who was running planet.jabber.org again?

  25. vanitasvitae

    ah found it 😀

  26. intosi

    Won't need to point you to ralphm then ;)

  27. ralphm hides

  28. vanitasvitae

    😀

  29. mightyBroccoli

    Zash, I am just wondering, the XML you posted is actually invalid. The XEP 225 entry talks about <domain> and <hostname> tags which are unescaped.

  30. Zash

    Who what when?

  31. Zash

    -ENOCTX

  32. mightyBroccoli

    It's like 5 messages up 😂

  33. Zash

    Then it's scrolled out of view and out of mind, flushed away by quitjoins

  34. Zash

    Also what

  35. Zash

    I don't see how what you just said has to do with xeplist.xml, if that's what you were talking about

  36. Link Mauve

    “It does not enable a component to bind multiple hostnames to one stream (as, for example, a client can bind multiple resource identifiers).”, a client can do that?!

  37. Link Mauve

    Is this the reason why @from exists on sent stanzas?

  38. Zash

    > <remark>Modified namespace to incorporate namespace versioning; clarified that the value of the &lt;hostname/&gt; element
    Are you just looking at the Firefox rendering of the XML?

  39. mightyBroccoli

    ok so the xeplist.xml lists all currently known XEPs, inbox and accepted. Line 5958 in it is invalid:
    > Modified namespace to incorporate namespace versioning; clarified that the value of the <hostname/> element can be either <domain> or <domain/resource>.
    I fetched the XML with requests.get and read the line directly from that, no browser involved

  40. Zash

    $ curl https://xmpp.org/extensions/xeplist.xml | grep -o '.......hostname........'
    he &lt;hostname/&gt; el

  41. Zash

    I think you're holding it wrong

  42. Zash

    Or you're printing the text content, after unescaping.

  43. mightyBroccoli

    damn, I found the error. I need to use reply.content.decode to get the correct format :) thank you though :)
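
The escaped-versus-unescaped confusion above can be reproduced without any network access. This is only an illustrative sketch with a shortened `<remark>` line as sample data: the raw document keeps `&lt;` and `&gt;`, while the parsed text content comes back unescaped, so printing it makes the source look invalid even though it is not.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Shortened sample of the kind of line discussed above, as it appears in the raw document.
raw = b"<remark>value of the &lt;hostname/&gt; element</remark>"

element = ET.fromstring(raw)
text = element.text  # the parser unescapes entities in text content

print(raw.decode("utf-8"))  # raw source: still contains &lt; and &gt;
print(text)                 # parsed text: contains literal < and >
print(escape(text))         # re-escaping the text gives the source form back
```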

  44. mightyBroccoli

    The ETag header is way easier than the Last-Modified header. I chose ETag and it works like a charm :) thanks :)