-
jonas’
:D
-
mightyBroccoli
I would like to know if there are limits on how often, or how in general, it is allowed to crawl the extensions website? I built a bot which is able to grab the info header from each XEP. But there are no thresholds on how often one could issue a request. Or maybe there's a neat API I could use :)
-
Zash
robots.txt ?
-
intosi
Why don't you just use a local git clone for that?
-
Zash
or https://xmpp.org/extensions/xeplist.xml
-
jonas’
mightyBroccoli, if there’s anything more you need which isn’t in xeplist.xml, let me know
-
mightyBroccoli
I wasn't aware of the XML :) that's everything I need :) caching will do the rest :) that's even better than my bs4 garbage :D jonas’, it seems that the XML lists even the deferred XEPs, are there some not listed in there?
-
jonas’
mightyBroccoli, the XML lists all the things
-
jonas’
even inbox
-
mightyBroccoli
ahh ok, that's nice. I guess the accepted tag could be used to filter the inbox XEPs out of the results
-
jonas’
exactly
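A minimal sketch of the filtering idea above, in Python. The sample document is hypothetical: it assumes each entry is an `<xep>` element carrying an `accepted="true"` or `accepted="false"` attribute, which the chat only hints at ("the accepted tag"). Check the real xeplist.xml layout before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element and attribute names are assumptions.
SAMPLE = """<xep-infos>
  <xep accepted="true"><number>0045</number><title>Multi-User Chat</title></xep>
  <xep accepted="false"><title>Hypothetical Inbox Proposal</title></xep>
</xep-infos>"""


def accepted_xeps(xml_text):
    """Return the <xep> elements that are marked as accepted."""
    root = ET.fromstring(xml_text)
    return [xep for xep in root.iter("xep") if xep.get("accepted") == "true"]


# Collect the titles of the accepted entries only.
titles = [xep.findtext("title") for xep in accepted_xeps(SAMPLE)]
```

With this sample, only the entry marked as accepted ends up in `titles`; the inbox entry is dropped.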
-
jonas’
mightyBroccoli, use If-Modified-Since
-
jonas’
the webserver supports that
-
Zash
Why not ETag/If-None-Match ?
-
jonas’
or that
-
jonas’
firefox does both and gets 304
-
Zash
Both work exactly the same anyways
-
jonas’
probably
-
jonas’
depends on how the etag is generated
-
Zash
Which means you can use them as supercookies
-
Zash
I made a thing once that used the timestamp of the last request in If-Modified-Since. It doesn't work then.
-
Zash
It has to be exactly what the server sent in whatever header it was.
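The conditional-request approach discussed above could look roughly like this in Python, using only the standard library. The function names and the `validators` dictionary are hypothetical. The key point from the discussion is that the `ETag` and `Last-Modified` values are stored and sent back exactly as the server delivered them, not replaced with a locally generated timestamp.

```python
import urllib.error
import urllib.request

XEPLIST_URL = "https://xmpp.org/extensions/xeplist.xml"


def conditional_headers(validators):
    """Build request headers from validators saved off a previous response."""
    headers = {}
    if validators.get("etag"):
        # Echo the ETag back verbatim, quotes included.
        headers["If-None-Match"] = validators["etag"]
    if validators.get("last_modified"):
        # Echo the server's Last-Modified string, not the local request time.
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers


def fetch_if_changed(url, validators):
    """Return (body, validators); body is None when the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(validators))
    try:
        with urllib.request.urlopen(request) as response:
            fresh = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), fresh
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep using the cached copy
            return None, validators
        raise
```

A caller would start with `fetch_if_changed(XEPLIST_URL, {})`, keep the returned validators next to the cached body, and pass them in again on the next poll.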
-
mightyBroccoli
I will try both ideas and use whatever is easiest and working :)
-
vanitasvitae
Hi! Who was running planet.jabber.org again?
-
vanitasvitae
ah found it 😀
-
intosi
Won't need to point you to ralphm then ;)
- ralphm hides
-
vanitasvitae
😀
-
mightyBroccoli
Zash, I'm just wondering: the XML you posted seems to be invalid. The XEP-0225 entry mentions <domain> and <hostname> tags, which are unescaped.
-
Zash
Who what when?
-
Zash
-ENOCTX
-
mightyBroccoli
It's like 5 messages up 😂
-
Zash
Then it's scrolled out of view and out of mind, flushed away by quitjoins
-
Zash
Also what
-
Zash
I don't see how what you just said has to do with xeplist.xml, if that's what you were talking about
-
Link Mauve
“It does not enable a component to bind multiple hostnames to one stream (as, for example, a client can bind multiple resource identifiers).”, a client can do that?!
-
Link Mauve
Is this the reason why @from exists on sent stanzas?
-
Zash
> <remark>Modified namespace to incorporate namespace versioning; clarified that the value of the &lt;hostname/&gt; element
Are you just looking at the Firefox rendering of the XML?
-
mightyBroccoli
ok so the xeplist.xml lists all currently known XEPs, inbox and accepted. Line 5958 in it is invalid:
> Modified namespace to incorporate namespace versioning; clarified that the value of the <hostname/> element can be either <domain> or <domain/resource>.
I fetched the XML with requests.get and read the line directly from that, no browser involved
-
Zash
$ curl https://xmpp.org/extensions/xeplist.xml | grep -o '.......hostname........'
he &lt;hostname/&gt; el
-
Zash
I think you're holding it wrong
-
Zash
Or you're printing the text content, after unescaping.
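A small Python illustration of the point above: the raw document stores the angle brackets as entities, and an XML parser turns them back into literal `<` and `>` when you ask for the text content. The `raw` string is a shortened, hypothetical stand-in for the remark in question.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they would appear on the wire: the brackets are escaped.
raw = "<remark>value of the &lt;hostname/&gt; element</remark>"

remark = ET.fromstring(raw)

# The parser unescapes entities, so the text content shows literal brackets.
text = remark.text

# No child element was created; <hostname/> is plain text, not markup.
child_count = len(remark)
```

Printing `text` shows `<hostname/>` with literal brackets even though the source is well-formed, which is easy to mistake for unescaped markup.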
-
mightyBroccoli
damn, I found the error. I need to call reply.content.decode() to get the correct format :) thank you though :)
-
mightyBroccoli
The ETag header is way easier than the Last-Modified header. I chose ETag and it works like a charm :) thanks :)