-
SamWhited
This is kind of nifty if true: https://twitter.com/Midar3/status/839059229289943041
-
SamWhited
(TL;DR — libstrophe is listed in the Nintendo Switch's open source license credits)
-
jonasw
libstrophe. nice :)
-
Ge0rG
and somebody didn't bother enough to rotate that picture.
-
Tobias
heh
-
jonasw
Ge0rG: you can rotate your screen, can’t you :P
-
Ge0rG
jonasw: I tried, but it turned out to be attached to the laptop body.
-
jonasw
xrandr --output $OUTPUT --rotate left
-
Ge0rG
alternative version: I did rotate it, but then the desktop manager autorotated it back.
-
SamWhited
Tangentially related: I didn't realize Jack Moffitt worked for Mozilla or was in charge of Servo these days; that's fantastic. I wish he'd revamp libstrophe in Rust.
-
SamWhited
or maybe I did realize that, since I apparently follow him on a bunch of Rust stuff, but didn't realize he was the same person who wrote libstrophe.
-
Tobias
SamWhited, similar thing with some of Cisco's original jabber devs :)
-
Zash
I wouldn't really expect Mozilla to do anything with XMPP.
-
Tobias
they probably went there for non-XMPP related things
-
SamWhited
yah, I once got asked at a conference by someone on the Hello team (or whatever that short-lived Firefox messenger was) what the point of XMPP or using standards was; I dunno about the rest of Mozilla, but I more or less gave up on the Firefox team then and there.
-
Tobias
Zash, although they should do more with it
-
jonasw
SamWhited: wtf
-
SamWhited
I think his exact words were "why would anyone bother using standards?"
-
jonasw
wtf
-
jonasw
have they even SEEN internet explorer?
-
SamWhited
Granted, I doubt he's representative of the rest of the Firefox people given their involvement with all things web-standards related; maybe that was just the Hello team.
-
Tobias
jonasw, come on...with WebAssembly you can finally render your IE6 pages the way they are supposed to look on every platform :)
-
jonasw
Tobias: you make me sad.
-
Zash
I wonder if they still remember that "The Internet" does not equal "The Web"
-
Zash
SamWhited: https://www.mozilla.org/en-US/about/manifesto/#principle-06
-
SamWhited
Zash: Hah, I should have just pointed him at that; thanks.
-
jonasw
gahaha
-
jonasw
slap ’em in the face with that manifesto
-
Tobias
Zash, nah...just use Hello
-
jonasw
those are the same people who didn’t fight (enough) against WebDRM
-
SamWhited
They fought a lot, they just didn't win.
-
Tobias
Zash, or allo https://twitter.com/burnflare/status/838966485011685376 :)
-
SamWhited
I don't think it's fair to say they didn't fight enough; they were against it all the way through.
-
jonasw
SamWhited: okay
-
jonasw
I admit I haven’t followed it in detail, but from what I saw in the news coverage it didn’t seem too great.
-
arc
wow, i am just now realizing how much work there is to be done
-
arc
anyone here touched xml regex?
-
Tobias
XML regex?
-
arc
yes
-
Zash
arc: Wait for it
-
jonasw
what the heck is XML regex
- Zash prepares for the obligatory Zalgo reference
-
SamWhited
I'm about to have to flip my table, aren't I?
-
arc
http://www.xmlschemareference.com/regularExpression.html
-
jonasw
https://stackoverflow.com/a/1732454/1248008
-
Tobias
SamWhited, oh you have some of those new flipping desks
-
jonasw
SamWhited: relevant for tableflip: https://www.youtube.com/watch?v=eob7V_WtAVg
-
arc
its a method to constrain acceptable string values within xml
-
SamWhited
Tobias: Nah, they went for the sit-stand ones, but wouldn't spring for the flipping ones
-
jonasw
arc: seems like a regex variant used by XML?
-
arc
im still reading into it, but yes. thankfully its a simplified variant
-
arc
EXI uses it for strings.
-
Zash
Why and where?
-
arc
all strings.
-
arc
CH notably
-
arc
without it, an unconstrained string in the schema is transmitted as an unsigned int per char representing the unicode codepoint. no UTF-8
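-
(editor's note)
A minimal sketch of what "an unsigned int per char" could look like on the wire, assuming the variable-length unsigned integer layout as I read it in the EXI spec (7-bit groups, least significant first, high bit set when another octet follows). The function names are made up for illustration, and the sketch ignores the EXI string table entirely.

```python
def exi_unsigned_int(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first.

    The high bit of each octet is set when another octet follows.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)  # last octet
            return bytes(out)


def encode_string(text: str) -> bytes:
    """Length prefix (in characters), then one unsigned int per codepoint."""
    out = bytearray(exi_unsigned_int(len(text)))
    for ch in text:
        out += exi_unsigned_int(ord(ch))
    return bytes(out)
```

Under these assumptions a codepoint below 128 takes one octet, so plain ASCII text costs the same as in UTF-8 plus the length prefix, while larger codepoints take two or three octets.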
-
Zash
Shouldn't most strings fit in either enums or user-provided strings with no restrictions?
-
arc
oh yes. but you'd be insane to do so
-
Zash
What
-
Zash
No UTF-8?
-
arc
no UTF-8
-
Zash
KILL IT WITH FIRE
-
arc
not as far as ive found. admittedly ive just started
-
arc
that was my gut reaction too. however the more i read into this, the more i understand why.
-
jonasw
why would one want to do that?
-
SamWhited
> representing the unicode codepoint
-
SamWhited
Yup, now my desk and things are all over the floor. Saw it coming.
-
Zash
So, UTF-32
-
jonasw
SamWhited: did you do it with the excellent stare of Alan Rickman?
-
arc
... yes. but as i said, you'd be insane to not use the regex to restrict the character map
-
SamWhited
jonasw: No, I am nowhere near that fantastic; that was amazing.
-
jonasw
arc: so assuming this is used for standard desktop clients, I either have to restrict what codepoints users can use or the text is blown up to two to four times the bytes needed with UTF-8?
-
arc
unfortunately the EXI spec doesn't go into deep detail on this, it refers to other documentation on xml regex, but it appears with bitpacked encoding you can compress it down a lot better than UTF-8
-
arc
jonasw: nope. you can craft a method to support the entire breadth of unicode in a much tighter format than UTF-8 because you're no longer constrained to byte boundaries.
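-
(editor's note)
A rough sketch of the "not constrained to byte boundaries" idea, assuming a restricted character set of N characters where each character is an n-bit index with n = ceil(log2(N + 1)), and the spare value N acts as an escape for anything outside the set. `pack_restricted` is a hypothetical helper, and the fixed 21-bit codepoint after the escape is a simplification for readability, not what EXI actually emits.

```python
def pack_restricted(text: str, charset: str) -> str:
    """Pack text as a bit string using fixed-width indexes into charset.

    Characters outside charset are written as the escape index len(charset)
    followed by the raw codepoint in 21 bits (a simplifying assumption).
    """
    n = len(charset)
    width = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    index = {ch: i for i, ch in enumerate(charset)}
    bits = []
    for ch in text:
        if ch in index:
            bits.append(format(index[ch], f"0{width}b"))
        else:
            bits.append(format(n, f"0{width}b") + format(ord(ch), "021b"))
    return "".join(bits)


lowercase = "abcdefghijklmnopqrstuvwxyz "
packed = pack_restricted("hello world", lowercase)
print(len(packed), "bits versus", 8 * len("hello world".encode("utf-8")))
```

With a 27-character set each character fits in 5 bits, so text drawn from that set is smaller than UTF-8, while the escape keeps the rest of Unicode reachable at a higher cost per character.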
-
jonasw
mhm
-
Zash
Like, huffman code
-
arc
i wouldn't go that far with it.
-
arc
im trying to track down whether a codepoint can represent multi-character sequences now.
-
arc
i would not be surprised.
-
arc
unlike using DEFLATE tho, this would not be dynamic, but encoded as part of the schema.
-
SamWhited
Define "multi-character sequences?"
-
arc
i mean, you could allocate the values 128-255 to represent the 128 most common words in the english language
-
arc
i do not know if this is true yet or not.
-
Zash
Well, Hangul?
-
SamWhited
You could probably do that, you won't be able to do that for all languages though
-
arc
ive only been reading into this for the past 2 hours.
-
SamWhited
Not without canonicalizing inputs first
-
jonasw
SamWhited: but as far as I understand it, your client could choose a schema specialised for the locale you’re using
-
arc
SamWhited: since the client dictates the schema, the client could adjust this per selected language for the user
-
jonasw
wee, I understood EXI \o/
-
arc
also, there's nothing stopping you from using 9 bits
-
arc
im just commenting that we have a shortcoming in the XEP schemas. strings can and should be validated
-
arc
also this could be extremely useful for client-side data forms validation
-
SamWhited
But let's say one of those words is "café"; is that caf + Unicode character LATIN SMALL LETTER E WITH ACUTE (U+00E9) or cafe + Unicode character COMBINING ACUTE ACCENT (U+0301)?
-
Zash
Are there really strings that are not logically enums, while being user controlled?
-
arc
before 8:30am this morning i wasnt aware that XML regex even existed, so my understanding is still very crude, and cruder still on what subset of this applies to EXI
-
SamWhited
Unicode provides ways to do canonicalization of things like that, you'd just have to make sure you were doing it before building the string table and to any words you compare against the string table
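-
(editor's note)
A small sketch of the "café" problem using Python's standard `unicodedata` module. The two spellings compare unequal as raw strings but become identical after normalization. Picking NFC here is an assumption for the example; NFD would work as well, as long as both sides of a comparison use the same form. `canonical` is just an illustrative helper name.

```python
import unicodedata

precomposed = "caf\u00e9"   # caf + LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # cafe + COMBINING ACUTE ACCENT


def canonical(text: str) -> str:
    """Normalize to NFC so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


print(precomposed == decomposed)
print(canonical(precomposed) == canonical(decomposed))
```

Any table of known words would have to be built from normalized strings, and every lookup key normalized the same way, for matches to be reliable.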
-
arc
SamWhited: *IF* this supports multi-character sequences, and not simply constraining which unicode codepoints are acceptable in regex format, then its whatever is defined in the schema. but this is entirely separate from the string table provided by EXI.
-
Zash
That way lies madness
- Zash points in the general direction of Unicode
-
SamWhited
Oh, I thought this had something to do with the string table. Either way, if you're searching for things, you'll need to do canonicalization too if you want to actually find things (since the same thing may be encoded differently on different machines, but be the same as far as the user is concerned)
-
arc
SamWhited: yes, on the machine side this is evaluated to a unicode codepoint per character in any case.
-
arc
goodbye char*
-
arc
goodbye stdlib
-
arc
goodnight #import "string.h"
-
jonasw
hey, where will I get my memset from, arc? ;)
-
arc
jonasw: heh
-
Zash
jonasw: dd if=/dev/mem of=/dev/mem start=x count=y
-
Zash
or was it seek=
-
jonasw
*blink*
-
jonasw
I’m done for today.
-
arc
LOL
-
jonasw
Zash: also, that’s memcpy, not memset.
-
arc
Ok I'm now officially over G+
-
arc
having the alert bubble show a new message while on a google search, and see people arguing implied consent for sexual penetration by the TSA by their choice to fly... break time.
-
jonasw
wat
-
arc
https://plus.google.com/+JohnWarthog9Hawley/posts/LEvErfQnajc
-
Zash
off topic much?
-
arc
that's the problem with G+
-
arc
it shows up in google searches for unrelated things.
-
Tobias
i'm sure that's not standard behavior
-
arc
anyway. yes im starting to suspect that the way this works, multi-character sequences can be implied, but it might be even more devious. more like a smartphone dictionary predictor
-
arc
if you have both the letter "c" and a number of whole words starting with "c" grouped properly, you could resolve whole words in a minimum number of bits. and that can be optimized by the client in the chosen schema
-
jonasw
arc: so basically the string is encoded by the states of a regex automaton which gets the string fed as input?
-
arc
I think so.
-
arc
actually i should go back to what i did in the early days with this work, grab the reference implementation and try some things on it, then read the bits
-
jonasw
clever and devious at the same time
-
arc
you might even be able to, if you are very clever, recreate UTF-8 using an XML regex.
-
arc
that's not even work, that'd be pure joy for some weekend.
-
Zash
Wat
-
arc
well remember that the top bit of the first UTF-8 byte determines whether its a 1-byte or multi-byte sequence. and if the first byte has bit 128 set, its leading bits give the sequence length and every following byte starts with the bits 10 to mark a continuation, etc
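-
(editor's note)
The bit layout being described can be sketched as a small classifier over the leading bits of each byte. `utf8_byte_kind` is a hypothetical helper for illustration; it only looks at the prefix bits and does not check for overlong forms or out-of-range codepoints.

```python
def utf8_byte_kind(byte: int) -> str:
    """Classify one byte of a UTF-8 stream by its leading bits."""
    if byte & 0x80 == 0x00:
        return "single"        # 0xxxxxxx: ASCII, one-byte sequence
    if byte & 0xC0 == 0x80:
        return "continuation"  # 10xxxxxx: follows a lead byte
    if byte & 0xE0 == 0xC0:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if byte & 0xF0 == 0xE0:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if byte & 0xF8 == 0xF0:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"


for ch in ("A", "\u00e9", "\u20ac"):
    print(ch, [utf8_byte_kind(b) for b in ch.encode("utf-8")])
```

Because every byte announces its role in its top bits, the structure lends itself to being written as nested alternatives, which is what makes the idea of mirroring it in a schema plausible.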
-
arc
if you are very very clever, and if this works the way im starting to understand, then you could build a regex that recreates UTF-8 precisely such that the string value encoded by EXI would be precisely UTF-8
-
arc
such that if you encoded EXI byte-aligned, and you read the raw stream, you would find the UTF-8 encoded strings within
-
arc
it might not be possible but im fairly certain it is, because the bits in UTF-8 are always meaningful, you would just have to nest your atoms appropriately.
-
arc
but UTF-8 encoding is a hack for ascii backwards compatibility, i believe in almost every case you could craft a better one. which is kind of cool if you think about it, even with a limited dictionary, Zipf's Law will ensure extremely tight compression, and without the encryption concerns
-
arc
https://www.youtube.com/watch?v=fCn8zs912OE
-
Zash
Out of all Unicode related things, UTF-8 is the last thing I'd complain about
-
arc
oh im not complaining about UTF-8. i love UTF-8. but I can see now why UTF-32 was acceptable.
-
Zash
Why not UTF-64? Surely it'll be more efficient on modern machines ;)
-
arc
heh
-
arc
i think actually most uses of this would be about as fast as UTF-8 decoding
-
arc
a very simple regex could be something like """(\p{IsBasicLatin}|he|se|re|hat|.)*"""
-
arc
EXI usually follows schema semantics literally, so i would assume 3 bits would be used to determine whether its a chr(0:127), one of the four provided common word segments, or a full unicode character
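-
(editor's note)
A toy sketch of the scheme as arc describes it, not of how EXI actually encodes pattern-restricted strings. Assumptions: a 3-bit tag per token, tag 0 followed by a 7-bit Basic Latin character, tags 1 to 4 for the segments he, se, re and hat, and tag 5 followed by a 21-bit codepoint for everything else. Segments are preferred over single characters when both match, which is a choice made for this example. All names are hypothetical.

```python
SEGMENTS = ["he", "se", "re", "hat"]  # assigned tags 1..4 in this order
TAG_ASCII = 0                         # followed by a 7-bit character
TAG_ANY = 5                           # followed by a 21-bit codepoint


def encode(text: str) -> str:
    """Return the encoding as space-separated bit groups for readability."""
    groups = []
    i = 0
    while i < len(text):
        for tag, segment in enumerate(SEGMENTS, start=1):
            if text.startswith(segment, i):
                groups.append(format(tag, "03b"))
                i += len(segment)
                break
        else:
            # No segment matched here: fall back to a single character.
            codepoint = ord(text[i])
            if codepoint < 128:
                groups.append(format(TAG_ASCII, "03b"))
                groups.append(format(codepoint, "07b"))
            else:
                groups.append(format(TAG_ANY, "03b"))
                groups.append(format(codepoint, "021b"))
            i += 1
    return " ".join(groups)


print(encode("The"))  # 000 1010100 001
```

Under these assumptions "The" costs 13 bits instead of the 24 bits UTF-8 would use, at the price of both sides having to agree on the segment list in advance.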
-
arc
"The" would then be encoded as "000 1010100 001"