-
MattJ
Sigh
-
MattJ
I think for XEP-0335 I received conflicting feedback :)
-
MattJ
One request was to drop all mention of character encoding and anything below the "stream of XML characters" layer, and the other was a request to add details of characters that need to be XML-escaped
-
flow
the former seems sensible
-
flow
converting an arbitrary unicode string into its xml representation is not something you should cover in an arbitrary xep
-
flow
it's something the xmpp-core rfc covers by saying "we put XML encoded in UTF-8" on the wire
-
Zash
What about arbitrary Unicode? JSON is weird and requires that to be escaped to \uXXXX IIRC
-
flow
it would be different if we talk about codepoints that are not allowed in xml 1.0
-
MattJ
and XML forbids some characters (even escaped) that JSON allows
-
MattJ
so I don't think it's as simple as "don't mention it"
-
Zash
And probably in UTF-16 surrogate pair mode
-
flow
Zash, for those the escaping happens on the json layer, before the xmpp library sees it and transforms the unicode string to xml, no?
-
Zash
Sure, yeah
-
flow
MattJ, are those characters kept in their natural representation in json, or in the escaped representation?
-
Zash
MattJ: hm?
-
flow
if the latter, then we are fine, if the former, then you have to either define your custom escaping scheme, or convert the json string to base64
-
Zash
Bring out the Venn diagrams!
-
MattJ
https://mail.jabber.org/pipermail/standards/2019-February/035796.html
-
flow
Zash, how does the chosen unicode encoding (UTF-8, UTF-16) matter?
-
Zash
Does it say somewhere that an encoded JSON thing is ASCII?
-
flow
isn't json, just like xml, at first only a sequence of unicode code points?
-
Zash
Aren't JS (and hence JSON) strings by definition UTF-16?
-
flow
but then they are also a sequence of unicode codepoints
-
Zash
But the encoded JSON is either clean ASCII or ???
-
flow
I guess the encoded JSON is whatever encoding you agreed on to exchange the sequence of codepoints
-
Zash
> JSON syntax describes a sequence of Unicode code points. http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
-
flow
it appears to me that json strings must contain e.g. control characters in their escaped form only
-
flow
All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
-
flow
RFC 7159 § 7
-
flow
so if you feed a json string to an xml library, the xml library should never see e.g. U+0010, but only the sequence of codepoints that composes the escape sequence of U+0010
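
A minimal sketch of that point, using Python's standard `json` module. The helper name `is_xml10_char` is made up for illustration; it mirrors the `Char` production of XML 1.0.

```python
import json

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

raw = "\u0010"             # a control character that XML 1.0 cannot carry
encoded = json.dumps(raw)  # JSON text: the six-character escape, in quotes

print(encoded)                                      # "\u0010"
print(is_xml10_char(ord(raw)))                      # False
print(all(is_xml10_char(ord(c)) for c in encoded))  # True
```

The XML layer only ever sees a backslash, a `u` and four hex digits, all of which are ordinary XML characters.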
-
Zash
Seems so
-
Zash
But waqas refers to https://tools.ietf.org/html/rfc7159
-
Zash
Why have one definition of JSON when there can be three?
-
MattJ
The XEP refers to https://tools.ietf.org/html/rfc4627 which was obsoleted by 7159, I guess I'll update that
-
Zash
Or .. twentyeleven.. probably about as many as there are JSON implementations
-
Zash
https://tools.ietf.org/html/rfc7159#section-7 > unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
-
Zash
Huh, where did I see the thing about surrogate pairs then?
-
MattJ
If someone wants to tell me what characters are unrepresentable in XML, please add to the list thread I just revived and I can include it in the next revision
-
Ge0rG
there are characters that can't be represented in XML?
-
MattJ
According to Waqas. I don't see them at a glance
-
MattJ
JSON characters, that is
-
MattJ
Some control characters are unrepresentable in XML 1.0
-
MattJ
It looks to me like they are also forbidden in JSON
-
Link Mauve
Seems like they have a Wikipedia article: https://en.wikipedia.org/wiki/Valid_characters_in_XML
-
flow
MattJ, forbidden as in, they have to be escaped in JSON, right?
-
MattJ
Yes
-
MattJ
or
- MattJ looks at the RFC again
-
flow
I guess you have to check if any of those characters
-
flow
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
-
flow
is invalid/forbidden in XML 1.0
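
One way to run that check is by brute force over all code points. This is only a sketch: the function names are hypothetical, the JSON side is the `unescaped` rule quoted above, and the XML side is the `Char` production of XML 1.0.

```python
def is_json_unescaped(cp: int) -> bool:
    """RFC 7159 section 7: unescaped = %x20-21 / %x23-5B / %x5D-10FFFF."""
    return 0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x5B or 0x5D <= cp <= 0x10FFFF

def is_xml10_char(cp: int) -> bool:
    """XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def collapse(cps):
    """Collapse a sorted list of code points into (low, high) ranges."""
    ranges = []
    for cp in cps:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp
        else:
            ranges.append([cp, cp])
    return [(lo, hi) for lo, hi in ranges]

# Code points JSON allows unescaped in a string but XML 1.0 does not allow.
problem = [cp for cp in range(0x110000)
           if is_json_unescaped(cp) and not is_xml10_char(cp)]

for lo, hi in collapse(problem):
    print(f"U+{lo:04X}..U+{hi:04X}")
# U+D800..U+DFFF
# U+FFFE..U+FFFF
```

The first range is the surrogate block, which cannot occur in well-formed UTF-8 anyway, so in practice the leftovers are U+FFFE and U+FFFF.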
-
MattJ
Yes, I don't see anything forbidding escaped things in JSON
-
flow
Any character may be escaped.
-
flow
says RFC 7159 (the current JSON RFC it appears)
-
flow
so a naive but safe json to xml converter simply escapes every character ;)
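
A rough sketch of such an "escape everything" encoder for a single string value. `json_escape_all` is a hypothetical helper, not something from the XEP or from any library, and it assumes the input contains no lone surrogates.

```python
import json

def json_escape_all(s: str) -> str:
    """Hypothetical helper: a JSON string literal with every character \\u-escaped."""
    # Work on UTF-16 code units so characters above U+FFFF become surrogate pairs,
    # which is how JSON's \uXXXX escapes represent them.
    units = s.encode("utf-16-be")
    body = "".join(
        "\\u{:04x}".format(int.from_bytes(units[i:i + 2], "big"))
        for i in range(0, len(units), 2)
    )
    return '"' + body + '"'

text = "a\u0010\U0001F600"
literal = json_escape_all(text)

print(literal)                      # "\u0061\u0010\ud83d\ude00"
print(json.loads(literal) == text)  # True
```

The output only contains quotation marks, backslashes, `u` and hex digits, so it needs no further XML escaping. For whole documents, Python's `json.dumps` with its default `ensure_ascii=True` is a less extreme variant: it escapes control characters and everything outside ASCII.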
-
flow
this erratum, although rejected, may be of relevance: https://www.rfc-editor.org/errata/eid3984
-
jonas’
MattJ, note that XML forbids control characters, even in escaped form
-
jonas’
not that it matters for the JSON use case
-
jonas’
(here, "escaped form" means hex-entity-encoding)
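
This can be tried against an XML 1.0 parser. The sketch below uses Python's `xml.etree.ElementTree` (expat underneath); `parses` is a made-up helper.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if doc is accepted as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(parses("<a>&#x9;</a>"))    # True:  tab is a legal XML 1.0 character
print(parses("<a>\x10</a>"))     # False: raw U+0010 is not allowed
print(parses("<a>&#x10;</a>"))   # False: nor is a character reference to it
print(parses("<a>\\u0010</a>"))  # True:  a JSON escape is just six ordinary characters
```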
-
MattJ
jonas’: I know, that's why I chose "unrepresentable"
-
stpeter
FYI, we just received our GSoC payment. Thanks to everyone who contributed to this year's summer of code!