Opened 9 years ago
Closed 8 years ago
#436 closed defect (fixed)
Platform dependent encoding instead of UTF-8 in DataHelper
Reported by: | John Doo | Owned by: | zzz |
---|---|---|---|
Priority: | minor | Milestone: | 0.8.6 |
Component: | api/data | Version: | 0.8.4 |
Keywords: | | Cc: | |
Parent Tickets: | | Sensitive: | no |
Description
net.i2p.data.DataHelper contains the following lines:
private final static byte SEMICOLON_BYTES[] = ";".getBytes(); // in UTF-8
private final static byte EQUAL_BYTES[] = "=".getBytes(); // in UTF-8
As the comments already say, the byte arrays are supposed to contain the UTF-8 encoding of ';' and '=', but they actually contain a platform-dependent encoding, because getBytes() without an argument uses the platform's default charset. I suggest replacing them with:
private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final byte[] EQUAL_BYTES = "=".getBytes(UTF_8); // in UTF-8
private static final byte[] SEMICOLON_BYTES = ";".getBytes(UTF_8); // in UTF-8
[Using a (final static) Charset has the advantage that it's not necessary to wrap getBytes in a try-catch block, as it would be with getBytes("UTF-8").]
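To illustrate the difference, here is a minimal sketch (not taken from the I2P source) that encodes the same one-character string with an explicit charset and with the platform default. The class name ExplicitCharsetDemo and the helper encode are made up for this example; only Charset.forName, Charset.defaultCharset and String.getBytes are standard Java calls.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of I2P.
public class ExplicitCharsetDemo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Encode a string with an explicitly named charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // Explicit charset: the result is the same on every platform.
        System.out.println("UTF-8:  " + Arrays.toString(encode("=", UTF_8)));
        // A charset that is not ASCII-compatible gives different bytes.
        System.out.println("UTF-16: "
                + Arrays.toString(encode("=", Charset.forName("UTF-16"))));
        // No argument: the result depends on the platform default charset.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + Arrays.toString("=".getBytes()));
    }
}
```

On most systems the first and third lines will match, which is why the problem tends to go unnoticed.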
Attachments (1)
Change History (7)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
Among the charsets are IBM-Thai, IBM01140-IBM01140, UTF-16 and its variations, and several more. As I understand it, UTF-8 incorporates ASCII so the first 128 characters are the same. But apparently that's not true for all available character sets.
I have attached a small program that prints the charsets in which the byte values of ';' and '=' differ from those in UTF-8.
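For readers without the attachment, a check of this kind could look roughly like the sketch below. This is not the attached CharsetTest.java; the class name CharsetDiffSketch and its methods are hypothetical, and the list it prints depends on which charsets the local JVM ships.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the attached CharsetTest.java.
public class CharsetDiffSketch {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // True if cs can encode and produces bytes for s that differ from UTF-8.
    static boolean differsFromUtf8(Charset cs, String s) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be compared
        }
        return !Arrays.equals(s.getBytes(cs), s.getBytes(UTF_8));
    }

    // Names of all installed charsets whose encoding of s differs from UTF-8.
    static List<String> charsetsThatDiffer(String s) {
        List<String> names = new ArrayList<String>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (differsFromUtf8(cs, s)) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("';' differs in: " + charsetsThatDiffer(";"));
        System.out.println("'=' differs in: " + charsetsThatDiffer("="));
    }
}
```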
Changed 9 years ago by
Attachment: | CharsetTest.java added |
---|
comment:3 Changed 9 years ago by
OK, you got me there.
I suspect there are a TON of places in the code where we assume that the local charset encoding agrees with 7-bit ASCII (as ISO-8859-1 does) for the characters 0x20 - 0x7e, and probably for \r and \n as well. I expect anybody running I2P on a charset where that is not the case would have it fail in strange and non-obvious ways. The good news is that the encoding is part of the version information listed on /logs.jsp, so if anybody reports something unusual we can check.
So it's something to look for elsewhere in the code also.
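As a rough aid for such an audit, the assumption described above can be tested directly. The sketch below is only an illustration, not I2P code; AsciiCompatCheck and its methods are hypothetical names. It checks whether a charset encodes 0x20 - 0x7e, \r and \n as the same single byte that ASCII uses.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of I2P.
public class AsciiCompatCheck {

    // True if cs encodes c as exactly one byte equal to its ASCII code.
    static boolean encodesAsAscii(Charset cs, char c) {
        byte[] b = String.valueOf(c).getBytes(cs);
        return b.length == 1 && b[0] == (byte) c;
    }

    // True if cs agrees with ASCII for 0x20 - 0x7e, '\r' and '\n'.
    static boolean isAsciiCompatible(Charset cs) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be used for output
        }
        for (char c = 0x20; c <= 0x7e; c++) {
            if (!encodesAsAscii(cs, c)) {
                return false;
            }
        }
        return encodesAsAscii(cs, '\r') && encodesAsAscii(cs, '\n');
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("Default charset " + def + " ASCII-compatible: "
                + isAsciiCompatible(def));
    }
}
```

A result of false for the default charset would suggest that code calling getBytes() or new String(bytes) without a charset argument may misbehave on that platform.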
comment:5 Changed 9 years ago by
Milestone: | 0.8.5 → 0.8.6 |
---|---|
Owner: | set to zzz |
Priority: | major → minor |
Status: | new → accepted |
I've fixed DataHelper in my naming branch, to be propped over for 0.8.6, but the general issue of non-ASCII-friendly encodings will require a big audit/fix. Entering new ticket #457 for that.
comment:6 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
This was fixed in 0.8.6, but the general issue, now tracked in ticket #457, is not yet addressed.
But '=' and ',' are part of the base ASCII charset, which is the same in any character encoding, right? Can you name a locale where they are encoded differently?