Opened 5 years ago

Last modified 5 years ago

#1415 new defect

I2PSnark filename conversion to builtin charset in windows may cause data loss

Reported by: DjJeshk
Owned by: zzz
Priority: major
Milestone: 0.9.20
Component: apps/i2psnark
Version: 0.9.17
Keywords: filenames corruption i2psnark
Cc:
Parent Tickets:
Sensitive: no

Description

I2P version: 0.9.17-0
Java version: Oracle Corporation 1.7.0_71 (Java™ SE Runtime Environment 1.7.0_71-b14)
Wrapper version: 3.5.25
Server version: 8.1.16.v20140903
Servlet version: Jasper JSP 2.1 Engine
Platform: Windows XP x86 5.1
Processor: Core 2 (45nm) (core2)
Jbigi: Locally optimized native BigInteger library loaded from file
Encoding: Cp1257
Charset: windows-1257

If a torrent contains a filename with a character that does not exist in the system character set, that character is converted to _. For example, Sakın Bana Söyleme is converted to Sak_n Bana Söyleme.
If a torrent contains two different filenames that convert to the same filename, this may cause unexpected behaviour, including sending corrupted pieces to others or never-ending torrents.
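
The collision can be reproduced with a short sketch. It assumes one `_` per unencodable code point, which matches the example in this ticket but is not confirmed as I2PSnark's exact rule. `CollisionFinder`, `sanitize` and `findCollisions` are hypothetical names. ISO-8859-1 can stand in for windows-1257 when trying it, because every Java runtime provides it and it also lacks U+0131 (dotless i).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not I2PSnark code.
final class CollisionFinder {

    // Replace every code point the charset cannot encode with '_'
    // (assumed rule, taken from the example in this ticket).
    static String sanitize(String name, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); ) {
            int len = Character.charCount(name.codePointAt(i));
            String s = name.substring(i, i + len);
            sb.append(enc.canEncode(s) ? s : "_");
            i += len;
        }
        return sb.toString();
    }

    // Group original names by their mapped name and keep only the
    // groups where two or more originals end up with the same name.
    static Map<String, List<String>> findCollisions(List<String> names, Charset cs) {
        Map<String, List<String>> byMapped = new LinkedHashMap<>();
        for (String n : names) {
            byMapped.computeIfAbsent(sanitize(n, cs), k -> new ArrayList<>()).add(n);
        }
        byMapped.values().removeIf(v -> v.size() < 2);
        return byMapped;
    }
}
```

Passing the two names from this ticket with `StandardCharsets.ISO_8859_1` yields a single collision group keyed by the `Sak_n ...` name.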

Reason: fixed-width, 1-byte characters are in use, which limits the usable characters to 256 - 32 = 224 (the first 32 characters are not allowed in filenames).

Solution: use built-in 2-byte characters (wide chars, wchars, Unicode characters) and leave 1-byte character sets to the 1980s and 1990s, when Unicode was not yet implemented.

Change History (5)

comment:1 Changed 5 years ago by DjJeshk

Priority: critical → major

comment:2 Changed 5 years ago by zzz

Milestone: 0.9.17 → 0.9.18
Version: 0.9.16 → 0.9.17

Yes, duplicate names after character remapping are possible.

related: #571 #771 #1132

Has this happened to you on an actual torrent or is this a theoretical problem? If the former, please link to the torrent.

We can only create files using the character set that the platform supports. To fix, we will have to convert post-mapping duplicates to a unique name deterministically. It's not so simple as "using 2 byte characters".
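
One way the deterministic renaming could look is sketched below. This is an assumption for illustration, not the fix that was implemented: names are processed in the order the torrent lists them, and a numeric suffix is added before the extension when a mapped name is already taken. `UniqueNameMapper`, `mapUnique` and `withSuffix` are hypothetical names, and the `_` replacement rule is taken from the example in this ticket.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not I2PSnark code.
final class UniqueNameMapper {

    // Replace every code point the charset cannot encode with '_'.
    static String sanitize(String name, CharsetEncoder enc) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); ) {
            int len = Character.charCount(name.codePointAt(i));
            String s = name.substring(i, i + len);
            sb.append(enc.canEncode(s) ? s : "_");
            i += len;
        }
        return sb.toString();
    }

    // Insert "_n" before the extension, or at the end if there is none.
    static String withSuffix(String name, int n) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) return name + "_" + n;
        return name.substring(0, dot) + "_" + n + name.substring(dot);
    }

    // Map names in list order; the same input list always produces the
    // same output, so every run resolves duplicates identically.
    static List<String> mapUnique(List<String> names, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        Set<String> used = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String base = sanitize(name, enc);
            String candidate = base;
            // Set.add returns false while the candidate is already taken.
            for (int n = 1; !used.add(candidate); n++) {
                candidate = withSuffix(base, n);
            }
            out.add(candidate);
        }
        return out;
    }
}
```

The sketch compares names case-sensitively. On Windows, where filenames are normally case-insensitive, the `used` set would need a case-folded key, which is left out here.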

comment:3 Changed 5 years ago by DjJeshk

Affected torrents:
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%3bRt%10%8f%b0R%c1%9c%cfr%12Y%80%07Q%dc%9c%e3%dc
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%0a%7b%87%a6L%e1l%5cG%2a%aa%27%cc%d9S%1aLy%867
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%2fd%7eD%ab%0b%3b%80%8aM%d5%fbr%f5S%bd%be%cf%c9%5b
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%b2c%dd6%1e%0a%fd%80%ac%0d%04%1c%b9%ba%d3%80%aaS%8e%bf
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%dd%b9%9d%ba%f1%d1%aa%07%e6%5d%00%2e5%3aV%17k%d7%db%ac
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%9a%b3%40%05U%3b%93%f0%d7%afM%07Yk%caw%96%27%60%22
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%d6%c9%2e%0a%2c%d3%eb%ce%03%2c%08%1f%c4%8f%eb%1a%94%f6J%a4
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%14%c1%df%f6%8d7%5b%5cY%1e%f5%8a%3f%9e%1c%7d%f7B%f9k
http://tracker2.postman.i2p/details.php?dllist=1&filelist=1&info_hash=%15q%c2%c6%b0f%fa8%90%ff%cc%0c%b4%f9%7e%12T%e4%98%14

These torrents were created and run with Vuze (in 0.9.16, i2psnark had several problems that prevented creating torrents). I2P was updated to 0.9.17 and i2psnark torrent creation started to work. I migrated the torrents to i2psnark and some of them appeared incomplete. In the file list I saw a missing file named Sak_n Bana Söyleme and a redundant file named Sakın Bana Söyleme.

comment:4 Changed 5 years ago by zzz

Thanks.

This will be tricky to implement. Obviously we need to keep files in the same directory unique. We may or may not need to also keep directories unique. It could be acceptable to merge directories.
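
The "merge directories, keep files unique" option can be sketched as follows, purely as an illustration of that trade-off. It assumes each torrent path is a non-empty list of components with the file name last. `PathMapper` and `mapPaths` are hypothetical names, and the `_` replacement rule is again taken from this ticket's example. Directories that map to the same name are merged, and files are made unique only within their mapped parent directory.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not I2PSnark code.
final class PathMapper {

    // Replace every code point the charset cannot encode with '_'.
    static String sanitize(String name, CharsetEncoder enc) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); ) {
            int len = Character.charCount(name.codePointAt(i));
            String s = name.substring(i, i + len);
            sb.append(enc.canEncode(s) ? s : "_");
            i += len;
        }
        return sb.toString();
    }

    // Insert "_n" before the extension, or at the end if there is none.
    static String withSuffix(String name, int n) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) return name + "_" + n;
        return name.substring(0, dot) + "_" + n + name.substring(dot);
    }

    // Each path must be non-empty; the last component is the file name.
    static List<String> mapPaths(List<List<String>> paths, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        // Names already used, tracked separately per mapped directory.
        Map<String, Set<String>> usedPerDir = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (List<String> path : paths) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < path.size() - 1; i++) {
                // Directories with the same mapped name are merged.
                sb.append(sanitize(path.get(i), enc)).append('/');
            }
            String dir = sb.toString();
            Set<String> used = usedPerDir.computeIfAbsent(dir, k -> new HashSet<>());
            String base = sanitize(path.get(path.size() - 1), enc);
            String candidate = base;
            for (int n = 1; !used.add(candidate); n++) {
                candidate = withSuffix(base, n);
            }
            out.add(dir + candidate);
        }
        return out;
    }
}
```

The sketch does not handle a file whose mapped name equals a directory name in the same parent; a real implementation would need to check for that case as well.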

comment:5 Changed 5 years ago by zzz

Milestone: 0.9.18 → 0.9.20