Opened 5 years ago

Last modified 2 years ago

#1316 assigned task

I2P-Bote: File System Abstraction

Reported by: ExtraBattery Owned by: str4d
Priority: maintenance Milestone:
Component: apps/plugins Version: 0.9.13
Keywords: I2P-Bote performance Cc: HungryHobo, str4d
Parent Tickets:

Description

I noticed that the I2P router does a lot of disk I/O during startup, which seems to slow things down. I believe this is because there are a lot of small files in the folder that contains my I2P application data.

As an example: my I2P-Bote plugin alone maintains 2,898 files(!), averaging only about 1,716 bytes each, for a total of less than 5 MB.

Obviously, accessing lots of small files hurts performance. Wouldn't it be nice if the I2P router offered its core modules, as well as its plugins, its own file system abstraction, where each module/plugin "sees" what appears to be a file system while there is in fact just one large file ("container") on disk, resized dynamically as required? Each module/plugin would have its own container. This would allow the operating system to cache files much better and increase overall performance, especially on router startup.

This approach could also work around potential problems caused by platform differences between operating systems that treat paths as case-sensitive and those that treat them as case-insensitive. Plus, it could "sandbox" file system access to prevent accidental file I/O outside a module's folder on the actual file system. Another benefit would be that certain file operations could be handled entirely in user mode, and would thus be faster than making expensive syscalls all the time. You could even allow a module to transparently encrypt its container with a password.
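For illustration, the JDK already ships something close to the proposed container: the zip file system provider (module `jdk.zipfs`), which exposes a single archive file as a `java.nio.file.FileSystem`. The sketch below assumes a module keeps its data in such an archive; `ContainerStore`, `open`, `put` and `get` are hypothetical names, not an existing I2P or I2P-Bote API.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper: one on-disk container file per module, exposed as a
// java.nio FileSystem via the JDK's zip file system provider ("jar:" URIs).
final class ContainerStore {

    private ContainerStore() {}

    // Opens the container, creating an empty archive if it does not exist yet.
    static FileSystem open(Path containerFile) throws IOException {
        URI uri = URI.create("jar:" + containerFile.toAbsolutePath().toUri());
        return FileSystems.newFileSystem(uri, Map.of("create", "true"));
    }

    // Writes a small "file" inside the container, creating parent folders as needed.
    static void put(FileSystem fs, String name, String content) throws IOException {
        Path p = fs.getPath("/").resolve(name);
        Path parent = p.getParent();
        if (parent != null && !Files.isDirectory(parent)) {
            Files.createDirectories(parent);
        }
        Files.write(p, content.getBytes(StandardCharsets.UTF_8));
    }

    // Reads a "file" back from the container.
    static String get(FileSystem fs, String name) throws IOException {
        Path p = fs.getPath("/").resolve(name);
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }
}
```

Usage would be `try (FileSystem fs = ContainerStore.open(dataDir.resolve("bote.zip"))) { ... }`, with the module using ordinary `Path` and `Files` calls inside. As far as I know, the zip provider writes changes back only when the `FileSystem` is closed and rewrites the archive as a whole when anything changed, so a crash before `close()` loses pending updates. It also provides neither encryption nor any sandboxing.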

I hope you get the idea.


Change History (6)

comment:1 Changed 5 years ago by zzz

  • Cc HungryHobo str4d added
  • Component changed from unspecified to apps/plugins

Most are RouterInfo files (one per peer), followed by peer profiles (one per peer).

  • We could combine the two (tricky because the two subsystems that use the two files are independent).
  • We could zip them all together (ok for profiles which are only read at startup and written at shutdown, but routerinfos are written periodically so it wouldn't work for that). Zipping also means a corruption loses all the files.
  • The profiles are gzipped now. That's why they are small.
  • Startup and shutdown are very slow on the Raspberry Pi. This is a possible cause. However, as you say, a large number of files only "seems" to slow things down. We don't know.
  • All the above is for the router. I don't know how many files Bote uses or how it does so. One per email is a good guess though.
  • It's very tough to sandbox plugins or abstract/trap/prevent direct file system access. We explicitly reject a sandbox security model for plugins, it's way way too hard.
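The "zip them all together" option for profiles (read once at startup, written once at shutdown) could look roughly like the following sketch. It assumes the profiles are held in memory as a name-to-bytes map (for example the already-gzipped profile contents); `ProfileSnapshot` is a hypothetical helper, not router code. Writing to a temporary file and renaming it over the old snapshot only once the new archive is complete narrows the corruption risk mentioned above: a crash mid-write leaves the previous snapshot intact instead of losing every profile.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: keep all per-peer profiles in one archive that is
// written once at shutdown and read once at startup.
final class ProfileSnapshot {

    private ProfileSnapshot() {}

    // Shutdown: write every profile as one entry of a single archive.
    static void save(Path archive, Map<String, byte[]> profiles) throws IOException {
        Path tmp = archive.resolveSibling(archive.getFileName() + ".tmp");
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (Map.Entry<String, byte[]> e : profiles.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        // Swap in the new snapshot only once it is complete. With ATOMIC_MOVE the
        // rename replaces an existing target on common platforms (Linux, Windows),
        // but the exact behavior is implementation specific.
        Files.move(tmp, archive, StandardCopyOption.ATOMIC_MOVE);
    }

    // Startup: read all profiles back; a missing archive means "no profiles yet".
    static Map<String, byte[]> load(Path archive) throws IOException {
        Map<String, byte[]> profiles = new LinkedHashMap<>();
        if (!Files.exists(archive)) {
            return profiles;
        }
        try (ZipInputStream zip = new ZipInputStream(
                new BufferedInputStream(Files.newInputStream(archive)))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry.
                profiles.put(entry.getName(), zip.readAllBytes());
            }
        }
        return profiles;
    }
}
```

As noted above, this pattern only suits data that is read and written in bulk; it would not fit RouterInfo files, which are written periodically.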

The way forward is profiling, logging, and measurement to identify the true bottlenecks, then proposing and experimenting with improvements. You may wish to start a discussion with the Bote developers. Sometimes a simple fix like a BufferedInputStream can work wonders. But you have to identify and measure the root cause first.
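As a starting point for that kind of measurement, a small harness can time finding and reading every file in a directory tree, with and without a `BufferedInputStream`. It reads one byte at a time on purpose, since that is the access pattern where an unbuffered stream can cost one system call per byte. `ReadTiming` is a hypothetical name; this is a rough sketch, not a rigorous benchmark, and it is best pointed at a copy of the data folder rather than a live router.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical measurement harness: how long does it take to find and read
// every file under a directory, with or without a BufferedInputStream?
final class ReadTiming {

    private ReadTiming() {}

    static final class Result {
        final int files;
        final long bytes;
        final long nanos;

        Result(int files, long bytes, long nanos) {
            this.files = files;
            this.bytes = bytes;
            this.nanos = nanos;
        }

        @Override
        public String toString() {
            return files + " files, " + bytes + " bytes, " + nanos / 1_000_000 + " ms";
        }
    }

    // Reads each file one byte at a time on purpose: on an unbuffered stream
    // every read() may become a separate system call, which is exactly the
    // cost a BufferedInputStream amortizes.
    static Result readAll(Path root, boolean buffered) throws IOException {
        long start = System.nanoTime(); // includes directory traversal (metadata I/O)
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long bytes = 0;
        for (Path f : files) {
            InputStream raw = Files.newInputStream(f);
            try (InputStream in = buffered ? new BufferedInputStream(raw) : raw) {
                while (in.read() != -1) {
                    bytes++;
                }
            }
        }
        return new Result(files.size(), bytes, System.nanoTime() - start);
    }
}
```

Comparing `readAll(dir, false)` with `readAll(dir, true)` shows what buffering alone buys. Running the same call once right after a reboot and again immediately afterwards would separate cold-cache (disk-bound) time from warm-cache time, which is the difference described in comment 2.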

comment:2 Changed 5 years ago by ExtraBattery

  • I'm running on x86-64 PCs. Router shutdown is not slow at all, only startup. I notice that the CPU isn't under much load while the router starts, but the hard drive is very busy, so I assumed file I/O is the bottleneck in my case. The slowness goes away once the OS has cached the necessary files: if I shut down and start again, the second start is much faster.
  • Usually I would attribute this to many individual files being accessed, since I don't think the total amount of file content being read is large.
  • It could also be that the slow start is not due to the router, but due to the initialization of Java. It's hard to tell.
  • The majority of files in my I2P application data folder belong to I2P-Bote (currently over 3,200 for I2P-Bote alone). Most of the files that belong to the router itself are in the folders "netDb" and "peerProfiles" (about 1,500 files combined). The rest amounts to only about a hundred files.
  • I don't know what I2P-Bote uses all these files for. I have maybe a hundred emails, yet still thousands of files. I don't know whether I2P-Bote actually accesses them frequently.
  • I didn't mean a sandbox that protects from malice, but merely from accidents.
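Per-folder figures like the ones above (file counts, total and average size) can be gathered with a short walk over a directory. This is a sketch; `FileCensus` and the grouping by first-level subfolder are assumptions for illustration, and files lying directly in the scanned folder are grouped under ".".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: count regular files and their sizes per first-level
// subfolder of a directory (e.g. "netDb", "peerProfiles").
final class FileCensus {

    private FileCensus() {}

    static final class Tally {
        long files;
        long bytes;

        double averageBytes() {
            return files == 0 ? 0.0 : (double) bytes / files;
        }
    }

    static Map<String, Tally> byTopLevelFolder(Path appDir) throws IOException {
        Map<String, Tally> result = new TreeMap<>();
        try (Stream<Path> walk = Files.walk(appDir)) {
            walk.filter(Files::isRegularFile).forEach(p -> {
                Path rel = appDir.relativize(p);
                // Files directly inside appDir are grouped under ".".
                String key = rel.getNameCount() > 1 ? rel.getName(0).toString() : ".";
                Tally t = result.computeIfAbsent(key, k -> new Tally());
                t.files++;
                try {
                    t.bytes += Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return result;
    }
}
```

Printing each key with its `files` and `averageBytes()` for the application data folder, or for a plugin's own folder, gives numbers directly comparable to those quoted in this ticket.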

comment:3 Changed 4 years ago by zzz

  • Owner set to HungryHobo
  • Status changed from new to assigned

My comments above were regarding the files the router uses.

As comment 2 and the original report both identify I2P-Bote as the owner of the majority of the files, assigning to HungryHobo.

comment:4 Changed 4 years ago by str4d

  • Keywords I2P-Bote performance added

comment:5 Changed 3 years ago by zzz

  • Owner changed from HungryHobo to str4d
  • Summary changed from File System Abstraction to I2P-Bote: File System Abstraction

comment:6 Changed 2 years ago by str4d

Migrated to https://github.com/i2p/i2p.i2p-bote/issues - I will close these tickets as things are resolved rather than right now, but please make future comments on GitHub.
