Opened 9 days ago

Last modified 2 days ago

#2617 new enhancement

Speed up outbound I2CP

Reported by: jogger Owned by: zzz
Priority: major Milestone: undecided
Component: router/general Version: 0.9.42
Keywords: Cc:
Parent Tickets: Sensitive: no


While working on the UDP transport, I noticed that messages trickle in slowly all the time, which makes the router work one packet at a time. For inbound this is unavoidable, given the speed of today's CPUs. For Tunnel GW Pumper, however, zzz claimed there was batching. Batching can't work on inbound, but it is a reasonable expectation for outbound.

Running with #2432 comment 11, I saw I2CP Reader spend 40% more CPU creating a message than pump() needed to process it. As a result the pumper queue could never fill up; it would need at least 3 messages for batching to work.
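To illustrate the point about the queue never filling up, here is a minimal sketch, not I2P code. It assumes a pump that simply drains whatever is queued each time it wakes; PumpBatchingSketch, pumpOnce and batchSizes are hypothetical names made up for this example. If the producer only manages one message per pump wake-up, every batch has size 1, however many messages arrive in total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; this is not the Tunnel GW Pumper implementation.
final class PumpBatchingSketch {

    /** Drain everything that is queued right now and return it as one batch. */
    static List<String> pumpOnce(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    /**
     * Enqueue `total` messages and wake the pump after every `perWakeup`
     * messages (and once more at the end if messages are left over).
     * Returns the size of each batch the pump saw.
     */
    static List<Integer> batchSizes(int total, int perWakeup) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<Integer> sizes = new ArrayList<>();
        for (int i = 1; i <= total; i++) {
            queue.add("msg" + i);
            if (i % perWakeup == 0 || i == total) {
                sizes.add(pumpOnce(queue).size());
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Slow producer: the pump wakes after every single message.
        System.out.println(batchSizes(16, 1));
        // Faster producer: four messages are queued per wake-up.
        System.out.println(batchSizes(16, 4));
    }
}
```

With 16 messages, batchSizes(16, 1) yields sixteen batches of one message, while batchSizes(16, 4) yields four batches of four.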

One could put message creation into a thread pool to parallelize it (the use case being 16 consecutive messages for a BitTorrent chunk). However, there is already a solution in ClientMessagePool:

            if (true) // blocks the I2CP reader for a nontrivial period of time

Change that to false and we have a valid use case for multiple job queue runners. I checked this in my test bed, and the number of invocations of Tunnel GW Pumper went down. I further checked that those jobs ran short enough not to be preempted on a 1.8 GHz ARM32 (if they were preempted, the number of threads would have to be limited to the number of CPUs).
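A rough sketch of the two dispatch modes that flag chooses between, under stated assumptions: this is plain java.util.concurrent, not I2P's ClientMessagePool or job queue, and OutboundDispatchSketch, preparedBy and runnerCount are hypothetical names. With runInline set, all preparation happens on the calling thread (standing in for I2CP Reader). Without it, the work is handed to a pool of runners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical illustration only; the real router uses its own job queue classes.
final class OutboundDispatchSketch {

    /**
     * "Prepare" the given number of messages and return the names of the
     * threads that did the work.
     *
     * @param runInline   true: run on the caller's thread (like "if (true)");
     *                    false: hand each message to a pool of runners
     * @param runnerCount pool size used when runInline is false
     */
    static Set<String> preparedBy(int messages, boolean runInline, int runnerCount) {
        // Stand-in for message preparation: just report which thread ran it.
        Supplier<String> prepare = () -> Thread.currentThread().getName();
        Set<String> workers = new TreeSet<>();

        if (runInline) {
            // The caller is blocked until every message has been prepared.
            for (int i = 0; i < messages; i++) {
                workers.add(prepare.get());
            }
            return workers;
        }

        ExecutorService runners = Executors.newFixedThreadPool(runnerCount);
        try {
            // The caller only enqueues; preparation runs on the pool threads.
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < messages; i++) {
                pending.add(CompletableFuture.supplyAsync(prepare, runners));
            }
            for (CompletableFuture<String> f : pending) {
                workers.add(f.join());
            }
        } finally {
            runners.shutdown();
        }
        return workers;
    }

    public static void main(String[] args) {
        System.out.println("inline: " + preparedBy(16, true, 16));
        System.out.println("queued: " + preparedBy(16, false, 16));
    }
}
```

In the inline case the returned set contains only the caller's thread. In the queued case it contains pool threads only, and how many of them actually pick up work depends on scheduling.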

So I came up with 16 job queue runners for the above use case together with #2432 comment 11. Result in my test bed:

  • Tunnel GW Pumper runs 20-30% less frequently than I2CP Reader, so batching works
  • Because message creation is distributed across several threads, there is no more preemption, which occurs frequently with the current code when I2CP Reader loops over its input.

To make our torrent friends really happy, one could, as a follow-up, tweak the UDP transport with a minimum send window of 16*1033 bytes (16528 bytes) and MAX_ALLOCATE_SEND = 16.


Change History (2)

comment:1 Changed 9 days ago by jogger

Addition: The batching also carries over to other threads like NTCP Pumper and NTCP Writer that loop less frequently, saving some CPU.

Furthermore: This mod just gave me > 1 MBps torrent output in the test bed with 4*1.8 GHz cores. I had previously seen such speed only with 8 cores (4*2.0, 4*1.5).

Last edited 9 days ago by jogger

comment:2 Changed 2 days ago by zzz

Component: router/transport → router/general

This change was made in 2005 by jrandom with the message:

      Adjust the router's queueing of outbound client messages when under
      heavy load by running the preparatory job in the client's I2CP handler
      thread, thereby blocking additional outbound messages when the router is

Would changing this back to being on the job queue be of benefit, independent of #2432? Would we need to increase the job runner count (again, assuming we haven't implemented #2432)?
