Opened 4 years ago

Last modified 4 years ago

#1788 open enhancement

Optimizing throttling to make better use of router bandwidth within user-specified limits

Reported by: foible Owned by:
Priority: minor Milestone: undecided
Component: router/general Version: 0.9.25
Keywords: performance, scaling, bandwidth, throttling, privacy, anonymity Cc:
Parent Tickets: Sensitive: no

Description

I have noticed that I2P's total bandwidth use tends to vary considerably, roughly on a ten-minute cycle.[1] At first this did not make sense to me: in my case almost all bandwidth is used for participating traffic, spread over hundreds of participating tunnels at once, and the sheer number of tunnels should produce something like a smooth average of total bandwidth usage.

When I looked at my router's graphs, though, this was clearly not the case. Roughly every five minutes, my router's total and participating bandwidth usage switches between about 120% and about 60% of my specified bandwidth limit, with decent regularity.

Staring at the graphs for a long time, it occurred to me that this pattern might be produced if the router is throttling (rejecting due to b/w limit) new participating tunnel requests from other peers *too* aggressively. The result is that a helpful peer is only actually relaying about 75-80% of the bandwidth that they could be, on average (at least on my machine). Obviously, if we could gain even half of that "wasted" bandwidth, it would be a big win for the network.

There is already *some* flexibility in these rejections, judging by the different messages "Rejecting most tunnels: bandwidth limit" and "Rejecting tunnels: bandwidth limit," but I suspect that it might not be tuned quite right, and I have a very rough idea to improve this throttling:

What if the router rejects X% of incoming tunnel requests, where

X = ([percentage of b/w limit currently used] - 90) * 10

?

Or perhaps * 15 would be better? Or 8? Or even some compound or logarithmic function, if we want to be fancy? Obviously these are arbitrary numbers that would need to be thought about, tested and tweaked, but I believe that by giving the bandwidth throttling a bit more nuance, it may be possible to keep most routers successfully using 95%+ of their bandwidth limit, with far fewer "spikes."
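To make the proposal concrete, here is a minimal sketch of that formula (the constants 90 and 10 are the arbitrary values suggested above, and the class and method names are hypothetical, not anything in the router):

```java
// Hypothetical sketch of the proposed linear throttle: reject X% of new
// participating-tunnel requests, where X ramps from 0 at 90% bandwidth
// usage up to 100% at 100% usage. Constants are the arbitrary ones above.
public class LinearThrottleSketch {

    /** @param pctUsed percentage of the user-set bandwidth limit in use */
    static double rejectPercent(double pctUsed) {
        double x = (pctUsed - 90.0) * 10.0;       // the formula proposed above
        return Math.max(0.0, Math.min(100.0, x)); // clamp to [0, 100]
    }

    public static void main(String[] args) {
        System.out.println(rejectPercent(85.0));  // 0.0  - below the ramp
        System.out.println(rejectPercent(95.0));  // 50.0 - halfway up
        System.out.println(rejectPercent(100.0)); // 100.0 - full rejection
    }
}
```

With the multiplier at 10, the ramp covers the span from 90% to 100% usage; a multiplier of 8 or 15 would simply widen or narrow that span.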

Smoothing out total bandwidth usage also improves anonymity, potentially significantly: the adaptive throttling should tailor itself to complement i2p traffic initiated by the local user(s) as well, making something like the start of a download or upload much less noticeable to an attacker who is able to observe and graph a router's bandwidth usage. Such adversaries are certainly a reasonable and significant part of our threat model, ill-defined as that model may be.

What are your thoughts?

Subtickets

Attachments (1)

bwgraph20-6d00.png (75.9 KB) - added by ReturningNovice 4 years ago.
Graph from OP


Change History (4)

Changed 4 years ago by ReturningNovice

Attachment: bwgraph20-6d00.png added

Graph from OP

comment:1 Changed 4 years ago by foible

Looking into this further, it seems very feasible to improve. There are some challenges, though, such as the fact that new tunnels take some time to start passing traffic, so the router must "predict" a couple of minutes into the future based on its current bandwidth usage. This can be done - not literal prediction, but improving the approximations the router is already using.
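As a toy illustration of that idea (this is not the router's actual code, just a sketch of smoothing a noisy bandwidth signal before making throttling decisions on it), an exponentially-weighted moving average is one simple way to get a stable estimate:

```java
// Toy illustration (NOT the router's code) of basing throttling decisions on
// a smoothed bandwidth estimate rather than the instantaneous rate. New
// tunnels take a minute or two to pass traffic, so reacting to a single
// noisy sample would over- or under-throttle.
public class BandwidthPredictor {
    private double avgBps = 0;  // smoothed estimate
    private final double alpha; // smoothing factor, 0 < alpha <= 1

    BandwidthPredictor(double alpha) { this.alpha = alpha; }

    /** Feed one bandwidth sample (bytes/sec); returns the updated estimate. */
    double update(double sampleBps) {
        avgBps = alpha * sampleBps + (1 - alpha) * avgBps;
        return avgBps;
    }

    public static void main(String[] args) {
        BandwidthPredictor p = new BandwidthPredictor(0.25);
        double est = 0;
        // A one-sample spike to 400 moves the estimate only gradually,
        // so a momentary burst cannot flip the router into 100% rejection.
        for (double sample : new double[] {100, 100, 100, 400, 100, 100})
            est = p.update(sample);
        System.out.printf("smoothed estimate: %.1f Bps%n", est);
    }
}
```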

The relevant code is in RouterThrottleImpl.java, primarily on lines 389-391:

    // limit at 90% - 4KBps (see above)
    float maxBps = (maxKBps * 1024f * 0.9f) - MIN_AVAILABLE_BPS;
    float pctFull = (maxBps - availBps) / (maxBps);
    double probReject = Math.pow(pctFull, 16); // steep curve 

There is also a cap on new tunnels if bandwidth usage is > 90% on line 360, which I refer to as a "soft" limit, below.

I ran some crude, approximate numbers on this function to get a rough "feel" for router rejection behavior, not counting the MIN_AVAILABLE_BPS (which is 4kBps) that's cut off of the top:

80% of the 90% "soft limit" (72% of user-set b/w limit used) yields 3% rejection
85% of the 90% "soft limit" (76.5% of user-set b/w limit used) yields 7% rejection
90% of the 90% "soft limit" (81% of user-set b/w limit used) yields 18.5% rejection
95% of the 90% "soft limit" (85.5% of user-set b/w limit used) yields 44% rejection
98% of the 90% "soft limit" (88.2% of user-set b/w limit used) yields 72% rejection
99% of the 90% "soft limit" (89.1% of user-set b/w limit used) yields 85% rejection
99.5% of the 90% "soft limit" (89.55% of user-set b/w limit used) yields 92% rejection

Anything above that goes right to 100% rejection. So basically, the rejection decision as it is now looks a bit like this when graphed (NOT TO SCALE!!):

 
     r |                                      ____|__100%
     e |                                     /    |
     j |                                    /     |
     e |                                   ;      |
     c |                                   |      |
     t |                                   ;      |
       |                                  |       |
     % |__________________________________;_______|
       0%           bandwidth usage              100%

I'm pretty sure that the pattern I noticed in my router's bandwidth graph (which started this whole process, and can be seen in the attached image) results from two things: that big, nearly vertical wall you see here, and the wide region of 100% rejection, where the router still has some spare capacity and is constantly expiring tunnels anyway.

So, I see three goals for tuning, here:

  1. Make the existing transition from "few tunnels rejected" to "nearly all tunnels rejected" (the "vertical wall") occur over a wider span, to help prevent very sudden drop-offs in participating traffic.
  2. Make rejecting 100% of tunnels a less common state - the router graphs (like the earlier attached image) clearly show, to me, that this is over-compensating.
  3. Start rejecting a small number of tunnels at lower bandwidth-usage percentages, to help prevent "biting off more than we can chew" and ending up with too many tunnels (the current setup already achieves this pretty well, but mostly by being very conservative). This may be optional, but could become necessary depending on how aggressive the other tweaks are.

Fortunately, the way this code is written now, it's very tunable, and very smart. I have done some preliminary experiments with reducing the exponent (from its current 16) and believe that this is a promising avenue to pursue.

However, it is my belief that an exponential function may not actually be the best way to determine this throttling at all, as pretty as it is. I think for our purposes, a simple, straight slope, still with some 100% rejection states (like above 95% instead of above 90%) would work much better, but of course this must be tested. So in such a case, the graph would look like this:

    
     r |                                        __|__100%
     e |                                      _/  |
     j |                                    _/    |
     e |                                  _/      |
     c |                                _/        |
     t |                              _/          |
       |                            _/            |
     % |___________________________/______________|
       0              bandwidth usage            100%

Except the slope would not be wavy, of course.

If it is to remain exponential, though, I think it seems better with an exponent of six or less, and potentially even as low as two.
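To compare the candidate shapes side by side, here is a sketch (all cutoffs and exponents are the arbitrary values discussed above, not a proposed patch) that evaluates the existing curve at exponents 16, 6 and 2 alongside a straight slope that only reaches 100% rejection above 95% usage:

```java
// Sketch comparing rejection-probability shapes. Cutoffs (0.90, 0.70, 0.95)
// and exponents (16, 6, 2) are the arbitrary values under discussion.
public class CurveComparison {
    // Existing style: probReject = pctFull^exp, 100% at/above the soft limit.
    static double exponential(double pctFull, double exp) {
        return pctFull >= 1.0 ? 1.0 : Math.pow(pctFull, exp);
    }

    // Proposed style: straight slope from 'start' to 100% rejection at 'end'
    // (e.g. full rejection only above 95% of the limit instead of 90%).
    static double linear(double usage, double start, double end) {
        if (usage <= start) return 0.0;
        if (usage >= end)   return 1.0;
        return (usage - start) / (end - start);
    }

    public static void main(String[] args) {
        for (double u = 0.70; u <= 1.001; u += 0.05) {
            System.out.printf(
                "%3.0f%% usage: exp16=%5.1f%%  exp6=%5.1f%%  exp2=%5.1f%%  linear=%5.1f%%%n",
                u * 100,
                exponential(u / 0.90, 16) * 100,  // current curve, 90% soft limit
                exponential(u / 0.90, 6) * 100,
                exponential(u / 0.90, 2) * 100,
                linear(u, 0.70, 0.95) * 100);
        }
    }
}
```

The printout makes the trade-off visible: lower exponents and the linear ramp start rejecting earlier but climb far more gently, which is exactly the "wider span" described in goal 1.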

Naturally, all of this conjecture is meaningless without testing: an i2p router is a very chaotic thing, and there are feedback effects involved, where bandwidth-shaping parameters affect future bandwidth, which in turn affects future bandwidth-shaping parameters, and so on. We need real data to see what works. Thankfully, that data should be relatively easy to get, and doesn't require anyone other than the person doing the testing to do anything. What I hope to find is a configuration where total bandwidth usage stays around 95-99% of the user-specified maximum, with minimal variance. I think this is achievable, but I'd settle for, say, 88%-104%, which would still be an improvement.

Anyone else is of course welcome to tweak the code and do some testing and/or maths of their own - let's see if we can find something that works. It would also be good to re-work these numbers without disregarding MIN_AVAILABLE_BPS, especially at various total bandwidth limits, since it's a fixed value. Finally, this kind of problem is probably well-suited to a plain Monte Carlo simulation, or at least a three- or four-dimensional graph of possibilities, to identify good candidate configurations, but I do not have the skills to create either of those.
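For what it's worth, even a very crude simulation of the feedback loop can be written in a few dozen lines. The sketch below is a toy model only: every constant (tunnel bandwidth, lifetime, request rate) is invented for illustration, and it ignores real-world effects like peer backoff, but it shows the general shape such an experiment could take:

```java
import java.util.ArrayDeque;
import java.util.Random;

// Toy Monte Carlo of the feedback loop described above: tunnel requests
// arrive each tick, are accepted with probability (1 - probReject), live for
// a fixed number of ticks, and each live tunnel contributes bandwidth.
// ALL constants are invented for illustration; this is not the router's
// traffic model, and it ignores peer backoff behavior entirely.
public class ThrottleMonteCarlo {

    /** Runs the toy model; returns {min, max} usage (as a fraction of the
     *  limit) observed after a warm-up period. */
    static double[] simulate(long seed) {
        Random rnd = new Random(seed);
        final double limitBps = 100_000;  // user-set bandwidth limit
        final double perTunnelBps = 500;  // assumed avg bandwidth per tunnel
        final int tunnelLifeTicks = 60;   // 10-minute tunnels at 10 s / tick
        final int requestsPerTick = 8;    // assumed incoming request rate

        ArrayDeque<Integer> expiry = new ArrayDeque<>(); // tick each tunnel dies
        double minUse = Double.MAX_VALUE, maxUse = 0;

        for (int tick = 0; tick < 2000; tick++) {
            while (!expiry.isEmpty() && expiry.peek() <= tick) expiry.poll();

            double usage = expiry.size() * perTunnelBps / limitBps;
            double pctFull = Math.min(usage / 0.90, 1.0);
            double probReject = Math.pow(pctFull, 16); // current steep curve

            for (int r = 0; r < requestsPerTick; r++)
                if (rnd.nextDouble() >= probReject)
                    expiry.add(tick + tunnelLifeTicks);

            if (tick > 500) { // skip warm-up, then track the swing
                minUse = Math.min(minUse, usage);
                maxUse = Math.max(maxUse, usage);
            }
        }
        return new double[] { minUse, maxUse };
    }

    public static void main(String[] args) {
        double[] mm = simulate(42);
        System.out.printf("steady-state usage swings between %.0f%% and %.0f%%%n",
                mm[0] * 100, mm[1] * 100);
    }
}
```

Swapping in a different probReject formula (a lower exponent, or the linear slope) and comparing the min/max swing would be a cheap first pass before touching the real router.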

Please comment with any thoughts you might have below.

comment:2 Changed 4 years ago by foible

Another thought I've just had: the 90% usage soft limit might be a better fit for routers sharing less bandwidth than mine, and conversely, a 95% cutoff might still be too conservative for very high-throughput routers. So another thing to consider would be either making those cutoffs absolute in kBps (e.g. by lumping them in with MIN_AVAILABLE_BPS, and having the slope run all the way to 100% of bandwidth usage instead of 95%), or having that "soft limit" percentage scale on its own with the total bandwidth limit. Just another small piece of this to think about.

It would be great to hear from people with routers that are different than my own, to get a sense of how their bandwidth graph looks (and how much potential bandwidth is "wasted"), and how it might look with some tuning done.

Last edited 4 years ago by foible (previous) (diff)

comment:3 Changed 4 years ago by zzz

Status: new → open

The attached graph makes it pretty obvious that we could be more efficient. I'm not worried about where we set the cap - whether it's at 95, or 98, or 100, or 105% of the setting - but it would be nice to stay roughly at a cap rather than swing wildly. So I'm in agreement with a goal of, e.g., 88-104 as you state above.

There are a couple dozen places in the code where we can reject a tunnel. I've tried to word each one slightly differently, so you can keep an eye on the console and trace a rejection back to the code. Setting some logging levels judiciously may also help. There are indeed places in the code that are only hit by very high-bandwidth routers, which only a few people can test changes for.

I encourage you to experiment with code changes and see what improvements you can come up with. I'm certainly open to accepting well-reasoned and well-tested patches. Sounds like you're on the right track. I don't have an opinion on the details, especially the exponential vs. straight-line hypothesis; you may only get to an answer by testing. Also, recall that how the requesting routers respond - whether they back off from further requests, and how long that takes - is a factor to consider.
