Changes between Version 4 and Version 5 of petconpaper

Feb 26, 2009 5:01:33 PM (11 years ago)


  • petconpaper

    v4 v5  
    66PET-CON 2009.1
    8 DRAFT 2009-01-19
     8DRAFT - last update zzz feb. 26 17:00 UTC
    5959        <li> Re-sort into tiers every 30s
    6060        <li> Stats are persistent across restarts
     64Peer selection within I2P is simply the process of choosing which routers on the network to route locally-generated messages (and replies to those messages) to go through. In other words, which peers are used for tunnels. To accomplish this, the router maintains a database of each peer's performance, called the "profile". The router uses that data to estimate the bandwidth of a peer, how often a peer will to accept tunnel build requests, and whether a peer seems to be overloaded or otherwise unable to reliably perform.
     66The profile contains several statistics, continuously updated. For each statistic, averages, event counts, and maximums for 1 minute, 10 minute, 1 hour, and 24 hour periods are available. The profiles are also stored locally on disk, so that they are persistent across restarts. Also, the profile stores several timestamps, such as the last time a peer was heard from, the last time it accepted a tunnel request, and many more.
     69Each peer has a set of data points collected about them, including statistics about how long it takes for them to reply to a network database query, how often their tunnels fail, and how many new peers they are able to introduce us to, as well as simple data points such as when we last heard from them or when the last communication error occurred.
     72Much, if not most, of the profile data are unused in the current software. It remains a "reserve" defense that can be easily used to enhance the router's resistance to theoretical or actual attacks, such as denial-of-service attempts, selective packet drops, network database (floodfill) disruption, and others.
    6175        <li> 3 Tiers:
    6276        <ul>
    126140        <li> Speed - simple - the average amount of data per tunnel we have sent through
    127141          the peer in the last minute
     143The speed calculation (as implemented here) simply goes through the profile and estimates how much data we can send or receive on a single tunnel through the peer in a minute. For this estimate it just looks at past performance.
     145Previous algorithms used a longer time, weighing more recent data more heavily, and extrapolating it for the future. Another previous calculation used total (rather than per-tunnel) bandwidth, but it was decided that this method overrated slow, high-capacity peers (that is, slow in bandwidth but high-capacity in number of tunnels). Even earlier methods were much more complex.
     147The current speed calculation has remained unchanges since early 2006, and it is a topic for further study to verify its effectiveness on today's network.
    128149        <li> Capacity - not so simple - The weighted number of successful tunnel builds
    129150          through the peer.
    130151          Let r(t) be the successful builds per hour, over a certain time period;
    131152          R = 4*r(10m) + 3*r(30m) + 2*r(1h) + r(1d);
     154The capacity calculation (as implemented here) simply goes through the profile and estimates how many tunnels we think the peer would agree to participate in over the next hour. For this estimate it looks at how many the peer has agreed to lately, how many the peer rejected, and how many of the agreed to tunnels failed, and extrapolates the data. In addition, it includes a 'growth factor' (when the peer isn't failing or rejecting requests) so that we will keep trying to push their limits.
     156The tunnel-failure part of the calculation was disabled for a long period, but it was reinstated for release 0.6.2, as it became apparent that recognizing and avoiding unreliable and unreachable peers is critically important. Unfortunately, as the tunnel tests require the participation of several peers, it is difficult to positively identify the cause of a failure. The recent changes assign a probability of failure to each of the peers, and uses that probability in the capacity calculation. These recent changes may require additional adjustment, and are a topic for further study to verify their effectiveness on today's network.
    132158        <li> Growth factor adds a little bit each time, so that new peers are tried
    133159        <li> Not required for measurement: any special-purpose 'load'
    172 <h2>How It Works, Lessons for Other Implementations   </h2><ul>
    174         <li> Great! Core system has been in place since the beginning of I2P.
     198<h2>How It Works, Lessons for Other Implementations   </h2>
     200        <p> Great! Core system has been in place since the beginning of I2P.
    175201             Pretty much includes everything proposed in "Tuneup for Tor"
    176         <li> Stability - Tier members don't change that often; as using a peer for tunnels
    177           tends to increase that peer's metrics, which keeps it in the pool.
    178           This is a desirable quality for anonymity.
    179         <li> Most client tunnels go through the highest-performance peers when the network is uncrowded;
    180           traffic "spreads out" to lower-performing peers when the network gets busy
    181         <li> The parameters, weighting, and calculations are VERY powerful "knobs" to turn;
     203        <p> Stability - The Fast and High Capacity tier members don't change rapidly. This is because when a router uses  a peer for tunnels, it tends to increase that peer's speed and capacity metrics. This keeps it in the pool.
     204          This is a desirable quality for anonymity, as using a large pool for tunnel building tends to assist predecessor attacks (ref).
     206        <p> The tier system is fast-reacting to individual peer failure or overload, and to increased local demand for bandwidth. The speed and capacity metrics are highly weighted for recent performance. When a peer starts to drop test messages, or fails to respond to tunnel build requests, it will quickly be demoted out of the high-capacity pool. As bandwidth demand increases, the speed metric for individual peers will rapidly increase, and the fast tier will quickly become reorganized to include the newly-recognized fast peers.
     211<p> The tier system tends to use the highest-bandwidth peers when the network is not congested. As congestion increases, the total network  traffic "spreads out" to lower-capacity peers. From an overall network perspective, this is optimal as it maintains a similar level of service for all routers.
     213<p> The profiling system does not over-optimize. The router uses its own, normally-generated traffic for peer profiling. No high-bandwidth test messages are required or used. When a router does not require high bandwidth or a high number of tunnels, the metrics for each peer are correspondingly lower. Therefore, even a low-bandwidth peer may be classified as "fast" and used for tunnels. This tends to, at least partially, spread low-bandwidth tunnels broadly across the network, and leaves the high-capacity peers for high-speed traffic. However, even very-low-bandwidth routers tend to accurately find a few fast peers and thus are  well-prepared when higher bw is demanded.
     216<p> The profiling system is persistent across restarts, and maintains measurements for the last 24 hours. This allows a recently-started router to quickly re-integrate to the network, whether the router was just restarted or has been down for some time.
     218        <p> The parameters, weighting, and calculations are VERY powerful "knobs" to turn.
    182219          difficult to test and debug in a distributed network;
    183220          impossible to fully optimize;
    184221          can take months or years to
    185           get right
    186         <li> The basic measurements are much simpler than they used to be
    187         <li> The punishment is what keeps the network running well. How heavily you punish
     222          get right. The I2P router contains a framework for local network simulation and testing; however, we have not used this framework for profiling and selection testing. As described above, the routers include bandwidth and tunnel build success statistics in the network database entry they publish to the floodfill routers. While this information is not trusted or used in the router, it is gathered by the stats.i2p website (ref). On that website, several network performance graphs are presented, and the I2P developers make heavy use of this facility to monitor the network and judge the effectiveness of software changes in each release.
     225<p> Here are some other lessens for those wishing to "tune up" their own network.
     227        <p> The basic measurements are much simpler than they used to be. The speed calculation, for example, was at one time over 400 lines of code, and it is now essentially a one-liner. The punishment for bad behavior is what keeps the network running well, and also is an area for further research. How heavily you punish
    188228          determines how fast the load spreads out across the network as the it gets busy,
    189           and how quickly an overloaded peer gets avoided.
    190         <li> Start simple, get more complex if necessary, you will probably have to simplify again later
     229          and how quickly an overloaded peer gets avoided. Too The implementation contains an implicit estimate of the cost of a tunnel build request, as it rates the relative performance of a rejected requests and a dropped request. Or, alternatively, an accepted request vs. a request not made at all. Another way of looking at it is to establish a baseline of a peer that we have not made any requests of. Call that a capacity rating of zero. What percentage (or absolute number) of request rejections, dropped requests, and test failures is required to drop the capacity rating below zero?
     231If peers are punished too heavily, the network will tend to congestion collapse as most peers are driven to negative capacity ratings and tunnel load spreads out quickly throughout the network, as routers attempt to route tunnels through very-low-capacity peers.
     233If peers are punished too lightly, routers will be slow to react to overloaded peers, and maintain the same high capacity peers for too long. In other words, a router will accept poor performance from a peer even when better peers may be available.
     235        <p> Start simple, get more complex if necessary, you will probably have to simplify again later
    191236          (for example, our speed calc used to be pages and pages of code, now a one-liner)
    192         <li> You don't need to generate large amounts of special-purpose data for testing. Normal traffic will do.
     238        <p> You don't need to generate large amounts of special-purpose data for testing. Normal traffic will do.
    193239          Our exploratory tunnels average only 100KB in 10m (~150Bps)...
    194240          Our unused client tunnels average only 10KB in 10m from tunnel tests (~15Bps)...
    202248<h2>References                   </h2><ul>
     249<li>This paper includes material adapted from written by jrandom and others.
    203250<li><i>A Tune-up for Tor: Improving Security and Performance in the Tor Network</i>
    204251Robin Snader and Nikita Borisov @ UIUC