Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#765 closed enhancement (fixed)

Small optimization of peer.profileCoalesceTime locking

Reported by: zab Owned by: zzz
Priority: minor Milestone: 0.9.4
Component: router/general Version: 0.9.3
Keywords: Cc: zab@…
Parent Tickets:

Description

Vanilla 0.9.3-4, busy router:

Events   Lifetime average
28       753.32
29       915.76
30       1219.47
31       1540.10
32       2251.56
33       2791.88
...      ...
38       3386.34

Peers: ~2700 / ~7000
Tunnels: 4k

Eventually the periods under the writeLock get so long (20s+) that the router drops half of its tunnels.

Subtickets

Change History (15)

comment:1 follow-up: ↓ 2 Changed 6 years ago by zab

One stab at this: http://pastethis.i2p/show/2158/

Split the coalesce() method in two; coalesce the stats under the read lock and only update the values that affect the sorting under the write lock.
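A minimal sketch of the idea under a ReentrantReadWriteLock; class names and the exact loop structure are assumptions, not the actual code from the paste (the method names coalesceOnly() and updateValues() match those mentioned later in comment 15):

{{{
#!java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only -- not the actual ProfileOrganizer code.
class ProfileOrganizerSketch {
    private final ReentrantReadWriteLock _reorganizeLock = new ReentrantReadWriteLock();
    private final Set<PeerProfileSketch> _profiles = new CopyOnWriteArraySet<PeerProfileSketch>();

    void reorganize() {
        _reorganizeLock.readLock().lock();
        try {
            // Shared pass: fold each profile's rate buckets -- the
            // expensive part -- without blocking other readers.
            for (PeerProfileSketch p : _profiles)
                p.coalesceOnly();
        } finally {
            _reorganizeLock.readLock().unlock();
        }

        _reorganizeLock.writeLock().lock();
        try {
            // Short exclusive pass: publish only the values the sort
            // order depends on, then re-tier the peers.
            for (PeerProfileSketch p : _profiles)
                p.updateValues();
            // ... rebuild fast / high-capacity tiers here ...
        } finally {
            _reorganizeLock.writeLock().unlock();
        }
    }
}

class PeerProfileSketch {
    void coalesceOnly() { /* coalesce rate stats into staging fields */ }
    void updateValues() { /* copy staging fields into the sorted fields */ }
}
}}}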

comment:2 in reply to: ↑ 1 Changed 6 years ago by zab

Replying to zab:
Oops, just saw that I'm double-counting the expiredCount, but that's only used in logging anyway.

comment:3 follow-up: ↓ 4 Changed 6 years ago by zab

More aggressive version: http://pastethis.i2p/show/2159/

Coalesce and calculate the values under read lock but store them in separate fields. Only copy those fields under the write lock.

The state management may not be necessary and may in fact be buggy when it interacts with the expired logic. Will see.

comment:4 in reply to: ↑ 3 Changed 6 years ago by zab

Replying to zab:

The state management may not be necessary and may in fact be buggy when it interacts with the expired logic. Will see.

Yes, it will need to be relaxed. A new PeerProfile may be added between the release of the readLock and the acquisition of the writeLock. Here's a more relaxed version: http://pastethis.i2p/show/2161/
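To make the hazard concrete, here is an excerpt of the two-pass reorganize() from the sketch in comment 1, simplified (the unlock really sits in a try/finally) and with the window annotated; names are still assumptions:

{{{
#!java
_reorganizeLock.readLock().unlock();
// <-- another thread may insert a brand-new PeerProfile right here,
//     so it was never visited by the coalesceOnly() pass above.
_reorganizeLock.writeLock().lock();
try {
    for (PeerProfileSketch p : _profiles) {
        // Relaxed handling: a profile added inside the window just has
        // empty/stale staging values for this cycle, which is harmless --
        // the next reorganize() will coalesce it normally.
        p.updateValues();
    }
} finally {
    _reorganizeLock.writeLock().unlock();
}
}}}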

comment:5 Changed 6 years ago by zab

  • Priority changed from major to minor
  • Summary changed from Scary peer.profileCoalesceTime readings to Small optimization of peer.profileCoalesceTime locking
  • Type changed from defect to enhancement

Ha-ha. Turns out somebody cut the memory of my testing VM in half and the router was swapping heavily. Anyway, patch 2161 is still a nice optimization:

Events   Lifetime average
27       1406.89
28       1793.86
29       2478.21
30       3297.30
31       3910.77
...      ...
33       6675.03

BUT peer.profileReorgTime Lifetime average value: 110.58 (33 events)

comment:6 Changed 6 years ago by zzz

Not addressing the issue at all, but I added a change to PeerManager in 0.9.3-5 to not reorganize as often if it takes too long.
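A hedged sketch of that idea; the constants, names, and exact policy are assumptions, not the actual PeerManager change:

{{{
#!java
// Sketch only: back off the reorganize interval when the last run was slow.
class ReorgScheduler {
    private static final long BASE_DELAY_MS = 45 * 1000;     // assumed baseline
    private static final long MAX_DELAY_MS = 10 * 60 * 1000; // assumed cap
    private static final long SLOW_MS = 1000;                // assumed threshold
    private long _delay = BASE_DELAY_MS;

    /** Delay before the next reorganize(), given how long the last one took. */
    long nextDelay(long lastReorgMs) {
        if (lastReorgMs > SLOW_MS)
            _delay = Math.min(_delay * 2, MAX_DELAY_MS); // slow -> run less often
        else
            _delay = BASE_DELAY_MS;                      // fast again -> recover
        return _delay;
    }
}
}}}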

comment:7 Changed 6 years ago by zzz

Not clear on your data. What's the coalesce time and reorg time after the patch and memory fix? The avg. coalesce time is 6675 and still growing? Doesn't make sense.

comment:8 Changed 6 years ago by zab

Sorry, the data in the table above is from before the memory was increased. With enough RAM, the total reorg time without the patch is in the 200-300 ms range. With the patch it drops to a little less than 100 ms.

comment:9 Changed 6 years ago by zzz

  • Milestone set to 0.9.4
  • Owner set to zzz
  • Status changed from new to accepted

OK, that's a big savings if we can really move that stuff outside the write lock. I'll review the patch.

comment:10 Changed 6 years ago by zzz

Ok I reviewed http://pastethis.i2p/show/2161/ and I'm wondering if we could make it a lot simpler. I think we can do the entire coalesce and update under the read lock, and therefore we don't need any changes to PeerProfile.java.

I don't think there's a problem with a profile being in the "wrong" group (fast, high cap) during the window between coalescing and acquisition of the write lock. Stats are just stats, and in the large interval between coalesces, the "actual" stat drifts away from the last coalesced stat anyway. The profiling is just a sorting of peers into categories based on sloppy, recent stats and heuristics.

I'm going to look at it further. What do you think about this simplification?

comment:11 Changed 6 years ago by zab

update() needs to be under the write lock because it modifies fields used by the comparator of _strictCapacityOrder, and that risks breaking the total-order requirement (https://en.wikipedia.org/wiki/Total_order) that red-black trees depend on. You'll get IllegalStateExceptions, NoSuchElementExceptions, all kinds of weird behavior.
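A hypothetical stand-alone demo of exactly that failure mode (not I2P code; a plain comparingDouble comparator stands in for InverseCapacityComparator):

{{{
#!java
import java.util.Comparator;
import java.util.TreeSet;

// Mutating a field the TreeSet's comparator reads, while the element is
// in the set, silently corrupts the tree's ordering invariant.
public class MutableKeyDemo {
    static class Profile {
        double capacity;
        Profile(double c) { capacity = c; }
    }

    public static void main(String[] args) {
        TreeSet<Profile> set = new TreeSet<>(
                Comparator.comparingDouble((Profile p) -> p.capacity));
        Profile a = new Profile(1.0);
        set.add(a);
        set.add(new Profile(2.0));
        set.add(new Profile(3.0));

        // Mutate the sort key in place -- the tree is not rebalanced.
        a.capacity = 4.0;

        // The binary search now walks the wrong branch:
        System.out.println(set.contains(a)); // false
        System.out.println(set.remove(a));   // false -> stale entry left behind
        // From here on, iterators and further add/remove calls can throw
        // or misbehave in exactly the "weird" ways described above.
    }
}
}}}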

comment:12 Changed 6 years ago by zzz

Makes sense. _strictCapacityOrder is sorted by InverseCapacityComparator, which uses the capacity value and speed value. The integration value isn't used in the TreeSet, and isFailing() is always false.

So updateValues() only needs to do speed and capacity.
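So the narrowed method could be as small as this sketch; the staging-field names are assumptions, not the actual PeerProfile.java code:

{{{
#!java
// Assumed field names; sketch only.
class PeerProfileValues {
    // Read by InverseCapacityComparator -- mutate under the write lock
    // only, or the TreeSet ordering invariant breaks (see comment 11).
    private double _speedValue;
    private double _capacityValue;
    // Staging values filled under the read lock by coalesceOnly().
    private double _speedValueNew;
    private double _capacityValueNew;

    void updateValues() {
        _speedValue = _speedValueNew;
        _capacityValue = _capacityValueNew;
    }
}
}}}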

comment:13 Changed 6 years ago by zzz

I'd like to get this in for 0.9.4. Do you want to check it in or would you like me to?

comment:14 Changed 6 years ago by zab

  • Resolution set to fixed
  • Status changed from accepted to closed

Should be in trunk with revision 7eb8a06162fc6d67c420fd73ce05197725852a1e (pushed to KYTV's server).

comment:15 Changed 6 years ago by zzz

Great. I'm going to make a small change in PeerProfile.coalesceOnly() and updateValues(), as implied in my comment 12 above: set the integration and isFailing values directly in coalesceOnly() instead of putting them in temp variables.
