Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#765 closed enhancement (fixed)

Small optimization of peer.profileCoalesceTime locking

Reported by: Zlatin Balevsky
Owned by: zzz
Priority: minor
Milestone: 0.9.4
Component: router/general
Version: 0.9.3
Keywords:
Cc: Zlatin Balevsky
Parent Tickets:
Sensitive: no

Description

Vanilla 0.9.3-4, busy router:

Events  Lifetime average (ms)
28      753.32
29      915.76
30      1219.47
31      1540.10
32      2251.56
33      2791.88
...     ...
38      3386.34

Peers: ~2700 / ~7000
Tunnels: 4k

Eventually the periods under the writeLock get so long (20s+) that the router drops half of its tunnels.

Subtickets

Change History (15)

comment:1 Changed 7 years ago by Zlatin Balevsky

One stab at this: http://pastethis.i2p/show/2158/

Split the coalesce() method in two; coalesce the stats under the read lock and only update the values that affect the sorting under the write lock.
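
A hedged sketch of that split, assuming a ReentrantReadWriteLock-style reorganize lock; the class, field, and method names below are illustrative stand-ins, not the actual ProfileOrganizer/PeerProfile code:

    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // The expensive per-profile coalescing runs while holding only the read lock;
    // the write lock is held just long enough to refresh the values the sort depends on.
    class ReorganizeSketch {
        private final ReentrantReadWriteLock _reorganizeLock = new ReentrantReadWriteLock();
        private final Set<PeerProfileSketch> _profiles = new HashSet<>(); // stand-in for the profile map

        void reorganize() {
            _reorganizeLock.readLock().lock();
            try {
                for (PeerProfileSketch p : _profiles)
                    p.coalesceStats();      // expensive: walk every RateStat
            } finally {
                _reorganizeLock.readLock().unlock();
            }

            _reorganizeLock.writeLock().lock();
            try {
                for (PeerProfileSketch p : _profiles)
                    p.updateValues();       // cheap: publish the values used for sorting
                // ... re-tier peers (fast / high-capacity / not-failing) here ...
            } finally {
                _reorganizeLock.writeLock().unlock();
            }
        }
    }

    // Hypothetical stand-in for PeerProfile
    class PeerProfileSketch {
        void coalesceStats() { /* coalesce RateStats, compute candidate values */ }
        void updateValues()  { /* copy candidate values into the fields the sort reads */ }
    }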

comment:2 in reply to: 1 Changed 7 years ago by Zlatin Balevsky

Replying to zab:
Oops, just saw that I'm double-counting the expiredCount, but that's only used in logging anyway.

comment:3 Changed 7 years ago by Zlatin Balevsky

More aggressive version: http://pastethis.i2p/show/2159/

Coalesce and calculate the values under read lock but store them in separate fields. Only copy those fields under the write lock.
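
On the profile side, the staging described above might look something like this sketch; all names are assumptions, since the paste itself isn't reproduced in the ticket:

    // Candidate values are computed into staging fields during the read-locked pass;
    // the write-locked pass only copies them into the live fields used for sorting.
    class PeerProfileStagedSketch {
        // live values, read while sorting peers into tiers
        private double _speedValue, _capacityValue, _integrationValue;
        // staged copies, filled in under the read lock
        private double _speedValueNew, _capacityValueNew, _integrationValueNew;

        /** Expensive part, called while holding only the read lock. */
        void coalesceOnly() {
            // ... coalesce the RateStats here ...
            _speedValueNew = calculateSpeed();
            _capacityValueNew = calculateCapacity();
            _integrationValueNew = calculateIntegration();
        }

        /** Cheap part, called under the write lock so the sort order stays consistent. */
        void updateValues() {
            _speedValue = _speedValueNew;
            _capacityValue = _capacityValueNew;
            _integrationValue = _integrationValueNew;
        }

        // placeholders for the real calculators
        private double calculateSpeed()       { return 0; }
        private double calculateCapacity()    { return 0; }
        private double calculateIntegration() { return 0; }
    }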

The state management may not be necessary and may in fact be buggy when it interacts with the expired logic. Will see.

comment:4 in reply to: 3 Changed 7 years ago by Zlatin Balevsky

Replying to zab:

The state management may not be necessary and may in fact be buggy when it interacts with the expired logic. Will see.

Yes, it will need to be relaxed. A new PeerProfile may be added between the release of the readLock and the acquisition of the writeLock. Here's a more relaxed version: http://pastethis.i2p/show/2161/
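
One way such a relaxation could look, purely as an assumption about the paste (not a quote of it): a per-cycle flag so that a profile added between the two lock phases is simply skipped until the next pass.

    class PeerProfileRelaxedSketch {
        private volatile boolean _coalescedThisCycle;

        /** Read-locked pass: stage new values, mark the profile as coalesced. */
        void coalesceOnly() {
            // ... coalesce RateStats and stage candidate values ...
            _coalescedThisCycle = true;
        }

        /** Write-locked pass: ignore profiles the read-locked pass never saw. */
        void updateValues() {
            if (!_coalescedThisCycle)
                return; // added after the read lock was released; it will be picked up next cycle
            // ... copy staged values into the live fields ...
            _coalescedThisCycle = false;
        }
    }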

comment:5 Changed 7 years ago by Zlatin Balevsky

Priority: major → minor
Summary: Scary peer.profileCoalesceTime readings → Small optimization of peer.profileCoalesceTime locking
Type: defect → enhancement

Ha-ha. Turns out somebody cut the memory of my testing VM in half and the router was swapping heavily. Anyway, patch 2161 is still a nice optimization:

Events  Lifetime average (ms)
27      1406.89
28      1793.86
29      2478.21
30      3297.30
31      3910.77
...     ...
33      6675.03

BUT peer.profileReorgTime Lifetime average value: 110.58 (33 events)

comment:6 Changed 7 years ago by zzz

Not addressing the issue at all, but I added a change to PeerManager in 0.9.3-5 to not reorganize as often if it takes too long.

comment:7 Changed 7 years ago by zzz

I'm not clear on your data. What are the coalesce and reorg times after the patch and the memory fix? The avg. coalesce time is 6675 and still growing? That doesn't make sense.

comment:8 Changed 7 years ago by Zlatin Balevsky

Sorry, the data in the table above is from before the memory was increased. With enough RAM, the total reorg time without the patch is in the 200-300 ms range. With the patch it drops to a little under 100 ms.

comment:9 Changed 7 years ago by zzz

Milestone: 0.9.4
Owner: set to zzz
Status: new → accepted

OK, that's a big savings if we can really move that work outside the write lock. I'll review the patch.

comment:10 Changed 7 years ago by zzz

OK, I reviewed http://pastethis.i2p/show/2161/ and I'm wondering if we could make it a lot simpler. I think we can do the entire coalesce and update under the read lock, and therefore we wouldn't need any changes to PeerProfile.java.

I don't think there's a problem with a profile being in the "wrong" group (fast, high cap) during the window between coalescing and acquisition of the write lock. Stats are just stats, and in the large interval between coalesces, the "actual" stat drifts away from the last coalesced stat anyway. The profiling is just a sorting of peers into categories based on sloppy, recent stats and heuristics.

I'm going to look at it further. What do you think about this simplification?

comment:11 Changed 7 years ago by Zlatin Balevsky

update() needs to be under the write lock because it modifies fields that are used by the comparator of _strictCapacityOrder, and so risks violating the total-order requirement (https://en.wikipedia.org/wiki/Total_order) that red-black trees rely on. You'll get IllegalStateExceptions, NoSuchElementExceptions, all kinds of weird behavior.
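
A minimal, self-contained illustration of that hazard (not I2P code; the real InverseCapacityComparator also factors in the speed value and has its own tie-breaking, but the mutation problem is the same):

    import java.util.Comparator;
    import java.util.TreeSet;

    // A TreeSet ordered by a mutable field. Changing the field while the element is
    // in the set leaves the tree's internal placement out of sync with the comparator,
    // so later lookups walk the wrong branch and miss elements that are really there.
    public class TotalOrderHazard {
        static class Profile {
            double capacity;
            Profile(double c) { capacity = c; }
        }

        public static void main(String[] args) {
            // highest capacity first, like an inverse-capacity ordering
            TreeSet<Profile> order = new TreeSet<>(
                    Comparator.comparingDouble((Profile p) -> -p.capacity));
            Profile a = new Profile(10), b = new Profile(20);
            order.add(a);
            order.add(b);

            a.capacity = 30; // mutated without the remove/re-add that the write lock would guard

            System.out.println(order.remove(b)); // false: b is still in the set, but the
                                                 // search descends the wrong side of the tree
        }
    }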

comment:12 Changed 7 years ago by zzz

Makes sense. _strictCapacityOrder is sorted by InverseCapacityComparator, which uses the capacity value and the speed value. The integration value isn't used in the TreeSet, and isFailing() is always false.

So updateValues() only needs to do speed and capacity.

comment:13 Changed 7 years ago by zzz

I'd like to get this in for 0.9.4. Do you want to check it in or would you like me to?

comment:14 Changed 7 years ago by Zlatin Balevsky

Resolution: fixed
Status: accepted → closed

Should be in trunk with revision 7eb8a06162fc6d67c420fd73ce05197725852a1e (pushed to KYTV's server).

comment:15 Changed 7 years ago by zzz

Great. I'm going to make a small change in PeerProfile.coalesceOnly() and updateValues(), as implied above in comment 12, to set the integration value and isFailing directly in coalesceOnly() instead of putting them in temp variables.
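
Putting comments 12 and 15 together, the end result might look roughly like the sketch below; field and method names are assumptions rather than the committed revision:

    class PeerProfileFinalSketch {
        private double _speedValue, _capacityValue;       // read by the capacity/speed comparator
        private double _speedValueNew, _capacityValueNew; // staged under the read lock
        private double _integrationValue;                 // not part of the sort order
        private boolean _isFailing;                       // never true, per comment 12

        void coalesceOnly() {
            // ... coalesce the RateStats here ...
            _speedValueNew = calculateSpeed();
            _capacityValueNew = calculateCapacity();
            _integrationValue = calculateIntegration();   // set directly, no temp variable
            _isFailing = false;
        }

        void updateValues() {                             // write-locked, now just two assignments
            _speedValue = _speedValueNew;
            _capacityValue = _capacityValueNew;
        }

        // placeholders for the real calculators
        private double calculateSpeed()       { return 0; }
        private double calculateCapacity()    { return 0; }
        private double calculateIntegration() { return 0; }
    }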
