Opened 7 years ago

Last modified 4 years ago

#699 open enhancement

Various small tweaks to reduce object churn

Reported by: Zlatin Balevsky Owned by: zzz
Priority: maintenance Milestone:
Component: router/transport Version: 0.9.1
Keywords: gc memory Cc: zab@…
Parent Tickets: Sensitive: no

Description

Ticket for all kinds of small tweaks that will (hopefully) add up to a significant reduction in object churn and garbage collections. Will be posting them in separate comments as I find them

Related forum post: http://zzz.i2p/topics/1218

Subtickets

Change History (20)

comment:1 Changed 7 years ago by Zlatin Balevsky

5% of the garbage recorded in dev router is in KBucketImpl.getEntries(SelectionCollector?)
I'm not sure why creating a new set is necessary at all, but if it is a threadlocal can avoid it:

#
# old_revision [53d1267566f20d55468e72da7510c86501205503]
#
# patch "router/java/src/net/i2p/router/networkdb/kademlia/KBucketImpl.java"
#  from [e69eead5451f769edf8daf6358638a2bed6e0bc6]
#    to [882e6496de3c82b9c2f424becc24afd219b88994]
# 
============================================================
--- router/java/src/net/i2p/router/networkdb/kademlia/KBucketImpl.java	e69eead5451f769edf8daf6358638a2bed6e0bc6
+++ router/java/src/net/i2p/router/networkdb/kademlia/KBucketImpl.java	882e6496de3c82b9c2f424becc24afd219b88994
@@ -45,6 +45,15 @@ class KBucketImpl implements KBucket {
     private long _lastShuffle;
     private I2PAppContext _context;
     
+    /** use this instead of creating new one every time */
+    private final ThreadLocal<Set<Hash>> _cachedHash = 
+    		new ThreadLocal<Set<Hash>>() {
+				@Override
+				protected Set<Hash> initialValue() {
+					return new HashSet<Hash>();
+				}
+    		};
+    
     public KBucketImpl(I2PAppContext context, LocalHash local) {
         _context = context;
         _log = context.logManager().getLog(KBucketImpl.class);
@@ -219,10 +228,12 @@ class KBucketImpl implements KBucket {
     }
     
     public void getEntries(SelectionCollector collector) {
-        Set<Hash> entries = new HashSet(_entries);
+    	Set<Hash> entries = _cachedHash.get();
+    	entries.addAll(_entries);
         for (Hash h : entries) {
                 collector.add(h);
         }
+        entries.clear();
     }
     
     public void setEntries(Set<Hash> entries) {

comment:2 Changed 7 years ago by Zlatin Balevsky

3% of the garbage is created when constructing the name of the "sendProcessingTime" stat every time a message is sent successfully.

A simple way to avoid is to pre-create these strings: http://pastethis.i2p/show/1702/

comment:3 Changed 7 years ago by Zlatin Balevsky

another 4% are from NTCPConnection.acquireBuf() not much to say here, 6 buffers are just too small for such busy router.

Another 3% are in NTCPConnection.bufferPrepare() this line:

buf.encrypted = new byte[buf.unencryptedLength];

but that probably requires more complicated handling

comment:4 Changed 7 years ago by Zlatin Balevsky

at least another 2% (but could be more) are from TransportImpl?.getMaxConnections() which creates strings on the fly:

style = style.toLstyle = style.toLowerCase(Locale.US);owerCase(Locale.US);
...
return _context.getProperty("i2np." + style + ".maxConnections", def);

Both can be easily cached like in comment 2

comment:5 Changed 7 years ago by Zlatin Balevsky

5% of the garbage is created because UDPTransport.bid (ab)uses the constructor of UDPAddress just to decide if it should mark the Hash unreachable.

comment:6 Changed 7 years ago by Zlatin Balevsky

Another 1% is in net.i2p.data.RouterInfo::getBytes() - I'm almost 100% sure that the ByteArrayOutputStream? and the underlying byte[] can be cached and reused

comment:7 Changed 7 years ago by Zlatin Balevsky

These findings can be reviewed on the allocations snapshot here:

http://nnz2serwcyvndjwqufqpubiu2khqjqelrwrim4pxiyhk7ixwnhnq.b32.i2p/0.9.1-15-LHS-2r2w/allocations.snapshot.bz2

  1. Open with yourkit (ask echelon or myself for license)
  2. "Memory" tab on top
  3. "Allocations" sub-tab in the middle
  4. Sort the "Size" column in descending order
  5. Drill down and enjoy

comment:8 Changed 7 years ago by zzz

0.9.1-18 contains fixes for the issues in comments 1, 2, 3 (first one), 4, and 5 above.

For the second one in comment 3 (buf.encrypted), I added comments in NTCPConnection about one possible way to fix it, by adding another cache.

For comment 6 (RouterInfo?), it is cached if you have more than 128 MB of memory. Maybe there's some bug or maybe just a lot of writes.

comment:9 Changed 7 years ago by Zlatin Balevsky

Updates after 30 minutes of profiling of 0.9.1-18

yourkit snapshot: http://nnz2serwcyvndjwqufqpubiu2khqjqelrwrim4pxiyhk7ixwnhnq.b32.i2p/0.9.1-18/allocations.snapshot.bz2
expanded traces of the allocations: http://nnz2serwcyvndjwqufqpubiu2khqjqelrwrim4pxiyhk7ixwnhnq.b32.i2p/0.9.1-18/yourkit-allocations/allocations.html

Not many low-hanging fruit left:

  • ~5% can be saved by removing iterators from PeerState? http://pastethis.i2p/show/1722/
  • Most of the rest is in NTCPConnection.acquireBuf() so tweaking the size of the cache should help. Perhaps that should be a config option?

comment:10 Changed 7 years ago by zzz

PeerState? is pretty bad but I think we can do better than an improved ArrayList?. Some of those iterators are just looking for a single match.

It's used both for unsent msgs, where we want an order, and for msgs sent and awaiting an ack and maybe a retransmission - where I guess we want an order also for retransmission.

But for looking up to see if an outstanding packet was acked, a HashSet? would be better. So maybe a LinkedHashSet?. SSU sends a billion duplicate acks so looking up acks should be fast.

Or maybe split it into two, a Queue for unsent and a LHS for sent…

comment:11 Changed 7 years ago by zzz

… and iterating through messages that haven't been sent yet to see if they've been acked is just silly….

comment:12 Changed 7 years ago by zzz

… and the iterators could be short-circuited with isEmpty() first…

comment:13 Changed 7 years ago by Zlatin Balevsky

The specific iterator hotspots in PeerState? are:

  • allocateSend() - 2%
  • finishMessages() - 2%
  • getNextDelay() - 1%

I haven't looked into the logic in detail but you may need an iterator there even if using Set, and those are harder to cache. Your call.

comment:14 Changed 7 years ago by Zlatin Balevsky

This concludes the harvest of low-hanging fruit on my part. Take a look at the snapshot, something may jump out as you know the code better (I'll email you the license). I'll be focusing on #697 for the next few days unless there are interesting developments on this front.

comment:15 Changed 7 years ago by Zlatin Balevsky

P.S. the memory usage of dev1 router does look better. In the past with similar or even less traffic I was seeing heap 490MB, 'top' around 600MB. Now heap is less than 300 and 'top' is less than 500MB. Because of the way memory is managed there's never going to be an exact match, but these improvements do make a difference.

(To really quantify the improvements you need to look at frequency and length of garbage collections, then compute how they will affect TCP traffic, then compute how reading from the backlogged buffers may or may not burst… too many factors)

comment:16 Changed 7 years ago by Zlatin Balevsky

Cc: zab@… added

comment:17 Changed 7 years ago by zzz

Split queue in PeerState? (see comment 10 above) implemented in 0.9.2-1. Comments added to the other places in PeerState? that need attention.

comment:18 Changed 7 years ago by zzz

Component: router/generalrouter/transport
Owner: set to zzz

only thing left is PeerState?.

comment:19 Changed 6 years ago by zzz

Priority: minormaintenance

comment:20 Changed 4 years ago by str4d

Status: newopen
Note: See TracTickets for help on using tickets.