Opened 8 months ago

Closed 6 months ago

#2694 closed enhancement (fixed)

FloodfillPeerSelector: Remove bottlenecks

Reported by: jogger Owned by: zzz
Priority: minor Milestone: 0.9.46
Component: router/netdb Version: 0.9.44
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

The addAll() into the sorted TreeSet costs a lot of CPU because mass additions are implemented inefficiently at the Java level. Collections.sort() is much faster.

The _sorted TreeSet is accessed far too often at the end of get().
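The difference between the two approaches can be illustrated with a minimal, self-contained sketch. Integer keys and natural ordering stand in for i2p's Hash and XORComparator, and the class name SortDemo is made up here:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class SortDemo {
    // Both approaches yield the same ordering; the list sort avoids
    // allocating a red-black tree node per element.
    static List<Integer> viaTreeSet(List<Integer> input) {
        TreeSet<Integer> sorted = new TreeSet<Integer>();
        sorted.addAll(input);                 // O(n log n), one tree node per element
        return new ArrayList<Integer>(sorted);
    }

    static List<Integer> viaListSort(List<Integer> input) {
        List<Integer> sorted = new ArrayList<Integer>(input);
        Collections.sort(sorted);             // O(n log n), sorts a flat array
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 1700; i > 0; i--)
            data.add(Integer.valueOf(i));
        System.out.println(viaTreeSet(data).equals(viaListSort(data))); // true
    }
}
```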

Patch:

--- i2p-0.9.44/router/java/src/net/i2p/router/networkdb/kademlia/FloodfillPeerSelector.java
+++ 44dev/router/java/src/net/i2p/router/networkdb/kademlia/FloodfillPeerSelector.java
@@ -13,6 +13,7 @@
 import java.util.Collections;
 import java.util.HashSet;
 import java.util.Iterator;
+import java.util.LinkedList;
 import java.util.List;
 import java.util.Set;
 import java.util.TreeSet;
@@ -203,13 +204,12 @@
      *  @param kbuckets now unused
      */
     private List<Hash> selectFloodfillParticipantsIncludingUs(Hash key, int howMany, Set<Hash> toIgnore, KBucketSet<Hash> kbuckets) {
-        List<Hash> ffs = selectFloodfillParticipants(toIgnore, kbuckets);
-        TreeSet<Hash> sorted = new TreeSet<Hash>(new XORComparator<Hash>(key));
-        sorted.addAll(ffs);
+        LinkedList<Hash> sorted = new LinkedList<Hash>(selectFloodfillParticipants(toIgnore, kbuckets));
+        Collections.sort(sorted, new XORComparator<Hash>(key));
 
         List<Hash> rv = new ArrayList<Hash>(howMany);
-        List<Hash> okff = new ArrayList<Hash>(ffs.size());
-        List<Hash> badff = new ArrayList<Hash>(ffs.size());
+        List<Hash> okff = new ArrayList<Hash>(sorted.size());
+        List<Hash> badff = new ArrayList<Hash>(sorted.size());
         int found = 0;
         long now = _context.clock().now();
         long installed = _context.getProperty("router.firstInstalled", 0L);
@@ -229,14 +229,13 @@
 
         // 5 == FNDF.MAX_TO_FLOOD + 1
         int limit = Math.max(5, howMany);
-        limit = Math.min(limit, ffs.size());
+        limit = Math.min(limit, sorted.size());
         MaskedIPSet maskedIPs = new MaskedIPSet(limit * 3);
         // split sorted list into 3 sorted lists
         for (int i = 0; found < howMany && i < limit; i++) {
-            Hash entry = sorted.first();
+            Hash entry = sorted.poll();
             if (entry == null)
                 break;  // shouldn't happen
-            sorted.remove(entry);
             // put anybody in the same /16 at the end
             RouterInfo info = _context.netDb().lookupRouterInfoLocally(entry);
             MaskedIPSet entryIPs = new MaskedIPSet(_context, entry, info, 2);
@@ -442,11 +441,10 @@
             }
             // are we corrupting _sorted here?
             for (int i = rv.size(); i < howMany; i++) {
-                if (_sorted.isEmpty())
+                Hash entry = _sorted.pollFirst();
+                if (entry == null)
                     break;
-                Hash entry = _sorted.first();
                 rv.add(entry);
-                _sorted.remove(entry);
             }
             return rv;
         }

Subtickets

Change History (11)

comment:1 Changed 8 months ago by zzz

Milestone: undecided → 0.9.46
Status: new → accepted

I'll take a look for 0.9.46.
Do you have any reference link or measurement showing that this would make a difference for on the order of 1700 entries?

comment:2 Changed 8 months ago by Zlatin Balevsky

Without having benchmarked it: insertion into a TreeSet is O(log N), but a new node is created for each element, which could be causing the issue. If that's the case, it's better to use an ArrayList; it might also sort faster than a LinkedList.

The inefficiency of removing from the head of an ArrayList can be mitigated because it is known in advance that howMany elements are needed, so a single System.arraycopy is sufficient.
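A sketch of that idea, with a hypothetical takeHead helper and Integer elements standing in for Hash (assumptions for illustration, not code from the router):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HeadTakeDemo {
    // Hypothetical helper: take the first howMany elements of a sorted
    // ArrayList in one bulk copy, instead of repeated remove(0) calls,
    // each of which would shift the whole backing array.
    static List<Integer> takeHead(ArrayList<Integer> sorted, int howMany) {
        int n = Math.min(howMany, sorted.size());
        Integer[] head = new Integer[n];
        // a single arraycopy under the hood, via toArray on a subList view
        sorted.subList(0, n).toArray(head);
        return Arrays.asList(head);
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>(Arrays.asList(5, 1, 4, 2, 3));
        Collections.sort(list);
        System.out.println(takeHead(list, 3)); // [1, 2, 3]
    }
}
```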

comment:3 Changed 8 months ago by zzz

Java uses Timsort, which is O(n log(n)): https://en.wikipedia.org/wiki/Timsort
If Java's addAll() inserts one at a time, then in total it's going to be O(n² log(n)), right?

Last edited 8 months ago by zzz (previous) (diff)

comment:4 Changed 8 months ago by Zlatin Balevsky

No, addAll is still O(n*log(n)) or at least should be (you're adding N elements, log(N) each). The devil must be in the details somewhere.

comment:5 Changed 8 months ago by jogger

Right, sort() has the same complexity, O(n log(n)). The difference is the overhead associated with each step: adding to the tree means reorganizing the tree each time, and that moves much more data around.

Ideally, a mass insertion is done by sorting the added data first and doing a merge afterwards, which is just what my hint does, since the target is empty.

Collections.sort() sorts linked lists as arrays so there is no extra overhead for using a linked list.

comment:6 Changed 8 months ago by Zlatin Balevsky

Collections.sort() sorts linked lists as arrays so there is no extra overhead for using a linked list.

List::sort creates a new array and copies all elements to it, which involves walking the entire LinkedList, whereas an ArrayList sorts its underlying array in place.
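A simplified sketch of that copy-out, sort, write-back shape (not the actual JDK source; the class and method names here are made up):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class LinkedSortDemo {
    // Simplified sketch of what the default List.sort does for any List:
    // copy to an array, sort the array, then write the elements back.
    // For a LinkedList both the copy-out and the write-back walk the
    // list node by node, whereas an ArrayList can sort its backing
    // array directly.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static <T> void sortViaArray(List<T> list, Comparator<? super T> c) {
        Object[] a = list.toArray();
        Arrays.sort(a, (Comparator) c);
        ListIterator<T> it = list.listIterator();
        for (Object e : a) {
            it.next();
            it.set((T) e);   // write-back pass over the list
        }
    }

    public static void main(String[] args) {
        LinkedList<Integer> ll = new LinkedList<Integer>(Arrays.asList(3, 1, 2));
        sortViaArray(ll, Comparator.<Integer>naturalOrder());
        System.out.println(ll); // [1, 2, 3]
    }
}
```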

comment:7 Changed 8 months ago by zzz

OK, so back to my question in comment 1: do you have any empirical evidence of the savings with n ~= 1700, or a link to any post by anybody who has done a similar analysis?

comment:8 Changed 8 months ago by zzz

Maybe 10% savings? Tweak the test code as you like.

Size: 1700 runs: 50000 list time: 9873 treeset time: 10938

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Set;
    import java.util.TreeSet;

    public class SortBench {
        public static void main(String[] args) {
            int SZ = 1700;
            List<Integer> list = new ArrayList<Integer>(SZ);
            for (int i = 0; i < SZ; i++) {
                list.add(Integer.valueOf(i));
            }
            long ltime = 0;
            long stime = 0;
            int RUNS = 50000;
            for (int i = 0; i < RUNS; i++) {
                Collections.shuffle(list);
                // approach 1: copy into a list, then sort
                List<Integer> l2 = new ArrayList<Integer>(SZ);
                long start1 = System.currentTimeMillis();
                l2.addAll(list);
                Collections.sort(l2);
                ltime += System.currentTimeMillis() - start1;
                // approach 2: insert everything into a TreeSet
                long start2 = System.currentTimeMillis();
                Set<Integer> set = new TreeSet<Integer>();
                set.addAll(list);
                stime += System.currentTimeMillis() - start2;
            }
            System.out.println("Size: " + SZ + " runs: " + RUNS + " list time: " + ltime + " treeset time: " + stime);
        }
    }

comment:9 Changed 8 months ago by jogger

This is getting tricky. I converted the ints to strings and used a custom comparator. The list came out first on JDK 11; Graal 20 and JDK 13 were better on the TreeSet. The flaw is the shuffle: Timsort expects real-world data, not randomized input.

So the comparison is unfair: the TreeSet always performs the same regardless of input, while the sort takes longer because of the randomized input.
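Comment 9's point can be made concrete by counting comparisons rather than wall-clock time: Timsort detects already-ordered input as a single run and needs far fewer comparisons than on shuffled input. A hedged sketch with made-up names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class TimsortRunsDemo {
    // Count the comparisons Timsort performs by wrapping the comparator.
    // Ordered input is recognized as one run (about n-1 comparisons);
    // shuffled input costs on the order of n log n.
    static long comparisons(List<Integer> input) {
        AtomicLong count = new AtomicLong();
        List<Integer> copy = new ArrayList<Integer>(input);
        Collections.sort(copy, (a, b) -> {
            count.incrementAndGet();
            return a.compareTo(b);
        });
        return count.get();
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 1700; i++)
            data.add(Integer.valueOf(i));
        long ordered = comparisons(data);
        Collections.shuffle(data, new java.util.Random(42));
        long shuffled = comparisons(data);
        System.out.println(ordered < shuffled); // true
    }
}
```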

comment:10 Changed 8 months ago by zzz

That's why I posted it, adjust it to your liking and do your own tests.
FYI my test was on Java 8.

comment:11 Changed 6 months ago by zzz

Resolution: fixed
Status: accepted → closed

Still not sure I understand comment 9, real-world data _is_ random.

However, I agree with OP that a one-time sort of a List is better than TreeSet.addAll(), and ArrayList is even better than LinkedList.

Took the first part of the patch, tweaked to avoid an additional copy by sorting the first list in place.
After that, items are no longer removed from the front of sorted.
Reduced the sizes of okff and badff to the maximum required.
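A hedged sketch of that shape, using Integers in place of Hash and a made-up class name (the committed code may differ in detail):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InPlaceSelectDemo {
    // Analog of the described fix: sort the list returned by the
    // selector in place (no extra copy into a LinkedList or TreeSet),
    // then walk it by index rather than repeatedly removing the head.
    static List<Integer> select(List<Integer> ffs, int howMany) {
        Collections.sort(ffs);                       // in-place, no second copy
        // 5 mirrors the FNDF.MAX_TO_FLOOD + 1 limit in the patch
        int limit = Math.min(Math.max(5, howMany), ffs.size());
        List<Integer> rv = new ArrayList<Integer>(howMany);
        for (int i = 0; rv.size() < howMany && i < limit; i++)
            rv.add(ffs.get(i));                      // index walk, no remove()
        return rv;
    }

    public static void main(String[] args) {
        List<Integer> ffs = new ArrayList<Integer>(Arrays.asList(9, 3, 7, 1, 5));
        System.out.println(select(ffs, 3)); // [1, 3, 5]
    }
}
```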

Did not take the latter part of OP's patch, to the Collector; that code is already known to be slow and, I think, rarely used now. To be investigated further: perhaps we can get rid of it completely by converting the remaining users of the slow (kbucket) methods to the faster (selectFloodfill) ones.

In 03c112ebc403b94b94031be3afbfb68a417c3a91 to be 0.9.45-18
