Opened 6 weeks ago

Last modified 5 weeks ago

#2694 accepted enhancement

FloodfillPeerSelector: Remove bottlenecks

Reported by: jogger Owned by: zzz
Priority: minor Milestone: 0.9.46
Component: router/netdb Version: 0.9.44
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

The addall() to the sorted treeset costs lots of CPU due to poor implementation of mass additions on the Java level. Collections.sort() is much faster.

the _sorted treeset is accessed much too often at the end of get().

Patch:

--- i2p-0.9.44/router/java/src/net/i2p/router/networkdb/kademlia/FloodfillPeerSelector.java
+++ 44dev/router/java/src/net/i2p/router/networkdb/kademlia/FloodfillPeerSelector.java
@@ -13,6 +13,7 @@
 import java.util.Collections;
 import java.util.HashSet;
 import java.util.Iterator;
+import java.util.LinkedList;
 import java.util.List;
 import java.util.Set;
 import java.util.TreeSet;
@@ -203,13 +204,12 @@
      *  @param kbuckets now unused
      */
     private List<Hash> selectFloodfillParticipantsIncludingUs(Hash key, int howMany, Set<Hash> toIgnore, KBucketSet<Hash> kbuckets) {
-        List<Hash> ffs = selectFloodfillParticipants(toIgnore, kbuckets);
-        TreeSet<Hash> sorted = new TreeSet<Hash>(new XORComparator<Hash>(key));
-        sorted.addAll(ffs);
+        LinkedList<Hash> sorted = new LinkedList<Hash>(selectFloodfillParticipants(toIgnore, kbuckets));
+        Collections.sort(sorted, new XORComparator<Hash>(key));
 
         List<Hash> rv = new ArrayList<Hash>(howMany);
-        List<Hash> okff = new ArrayList<Hash>(ffs.size());
-        List<Hash> badff = new ArrayList<Hash>(ffs.size());
+        List<Hash> okff = new ArrayList<Hash>(sorted.size());
+        List<Hash> badff = new ArrayList<Hash>(sorted.size());
         int found = 0;
         long now = _context.clock().now();
         long installed = _context.getProperty("router.firstInstalled", 0L);
@@ -229,14 +229,13 @@
 
         // 5 == FNDF.MAX_TO_FLOOD + 1
         int limit = Math.max(5, howMany);
-        limit = Math.min(limit, ffs.size());
+        limit = Math.min(limit, sorted.size());
         MaskedIPSet maskedIPs = new MaskedIPSet(limit * 3);
         // split sorted list into 3 sorted lists
         for (int i = 0; found < howMany && i < limit; i++) {
-            Hash entry = sorted.first();
+            Hash entry = sorted.poll();
             if (entry == null)
                 break;  // shouldn't happen
-            sorted.remove(entry);
             // put anybody in the same /16 at the end
             RouterInfo info = _context.netDb().lookupRouterInfoLocally(entry);
             MaskedIPSet entryIPs = new MaskedIPSet(_context, entry, info, 2);
@@ -442,11 +441,10 @@
             }
             // are we corrupting _sorted here?
             for (int i = rv.size(); i < howMany; i++) {
-                if (_sorted.isEmpty())
+                Hash entry = _sorted.pollFirst();
+                if (entry == null)
                     break;
-                Hash entry = _sorted.first();
                 rv.add(entry);
-                _sorted.remove(entry);
             }
             return rv;
         }

Subtickets

Change History (10)

comment:1 Changed 6 weeks ago by zzz

Milestone: undecided0.9.46
Status: newaccepted

I'll take a look for 0.9.46.
You have any reference link or measurement that this would make a difference for order of 1700 entries?

comment:2 Changed 6 weeks ago by Zlatin Balevsky

Without having benchmarked.. insertion into a TreeSet is O(logN) but there is a new node created for each element which could be causing the issue. If that's the case it's better to use an ArrayList, also it might sort even faster than a LinkedList.

The inefficiency of removing from the head of an ArrayList can be mitigated because it is known in advance howMany are needed, so a single System.arraycopy will be sufficient.

comment:3 Changed 6 weeks ago by zzz

Java uses TimSort? which is O(n log(n)) https://en.wikipedia.org/wiki/Timsort
If java addAll() inserts one at a time, then in total it's going to be O(nsquared log(n)), right?

Last edited 6 weeks ago by zzz (previous) (diff)

comment:4 Changed 6 weeks ago by Zlatin Balevsky

No, addAll is still O(n*log(n)) or at least should be (you're adding N elements, log(N) each). The devil must be in the details somewhere.

comment:5 Changed 6 weeks ago by jogger

Right, sort() has the same complexity of O(n * log(n)). The difference is the overhead associated with each step. Adding to the tree means reorganizing the tree each time and that moves much more data around.

Ideally a mass insertion is done by sorting the added data first and doing a merge afterwards which is just what my hint does as the target is empty.

Collections.sort() sorts linked lists as arrays so there is no extra overhead for using a linked list.

comment:6 Changed 6 weeks ago by Zlatin Balevsky

Collections.sort() sorts linked lists as arrays so there is no extra overhead for using a linked list.

List::sort creates a new array and copies all elements to it, and that will involve walking the entire LinkedList, whereas an ArraysList sorts the underlying array in-place.

comment:7 Changed 6 weeks ago by zzz

OK, so back to my question in comment 1 - have any empirical evidence of the savings with n ~= 1700, or a link to any post by anybody who has done similar analysis?

comment:8 Changed 5 weeks ago by zzz

Maybe 10% savings? Tweak test code as you like

Size: 1700 runs: 50000 list time: 9873 treeset time: 10938

        int SZ = 1700;
        List<Integer> list = new ArrayList<Integer>(SZ);
        for (int i = 0; i < SZ; i++) {
             list.add(Integer.valueOf(i));
        }
        long ltime = 0;
        long stime = 0;
        int RUNS = 50000;
        for (int i = 0; i < RUNS; i++) {
            Collections.shuffle(list);
            List<Integer> l2 = new ArrayList<Integer>(SZ);
            long start1 = System.currentTimeMillis();
            l2.addAll(list);
            Collections.sort(l2);
            ltime += System.currentTimeMillis() - start1;
            long start2 = System.currentTimeMillis();
            Set<Integer> set = new TreeSet<Integer>();
            set.addAll(list);
            stime += System.currentTimeMillis() - start2;
        }
        System.out.println("Size: " + SZ + " runs: " + RUNS + " list time: " + ltime + " treeset time: " + stime);

comment:9 Changed 5 weeks ago by jogger

This is getting tricky. Converted int to string and used a custom comparator. List came out first on JDK11, Graal 20 and JDK 13 were better on the treeset. The flaw is the shuffle. Timsort expects real-world data, not randomized input.

So the comparison is unfair, the treeset always performs the same regardless of input, the sort takes longer due to randomizing.

comment:10 Changed 5 weeks ago by zzz

That's why I posted it, adjust it to your liking and do your own tests.
FYI my test was on Java 8.

Note: See TracTickets for help on using tickets.