Opened 5 weeks ago

Last modified 5 weeks ago

#2640 new defect

Fix UDP dups

Reported by: jogger
Owned by: zzz
Priority: major
Milestone: 0.9.44
Component: router/transport
Version: 0.9.42
Keywords:
Cc:
Parent Tickets:
Sensitive: no

Description

The UDP transport contains a number of timing quirks and logic flaws that I have patches for. Here I am presenting only those that call for an immediate fix, as they seriously impact router operation.

UDP speed is severely impacted by the send pauses that follow retransmissions. There is a bug somewhere that I am not sure I have solved completely, but the patches for the following obvious bugs cut retransmissions and the associated send pauses from 1-3% to far below 1%, as seen in the totals of the Dup Rx column on /peers. CPU usage for ACKSender is also down by more than 50%. Please test on the LIVE net.

  • OMF (OutboundMessageFragments) puts the ACKs into a set, while buildPacket() expects an ordered list. This nearly completely disables piggybacked ACKing.
  • We keep and send resend ACKs for 5 minutes, although they are useless after _rto, wasting CPU, memory and bandwidth.
  • retrieveACKBitfields() does not call removeACKMessage() as required.
  • ACKs for dups should be handled with priority, since the other end is stalled.
  • _sendFullyTime should be set correctly; it is used for router steering to some degree.
  • ACKSender should start ACKing when the other side may have used up 1/3 of its send window.
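The reasoning behind the second bullet can be sketched in isolation (the class and field names below are hypothetical stand-ins, not the actual PeerState code): once a couple of RTOs have passed, the sender has already retransmitted, so re-sending the ACK can no longer prevent a duplicate and the entry only wastes CPU, memory and bandwidth.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

public class ResendAckTrim {
    // Hypothetical stand-in for PeerState.ResendACK
    record ResendAck(long id, long sentAt) {}

    /**
     * Drop resend ACKs older than 2 * rto: after that, the far end has
     * already retransmitted, so re-sending the ACK cannot prevent a dup.
     * Returns the number of ACKs removed.
     */
    static int trimExpired(Queue<ResendAck> acks, long now, long rto) {
        long cutoff = now - 2 * rto;
        int removed = 0;
        for (Iterator<ResendAck> it = acks.iterator(); it.hasNext(); ) {
            if (it.next().sentAt < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Queue<ResendAck> acks = new ArrayDeque<>();
        long now = 100_000;
        acks.add(new ResendAck(1, now - 10_000)); // stale: older than 2*rto
        acks.add(new ResendAck(2, now - 1_000));  // still fresh
        int removed = trimExpired(acks, now, 3_000); // rto = 3000 ms
        System.out.println(removed + " expired, " + acks.size() + " kept");
        // prints "1 expired, 1 kept"
    }
}
```

This is the same shape as the `cutoff = now - _rto * 2` loop in the patch below, just reduced to the minimum needed to see why a fixed 5-minute timeout keeps far too much state.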
--- i2p-0.9.42/router/java/src/net/i2p/router/transport/udp/OutboundMessageFragments.java
+++ i2p-0.9.42 patch/router/java/src/net/i2p/router/transport/udp/OutboundMessageFragments.java
@@ -386,7 +386,7 @@
         int piggybackedPartialACK = partialACKBitfields.size();
         // getCurrentFullACKs() already makes a copy, do we need to copy again?
         // YES because buildPacket() now removes them (maybe)
-        Set<Long> remaining = new HashSet<Long>(msgIds);
+        List<Long> remaining = new ArrayList<Long>(msgIds);
 
         // build the list of fragments to send
         List<Fragment> toSend = new ArrayList<Fragment>(8);
@@ -474,11 +474,12 @@
             newFullAckCount = Math.max(0, newFullAckCount - (before - after));
 
             int piggybackedAck = 0;
+            long now = _context.clock().now();
             if (msgIds.size() != remaining.size()) {
                 for (int j = 0; j < msgIds.size(); j++) {
                     Long id = msgIds.get(j);
                     if (!remaining.contains(id)) {
-                        peer.removeACKMessage(id);
+                        peer.removeACKMessage(id, now);
                         piggybackedAck++;
                     }
                 }
--- i2p-0.9.42/router/java/src/net/i2p/router/transport/udp/PeerState.java
+++ i2p-0.9.42 patch/router/java/src/net/i2p/router/transport/udp/PeerState.java
@@ -339,9 +339,6 @@
     /** for small MTU */
     private static final int MAX_RESEND_ACKS_SMALL = MAX_RESEND_ACKS / 5;
 
-    private static final long RESEND_ACK_TIMEOUT = 5*60*1000;
-
-    
     /**
      *  @param rtt from the EstablishState, or 0 if not available
      */
@@ -860,6 +857,8 @@
         } else {
             //if (true || _retransmissionPeriodStart + 1000 < _context.clock().now()) {
                 _packetsReceivedDuplicate++;
+		// rush dup ACKs
+                _wantACKSendSince = 1;
             //} else {
             //    _retransmissionPeriodStart = _context.clock().now();
             //    _packetsReceivedDuplicate = 1;
@@ -986,7 +985,7 @@
             int sz = _currentACKsResend.size();
             List<Long> randomResends = new ArrayList<Long>(sz);
             if (sz > 0) {
-                long cutoff = _context.clock().now() - RESEND_ACK_TIMEOUT;
+                long cutoff = _context.clock().now() - _rto * 2;
                 int i = 0;
                 for (Iterator<ResendACK> iter = _currentACKsResend.iterator(); iter.hasNext(); ) {
                     ResendACK rack  = iter.next();
@@ -995,7 +994,7 @@
                     } else {
                         iter.remove();
                         if (_log.shouldLog(Log.INFO))
-                            _log.info("Expired ack " + rack.id + " sent " + (cutoff + RESEND_ACK_TIMEOUT - rack.time) +
+                            _log.info("Expired ack " + rack.id + " sent " + (cutoff + _rto * 2 - rack.time) +
                                       " ago, now " + i + " resend acks");
                     }
                 }
@@ -1009,19 +1008,19 @@
      * The ack was sent.
      * Side effect - sets _lastACKSend
      */
-    public void removeACKMessage(Long messageId) {
+    public void removeACKMessage(Long messageId, Long now) {
             boolean removed = _currentACKs.remove(messageId);
             if (removed) {
                 // only add if removed from current, as this may be called for
                 // acks already in _currentACKsResend.
-                _currentACKsResend.offer(new ResendACK(messageId, _context.clock().now()));
+                _currentACKsResend.offer(new ResendACK(messageId, now));
                 // trim happens in getCurrentResendACKs above
                 if (_log.shouldLog(Log.INFO))
                     _log.info("Sent ack " + messageId + " now " + _currentACKs.size() + " current and " +
                               _currentACKsResend.size() + " resend acks");
             }
             // should we only do this if removed?
-            _lastACKSend = _context.clock().now();
+            _lastACKSend = now;
     }
     
     /** 
@@ -1060,18 +1059,17 @@
                 maxResendAcks = MAX_RESEND_ACKS_LARGE;
             List<ACKBitfield> rv = new ArrayList<ACKBitfield>(maxResendAcks);
 
-            // save to add to currentACKsResend later so we don't include twice
-            List<Long> currentACKsRemoved = new ArrayList<Long>(_currentACKs.size());
             // As explained above, we include the acks in any order
             // since we are unlikely to get backed up -
             // just take them using the Set iterator.
             Iterator<Long> iter = _currentACKs.iterator();
+            long now = _context.clock().now();
             while (bytesRemaining >= 4 && iter.hasNext()) {
                 Long val = iter.next();
                 iter.remove();
                 long id = val.longValue();
                 rv.add(new FullACKBitfield(id));
-                currentACKsRemoved.add(val);
+                removeACKMessage(id, now);
                 bytesRemaining -= 4;
             }
             if (_currentACKs.isEmpty())
@@ -1096,13 +1094,6 @@
                         bytesRemaining -= 4;
                     //}
                 }
-                if (!currentACKsRemoved.isEmpty()) {
-                    long now = _context.clock().now();
-                    for (Long val : currentACKsRemoved) {
-                        _currentACKsResend.offer(new ResendACK(val, now));
-                    }
-                    // trim happens in getCurrentResendACKs above
-                }
             }
 
         int partialIncluded = 0;
@@ -1209,6 +1200,7 @@
         
         _consecutiveFailedSends = 0;
         // _lastFailedSendPeriod = -1;
+        long now = _context.clock().now();
         if (numSends < 2) {
             if (_context.random().nextInt(_concurrentMessagesAllowed) <= 0)
                 _concurrentMessagesAllowed++;
@@ -1234,8 +1226,6 @@
         }
         if (_sendWindowBytes > MAX_SEND_WINDOW_BYTES)
             _sendWindowBytes = MAX_SEND_WINDOW_BYTES;
-        _lastReceiveTime = _context.clock().now();
-        _lastSendFullyTime = _lastReceiveTime;
         
         //if (true) {
             if (_sendWindowBytesRemaining + bytesACKed <= _sendWindowBytes)
@@ -1243,6 +1233,9 @@
             else
                 _sendWindowBytesRemaining = _sendWindowBytes;
         //}
+        _lastReceiveTime = now;
+        // to avoid clock jumps would need to get state._startedOn
+        _lastSendFullyTime = Math.max(now - lifetime, _lastSendFullyTime);
         
         if (numSends < 2) {
             // caller synchs
@@ -1442,7 +1435,7 @@
     public boolean unsentACKThresholdReached() {
         //int threshold = countMaxACKData() / 4;
         //return _currentACKs.size() >= threshold;
-        return _currentACKs.size() >= MAX_RESEND_ACKS / 2;
+        return _currentACKs.size() >= DEFAULT_SEND_WINDOW_BYTES / 3 / 1033;
     }
 
     /**

Change History (3)

comment:1 Changed 5 weeks ago by zzz

Milestone: 0.9.43 → 0.9.44
Priority: critical → major

.43 checkin deadline was yesterday, but we can take a look at this for .44

comment:2 Changed 5 weeks ago by jogger

Maybe here is a reason to do something for .43 after all: I have now found the root cause, which is partly mitigated by the patches above through better ACKing. Since #2505 we let large messages through even if the connection has not grown its send window to a sufficient size, in order to avoid starvation.

Since we usually fire them into a 1000 ms RTO, we produce lots of dups and retransmission delays in the net, as most UDP connections are not fast enough for this.

As a quick fix I set MIN_RTO to 3000, allowing for one retry with a 6000 ms RTO before the message times out after 10 seconds. This brings the retransmission rate seen on /stats back down to around 1%.

Alternatively one could set nextSendTime depending on message size, but that would involve reworking the RTO logic, which is modelled after TCP and totally inadequate here, since we meld single- and multi-fragment messages, piggybacked ACKs and ACKSender delays into one figure.

comment:3 Changed 5 weeks ago by zzz

Sorry, that horse has left the barn. I'll turn my attention to .44 tickets in a couple weeks.
