Opened 5 months ago

Closed 5 months ago

Last modified 3 months ago

#2505 closed defect (fixed)

UDP conns starve frequently

Reported by: jogger
Owned by: zzz
Priority: blocker
Milestone: 0.9.41
Component: router/transport
Version: 0.9.40
Keywords:
Cc:
Parent Tickets:
Sensitive: no

Description

Running stock 0.9.40 with zzz's getnextvolley patch. I am seeing lots of these repeating for many different conns:

INFO [acket pusher] router.transport.udp.PeerState: Allocation of 11329 rejected w/ wsize=10159 available=10159 for message 2520735630: OB Message 2520735630 type 19 with 9 fragments of size 11329 volleys: 0 lifetime: 2233536 pending fragments: 0 1 2 3 4 5 6 7 8
INFO [acket pusher] router.transport.udp.PeerState: Retransmit after choke for next send time in 182ms

INFO [acket pusher] router.transport.udp.PeerState: Allocation of 11329 rejected w/ wsize=8192 available=8192 for message 2161352553: OB Message 2161352553 type 19 with 9 fragments of size 11329 volleys: 0 lifetime: 124823 pending fragments: 0 1 2 3 4 5 6 7 8
INFO [acket pusher] router.transport.udp.PeerState: Retransmit after choke for next send time in 213ms

Finally there is a warning that the outbound queue is full:

WARN [P reader 1/4] router.transport.udp.PeerState: Dropping msg, OB queue full for 79.249.237.100:9114 wHe9Hw OB recvAge: 1025 sendAge: 40814 sendAttemptAge: 315 sendACKAge: 291716 lifetime: 2063261 cwin: 8192 acwin: 961 consecFail: 265 msgs rcvd: 7 msgs sent: 59930 pkts rcvd OK/Dup: 3920/0 pkts sent OK/Dup: 60049/99 IBM: 0 OBQ: 512 OBL: 0

I suspect the conn dies afterwards. Of course it must be ensured that messages always fit within the congestion window.

Subtickets

Change History (12)

comment:1 Changed 5 months ago by zzz

what's your evidence that anything is wrong? maybe the far-end just went away?

comment:2 Changed 5 months ago by jogger

The evidence is in the OP. allocateSendingBytes() always rejects any message of size > _sendWindowBytes. If we try to send such a message, it immediately blocks the entire connection, no matter how healthy it may be, because we keep strict sending order. As noted, the above log entries repeat identically many times until the message times out. I can see no place in the code where, in such a case, the send window would be enlarged or the message split afterwards to get it through.
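
To make the failure mode concrete, here is a minimal runnable sketch (not the actual router code; names and the refill logic are simplified) of the check described above:

    // Illustrative model of the allocateSendingBytes() rejection; NOT PeerState itself.
    public class WindowStarvationSketch {
        public static void main(String[] args) {
            int sendWindowBytes = 8192;      // congestion window, as in the log above
            int remaining = sendWindowBytes; // window fully refilled
            int size = 11329;                // the 9-fragment message from the log

            // The periodic refill never raises 'remaining' above the window size,
            // so for size > sendWindowBytes this check fails on every attempt.
            for (int attempt = 1; attempt <= 3; attempt++) {
                if (size <= remaining) {
                    remaining -= size; // never reached for this message
                } else {
                    // Rejected every time; with strict in-order sending, everything
                    // queued behind this message is blocked too, so the conn starves.
                    System.out.println("attempt " + attempt + ": rejected w/ wsize="
                                       + sendWindowBytes + " available=" + remaining);
                }
            }
        }
    }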

comment:3 Changed 5 months ago by zzz

Milestone: undecided → 0.9.41
Status: new → accepted

OK, thanks, now I understand the issue better. 11329 is a huge message. We should be able to send some fragments of it when we have 10159 available, but getUnackedSize() is what we have left to send in the message, not the minimum we could send.

So the messages are already "splitted", but the allocation rejects them anyway. I'll work on a fix.

comment:4 Changed 5 months ago by zzz

OK, it's more of a mess than I thought. Messages are of course fragmented. But we don't have code to send only some of the unacked fragments. In OutboundMessageFragments.preparePackets(), we send ALL of the unacked fragments. That's because OutboundMessageState only tracks a single nextSendTime for ALL of the fragments, not a separate time for each fragment.

So, PeerState is correctly using OutboundMessageState.getUnackedSize() to determine how much will be sent, and to accept or reject the bandwidth (window) allocation.

It would be a lot of work to allow partial sending and track times for each.
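
To see why, here is a stripped-down model of the bookkeeping (fragmentSize and the BitSet are assumptions for the sketch, not the real fields): one nextSendTime covers the whole message, and getUnackedSize() counts every unacked fragment, so there is no way to push only a window-sized subset.

    import java.util.BitSet;

    // Toy model of the OutboundMessageState bookkeeping; illustrative only.
    class MessageStateSketch {
        final int numFragments = 9;
        final int fragmentSize = 1259;     // ~11329 / 9, rounded for illustration
        final BitSet acked = new BitSet(numFragments);
        long nextSendTime;                 // ONE send time for ALL fragments

        // Bytes left to send: the sum over all unacked fragments.
        int getUnackedSize() {
            int size = 0;
            for (int i = 0; i < numFragments; i++) {
                if (!acked.get(i))
                    size += fragmentSize;
            }
            return size;
        }
        // Sending some fragments now and the rest later would need a
        // per-fragment nextSendTime; with a single field we must push all
        // unacked fragments at once, so the full getUnackedSize() must fit.
    }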

The quick fix would be to allow the allocation anyway when the requested allocation is > MINIMUM_WINDOW_BYTES and the available window is ≥ MINIMUM_WINDOW_BYTES.

comment:5 Changed 5 months ago by zzz

Priority: major → blocker

Proposed fix, untested, please review and test:

#
# old_revision [da48bda88effba89b87644f8e23fc4ee8dbf6d3f]
#
# patch "router/java/src/net/i2p/router/transport/udp/PeerState.java"
#  from [abaccaf0d9394344fd53bd784b3163355be3c4e7]
#    to [e94b65313dee35946c9429c01a41d29450fc3948]
#
============================================================
--- router/java/src/net/i2p/router/transport/udp/PeerState.java	abaccaf0d9394344fd53bd784b3163355be3c4e7
+++ router/java/src/net/i2p/router/transport/udp/PeerState.java	e94b65313dee35946c9429c01a41d29450fc3948
@@ -717,7 +717,21 @@ public class PeerState {
             //_sendACKBytes = 0;
             _lastSendRefill = now;
         }
-        //if (true) return true;
+
+        // Ticket 2505
+        // We always send all unacked fragments for a message,
+        // because we don't have any mechanism in OutboundMessageFragments
+        // to track the next send time for fragments individually.
+        // Huge messages that are larger than the minimum window size could never
+        // get sent and block the outbound queue forever.
+        // So we let it through anyway, if we have at least the min window remaining.
+        if (size > MINIMUM_WINDOW_BYTES && _sendWindowBytesRemaining >= MINIMUM_WINDOW_BYTES) {
+            _sendWindowBytesRemaining = Math.max(_sendWindowBytesRemaining - size, 0); 
+            _sendBytes += size;
+            _lastSendTime = now;
+            return true;
+        }
+
         if (IGNORE_CWIN || size <= _sendWindowBytesRemaining || (ALWAYS_ALLOW_FIRST_PUSH && messagePushCount == 0)) {
             if ( (messagePushCount == 0) && (_concurrentMessagesActive > _concurrentMessagesAllowed) ) {
                 _consecutiveRejections++;

This could help bote, which I think likes to send large DHT messages. Also iMule, if anybody's still using it.

comment:6 Changed 5 months ago by jogger

Had a short look. Error is gone, connections and tunnels build up fast. The Math.max() can simply be replaced by 0.

MINIMUM_WINDOW_BYTES should be replaced by _sendWindowBytes, otherwise we defeat the _consecutiveRejections logic below.

That logic, however, seems too restrictive, blocking traffic for a 512K window when 512K is available.
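
In concrete terms (a condensed, hypothetical view of the patch's control flow, not the literal code), the early return fires before the concurrent-messages check, so any message over MINIMUM_WINDOW_BYTES skips it whenever 8K of window happens to be free:

    // Condensed control flow of the first patch; illustrative only.
    class AllocSketch {
        static final int MINIMUM_WINDOW_BYTES = 8192;
        int remaining = 10000;
        int concurrentActive = 12, concurrentAllowed = 8;

        boolean allocateSendingBytes(int size, int messagePushCount) {
            // New block from the patch: returns immediately, so ...
            if (size > MINIMUM_WINDOW_BYTES && remaining >= MINIMUM_WINDOW_BYTES) {
                remaining = Math.max(remaining - size, 0);
                return true;
            }
            if (size <= remaining) {
                // ... this rejection never runs for those messages.
                if (messagePushCount == 0 && concurrentActive > concurrentAllowed)
                    return false;
                remaining -= size;
                return true;
            }
            return false;
        }
    }

With _sendWindowBytes as the threshold, only messages larger than the entire window would take the early path.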

FYI: if large DHT messages caused this, the fix will also help BiglyBT.

comment:7 in reply to:  6 Changed 5 months ago by zzz

Replying to jogger:

> Had a short look. Error is gone, connections and tunnels build up fast.

thanks for testing!

> The Math.max() can simply be replaced by 0.

No, size may still be less than _sendWindowBytesRemaining, so we still need to subtract and do the Math.max().
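
A quick illustration, with hypothetical numbers, of why both the subtraction and the clamp are needed:

    public class ClampSketch {
        public static void main(String[] args) {
            // Window has grown past the message size: must actually subtract.
            System.out.println(Math.max(40960 - 11329, 0)); // 29631, not 0
            // Oversized case: plain subtraction would go negative; clamp to 0.
            System.out.println(Math.max(10159 - 11329, 0)); // 0
        }
    }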

> MINIMUM_WINDOW_BYTES should be replaced by _sendWindowBytes, otherwise we defeat the _consecutiveRejections logic below.

can we just set _consecutiveRejections to 0 in the new code block?

> That logic, however, seems too restrictive, blocking traffic for a 512K window when 512K is available.

yeah would rather fix it as above.

> FYI: if large DHT messages caused this, the fix will also help BiglyBT.

Standard bittorrent DHT messages are much smaller than 8 KB, which is the threshold for hitting this bug; if they did trigger it, i2psnark would be hit equally. For example, for the messages listed at http://i2p-projekt.i2p/en/docs/applications/bittorrent, it would take 256 32-byte hashes in a single message to reach 8 KB, and I think we limit things to much less than that.

I do know that bote likes to send big ones. Maybe iMule also, if anybody uses that anymore. Some nonstandard BiglyBT messages may be large, I don't know. Do you know which BiglyBT applications do? In any case, this only applies to I2P datagrams; streaming of course breaks the data up into roughly 2 KB chunks.

comment:8 Changed 5 months ago by jogger

My Math.max() comment referred to the use of _sendWindowBytes; otherwise you are correct.

_consecutiveRejections is just a stats counter. In fact we totally bypass (messagePushCount == 0) && (_concurrentMessagesActive > _concurrentMessagesAllowed). This logic actually strikes every second or more often. _concurrentMessagesAllowed is driven by a random number, which I dislike, but it is always ≥ 8.

So the decision is whether to a) abandon limiting concurrent messages, b) only bypass it in this special case as you proposed, or c) keep it and, instead of your patch, change the if statement:

         if (IGNORE_CWIN || size <= _sendWindowBytesRemaining || size > _sendWindowBytes || (ALWAYS_ALLOW_FIRST_PUSH && messagePushCount == 0)) {

Never looked into the BiglyBT DHT. But with this mod it takes just one minute to get multiple BiglyBT instances up to full speed.

Last edited 5 months ago by jogger

comment:9 in reply to:  6 Changed 5 months ago by zzz

I wanted to make it a separate if() instead of burying it in the existing if as you propose, to make it clear what the code is doing and that it's a workaround.

But I see what you're saying about bypassing the concurrent messages check.
I also like checking against _sendWindowBytes, not MINIMUM_WINDOW_BYTES.
I simplified the code by getting rid of the ancient debug things set to false.
Also commented some of the methods while I was looking at them.

The main difference from your proposed change is that I'm also checking that the window is "empty", i.e. _sendWindowBytesRemaining ≥ _sendWindowBytes.

see what you think about it:

#
# old_revision [a0d27ac821ab7776411f0bd1dd718a9dced67614]
#
# patch "router/java/src/net/i2p/router/transport/udp/PeerState.java"
#  from [abaccaf0d9394344fd53bd784b3163355be3c4e7]
#    to [2e2977ae95ff8c0de896cb4e068bec84a5ebeb83]
#
============================================================
--- router/java/src/net/i2p/router/transport/udp/PeerState.java	abaccaf0d9394344fd53bd784b3163355be3c4e7
+++ router/java/src/net/i2p/router/transport/udp/PeerState.java	2e2977ae95ff8c0de896cb4e068bec84a5ebeb83
@@ -478,7 +478,11 @@ public class PeerState {
      */
     //public boolean getRemoteWantsPreviousACKs() { return _remoteWantsPreviousACKs; }
 
-    /** how many bytes should we send to the peer in a second */
+    /**
+     *  how many bytes should we send to the peer in a second
+     *  1st stat in CWND column, otherwise unused,
+     *  candidate for removal
+     */
     public int getSendWindowBytes() {
         synchronized(_outboundMessages) {
             return _sendWindowBytes;
@@ -682,11 +686,6 @@ public class PeerState {
      */
     //public void remoteDoesNotWantPreviousACKs() { _remoteWantsPreviousACKs = false; }
     
-    /** should we ignore the peer state's congestion window, and let anything through? */
-    private static final boolean IGNORE_CWIN = false;
-    /** should we ignore the congestion window on the first push of every message? */
-    private static final boolean ALWAYS_ALLOW_FIRST_PUSH = false;
-    
     /** 
      * Decrement the remaining bytes in the current period's window,
      * returning true if the full size can be decremented, false if it
@@ -717,8 +716,16 @@ public class PeerState {
             //_sendACKBytes = 0;
             _lastSendRefill = now;
         }
-        //if (true) return true;
-        if (IGNORE_CWIN || size <= _sendWindowBytesRemaining || (ALWAYS_ALLOW_FIRST_PUSH && messagePushCount == 0)) {
+
+        // Ticket 2505
+        // We always send all unacked fragments for a message,
+        // because we don't have any mechanism in OutboundMessageFragments
+        // to track the next send time for fragments individually.
+        // Huge messages that are larger than the window size could never
+        // get sent and block the outbound queue forever.
+        // So we let it through when the window is empty (full window remaining).
+        if (size <= _sendWindowBytesRemaining ||
+            (size > _sendWindowBytes && _sendWindowBytesRemaining >= _sendWindowBytes)) {
             if ( (messagePushCount == 0) && (_concurrentMessagesActive > _concurrentMessagesAllowed) ) {
                 _consecutiveRejections++;
                 _context.statManager().addRateData("udp.rejectConcurrentActive", _concurrentMessagesActive, _consecutiveRejections);
@@ -731,6 +738,8 @@ public class PeerState {
                 _consecutiveRejections = 0;
             }
             _sendWindowBytesRemaining -= size; 
+            if (_sendWindowBytesRemaining < 0)
+                _sendWindowBytesRemaining = 0; 
             _sendBytes += size;
             _lastSendTime = now;
             //if (isForACK) 
@@ -767,20 +776,36 @@ public class PeerState {
     }
   ****/
 
+    /**
+     *  stat in SST column, otherwise unused,
+     *  candidate for removal
+     */
     public int getSlowStartThreshold() { return _slowStartThreshold; }
 
+    /**
+     *  2nd stat in CWND column, otherwise unused,
+     *  candidate for removal
+     */
     public int getConcurrentSends() {
         synchronized(_outboundMessages) {
             return _concurrentMessagesActive;
         }
     }
 
+    /**
+     *  3rd stat in CWND column, otherwise unused,
+     *  candidate for removal
+     */
     public int getConcurrentSendWindow() {
         synchronized(_outboundMessages) {
             return _concurrentMessagesAllowed;
         }
     }
 
+    /**
+     *  4th stat in CWND column, otherwise unused,
+     *  candidate for removal
+     */
     public int getConsecutiveSendRejections() {
         synchronized(_outboundMessages) {
             return _consecutiveRejections;

comment:10 Changed 5 months ago by zzz

bump, you may have missed the above as trac emails were broken at the time…

comment:11 Changed 5 months ago by zzz

Resolution: fixed
Status: accepted → closed

comment:12 Changed 3 months ago by zzz

Sensitive: unset

The concurrent messages window check discussed in comments 8 and 9 was moved to separate tickets #2512 and #2576.
