Opened 9 years ago

Closed 7 years ago

#772 closed defect (fixed)

Massive number of duplicate acks

Reported by: Zlatin Balevsky
Owned by: zzz
Priority: minor
Milestone: 0.9.17
Component: router/transport
Version: 0.9.3
Keywords: SSU
Cc: Zlatin Balevsky
Parent Tickets:
Sensitive: no


While studying various SSU-related issues I decided to count the duplicate acks.

--- router/java/src/net/i2p/router/transport/udp/ 4b2316e6e9f7235ba935861ff18b64e7cc03ff16
+++ router/java/src/net/i2p/router/transport/udp/ 8e730d2b4990abe09b9919ad7a9f0b47434a8537
@@ -1779,6 +1779,7 @@ class PeerState {
         } else {
             // dupack, likely
+            _context.statManager().updateFrequency("udp.duplicateAck");
             //if (_log.shouldLog(Log.DEBUG))
             //    _log.debug("Received an ACK for a message not pending: " + messageId);
@@ -1861,6 +1862,7 @@ class PeerState {
             return isComplete;
         } else {
             // dupack
+            _context.statManager().updateFrequency("udp.duplicateAck");
             if (_log.shouldLog(Log.DEBUG))
                 _log.debug("Received an ACK for a message not pending: " + bitfield);
             return false;

I'm getting some scary readings.

5 sec frequency: 14 ms; Rolling average events per period: 370.00; Highest events per period: 1,152.14; Lifetime average events per period: 4,686.34
60 sec frequency: 13 ms; Rolling average events per period: 4,799.41; Highest events per period: 10,056.22; Lifetime average events per period: 56,236.10
10 min frequency: 11 ms; Rolling average events per period: 53,833.81; Highest events per period: 82,005.46; Lifetime average events per period: 562,361.03
Lifetime average frequency: 1 ms (5,393,823 events)

The router is busy, ~1500 SSU peers, but my intuition says this level of duplicate acks cannot be explained by genuinely duplicate datagrams. I'll keep investigating and update the issue.


Change History (6)

comment:1 Changed 9 years ago by Zlatin Balevsky

Cc: Zlatin Balevsky added

Clarification: "genuinely duplicate datagrams" == "Alice sends one datagram, Bob receives two".

comment:2 Changed 9 years ago by zzz

SSU sends massive numbers of dup acks by design, no reason to panic. As sequence numbers are random, one ack doesn't imply an ack of previous packets, so you can never be sure an ack got through.

Ref: http://www.i2p2.i2p/udp

So it isn't broken. But that doesn't mean it isn't worth researching. The algorithm for which dup acks to send - and how many, how often, and when - is rather unsophisticated. For example, I don't know if we always resend an ack if we get a retransmitted packet. And the acks we do resend are chosen randomly out of the list of recently-sent ones. If the window size gets big, the chance of a packet getting acked multiple times goes down. I think.

And at a higher level, moving to some non-random (i.e. more TCP-like) scheme for sequence numbers may make for a more rational and efficient protocol.
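
To put a rough number on the "window size" remark in this comment, here is a minimal sketch, not the router's code. It assumes each outgoing packet carries up to k acks drawn uniformly at random, without replacement, from a list of n recently-sent acks. Under that assumption a given ack is repeated with probability min(k, n)/n per packet, so a bigger list means fewer repeats per ack. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random dup-ack selection; not the I2P implementation. */
class RandomAckResend {

    /** Pick up to k acks uniformly at random, without replacement, from the recently-sent list. */
    static List<Long> pick(List<Long> recentlySent, int k, Random rnd) {
        List<Long> copy = new ArrayList<>(recentlySent); // leave the caller's list untouched
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    /**
     * Expected number of times one particular ack is repeated over the given
     * number of packets, assuming uniform selection: packets * min(k, n) / n.
     */
    static double expectedResends(int n, int k, int packets) {
        if (n <= 0) {
            return 0.0;
        }
        return packets * (double) Math.min(k, n) / n;
    }
}
```

With k = 3 acks per packet over 20 packets, a list of 10 entries repeats each ack about 6 times on average, while a list of 100 entries repeats each about 0.6 times.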

comment:3 Changed 9 years ago by zzz

"you can never be sure that an ack got through"

Not true, if you get an ack for a packet before you retransmit it, and that packet contained acks (new or dup).

But we don't track that now.

comment:4 Changed 7 years ago by zzz

The queue of already-sent acks doesn't have a timestamp, so we can send acks for minutes or even hours. We should timestamp and expire them.

_currentACKsResend in PeerState.java
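
A minimal sketch of what "timestamp and expire" could look like, assuming a simple queue of message IDs. This is not the actual _currentACKsResend structure: the class name, the maxAgeMs parameter, and passing the clock in explicitly are all hypothetical choices made to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a resend queue whose entries expire; not the I2P implementation. */
class ExpiringAckQueue {

    /** One already-sent ack plus the time it was first sent. */
    private static final class Entry {
        final long messageId;
        final long sentAtMs;

        Entry(long messageId, long sentAtMs) {
            this.messageId = messageId;
            this.sentAtMs = sentAtMs;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long maxAgeMs;

    ExpiringAckQueue(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    /** Record an ack that was just sent. Assumes callers pass non-decreasing timestamps. */
    synchronized void add(long messageId, long nowMs) {
        entries.addLast(new Entry(messageId, nowMs));
    }

    /** Drop entries older than maxAgeMs, then return the message IDs still eligible for resend. */
    synchronized List<Long> eligible(long nowMs) {
        // Entries are in insertion order, so the oldest ones sit at the head.
        while (!entries.isEmpty() && nowMs - entries.peekFirst().sentAtMs > maxAgeMs) {
            entries.removeFirst();
        }
        List<Long> ids = new ArrayList<>(entries.size());
        for (Entry e : entries) {
            ids.add(e.messageId);
        }
        return ids;
    }
}
```

Expiring lazily inside eligible() avoids a separate timer. The cost is that stale entries linger until the next time acks are selected for resend.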

comment:5 Changed 7 years ago by zzz

Milestone: 0.9.18
Status: new → accepted

Working on ack timestamps in test2 branch for 0.9.17.

Next idea: ACK send tracking. We don't track what acks are sent with what data. If we did, and that data was acked after the first send, we know he got the acks, and we can remove them from the resend queue.
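
The "next idea" could be sketched roughly as follows. This is an illustration only, with hypothetical names throughout. It adds one simplification not stated in the ticket: once a data message is retransmitted, its record is dropped, because a later ack no longer shows which transmission the peer received.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of ack send tracking; not the I2P implementation. */
class AckSendTracker {

    /** Acks we have sent at least once and may repeat. */
    private final Set<Long> resendQueue = new LinkedHashSet<>();

    /** Data message ID -> acks that rode along with its first transmission. */
    private final Map<Long, Set<Long>> acksByDataMessage = new HashMap<>();

    /** An ack for ackedMessageId was sent; keep it around for possible resend. */
    void queueForResend(long ackedMessageId) {
        resendQueue.add(ackedMessageId);
    }

    /** A data message went out for the first time carrying these acks. */
    void dataSent(long dataMessageId, Set<Long> piggybackedAcks) {
        acksByDataMessage.put(dataMessageId, new HashSet<>(piggybackedAcks));
    }

    /** The data message was retransmitted, so its eventual ack is ambiguous: stop tracking it. */
    void dataRetransmitted(long dataMessageId) {
        acksByDataMessage.remove(dataMessageId);
    }

    /** The peer acked the data message; if it was only sent once, its acks arrived too. */
    void dataAcked(long dataMessageId) {
        Set<Long> delivered = acksByDataMessage.remove(dataMessageId);
        if (delivered != null) {
            resendQueue.removeAll(delivered);
        }
    }

    /** Snapshot of acks still worth repeating. */
    Set<Long> pendingResends() {
        return new LinkedHashSet<>(resendQueue);
    }
}
```

The extra map is the bookkeeping cost of this scheme: one entry per in-flight data message.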

comment:6 Changed 7 years ago by zzz

Resolution: fixed
Status: accepted → closed

The fixes in 0.9.17 should help a lot. The "next idea" above may be more trouble than it's worth. If there's more to do I'll open up a new ticket.
