Opened 3 weeks ago

Last modified 4 days ago

#2576 new enhancement

Do not use message-based window in SSU

Reported by: Zlatin Balevsky
Owned by: zzz
Priority: minor
Milestone: undecided
Component: router/transport
Version: 0.9.41
Keywords: ssu
Cc: jogger
Parent Tickets: #2512
Sensitive: no

Description

Right now SSU keeps track of two windows: one based on the number of messages, the other on the number of bytes. The window for messages gets incremented in the following way:

if (_context.random().nextInt(_concurrentMessagesAllowed) <= 0)
    _concurrentMessagesAllowed++;

I'm not sure I understand the reasoning behind this logic, but testing suggests that it limits SSU transmissions very often. I'm seeing frequent log statements where allocation fails while there is plenty of room in the byte-based window:

2019/07/31 06:47:07.245 INFO  [acket pusher] router.transport.udp.PeerState: Allocation of 1033 rejected w/ wsize=527024 available=446450 for message 1034903768: OB Message 1034903768 type 18 with 1 fragments of size 1033 volleys: 0 lifetime: 0 pending fragments: 0 

whereas rejection by the byte window happens much less often. Also, disabling this window (by a quick hack of setting MIN_CONCURRENT_MSGS to some huge value) increases overall throughput greatly on my testnet.
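
For reference, the quick hack amounts to something like the following; the exact value is arbitrary, anything effectively infinite will do:

// quick hack only: raise the floor so the message window never limits anything
private static final int MIN_CONCURRENT_MSGS = Integer.MAX_VALUE / 2;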

I don't know why there are two windows; shouldn't one be enough?

Subtickets

Attachments (3)

MIN_CONCURRENT_MSGS_analysis.ods (15.8 KB) - added by Zlatin Balevsky 2 weeks ago.
MIN_CONCURRENT_MSGS_analysis2.ods (14.7 KB) - added by Zlatin Balevsky 8 days ago.
Repeated the experiment with delay of 75ms and NTCP disabled. I can't explain why 4 performs better than 8; the big jump occurs somewhere between 16 and 32.
MIN_CONCURRENT_MSGS_analysis3.ods (16.2 KB) - added by Zlatin Balevsky 8 days ago.
Zoom-in on 16-32 range


Change History (21)

comment:1 Changed 3 weeks ago by zzz

Parent Tickets: 2512

I came upon this a couple of months ago; see the discussion in #2512 and #2505.

comment:2 Changed 3 weeks ago by zzz

The if (random(allowed)) allowed++ logic is standard jrandom code to simulate AIMD. On average, the window will be incremented once for every (window) messages.
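
For illustration, here is a standalone sketch of why that averages out; the class and variable names are illustrative, not the actual PeerState fields:

import java.util.Random;

public class AimdIncreaseSketch {
    public static void main(String[] args) {
        Random random = new Random();
        int window = 8; // illustrative start, standing in for MIN_CONCURRENT_MSGS
        for (int msg = 1; msg <= 10000; msg++) {
            // nextInt(window) is uniform over [0, window), so this branch
            // fires with probability 1/window: on average, one increment
            // per (window) messages, i.e. additive increase
            if (random.nextInt(window) <= 0)
                window++;
        }
        // solving dW/dn = 1/W gives W ~ sqrt(2n), so expect roughly 140 here
        System.out.println("window after 10000 messages: " + window);
    }
}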

jogger said in #2505 that he sees this rejection once per second, but then in #2512 comment 1 he says he never sees it on the "good" ISP, only the "bad" one, so I don't know what to make of that.

A reasonable tweak would be to increase the initial value to twice the minimum, i.e.:

private int _concurrentMessagesAllowed = MIN_CONCURRENT_MSGS * 2;

I'll test that soon.

comment:3 Changed 3 weeks ago by Zlatin Balevsky

I'll test changing the constant in a similar fashion to #2586 after I'm done with #2574 and upload a spreadsheet. ETA 24hrs

comment:4 Changed 3 weeks ago by zzz

With the initial upped to 16, I'm seeing about 200 rejections an hour on a medium-speed floodfill, so about one every 20 seconds now. I don't have stats from before.

The good thing about having init > min is that we can see when it's being driven down by failure. I see a small percentage at less than 16, mostly 14 and 15, a few 12's, and the min looks like 9. So that may be an indication that 16 mostly works?

comment:5 Changed 3 weeks ago by Zlatin Balevsky

Depends on the meaning of "mostly working". In a way 8 mostly works as well; however, every rejection means the connection in question stalls until an ack arrives from the other side.

My question is why we need two separate windows; I don't understand what this achieves.

comment:6 Changed 3 weeks ago by zzz

Right. I have the same question, and I stated that in my ticket #2512. So let's try to answer that question.

I have researched history.txt, website docs, mtn logs, meeting logs, and status notes… here's what I found:

SSU work started Apr. 2005
SSU enabled as highest-priority transport July 2005
concurrentmessages code added Feb. 2006 with this comment:

2006-02-18  jrandom
    * Add a new AIMD throttle in SSU to control the number of concurrent
      messages being sent to a given peer, in addition to the throttle on the
      number of concurrent bytes to that peer.
    * Adjust the existing SSU outbound queue to throttle based on the queue's
      lag, not an arbitrary number of packets.

So it's very much on purpose, and was added after several months of live-net experience. I didn't find any other info in the change, which happened at the same time as a flag day change to tunnel building (0.6.1.10, which was the last incompatible change ever made to the net). The tunnel changes were dominating the status notes and meetings around that time.

So we have to speculate on the "why". The obvious guess is that he was seeing drops out in the network that appeared to be based on datagram (packet) count, not just bandwidth, and could be stimulated by sending lots of packets. This would be the case if network equipment drops packets based on packet count. That's a pretty reasonable assumption - routers have queues of packets, the packets get sent out based on available bandwidth, but the queues could also overflow from lots of small packets.

However, PeerState isn't counting packets (acks + fragmented messages); it's counting whole messages only. So that's a poor proxy for packets. Could he have counted how many fragments are still to send for a message and used that? Sure: iterate through OutboundMessageState.needsSending(i), or add a method to return a count; a sketch follows below.
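
For concreteness, a hedged sketch of such a count; needsSending(i) is the call referenced above, while getFragmentCount() is an assumed accessor, not verified against the actual class:

// illustrative only: count the fragments still pending for one message
private static int pendingFragments(OutboundMessageState state) {
    int pending = 0;
    for (int i = 0; i < state.getFragmentCount(); i++) {
        if (state.needsSending(i))
            pending++;
    }
    return pending;
}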

Maybe what he was doing, judging from the checkin comments, was moving the packet limiting from the outbound queue to a separate mechanism?

I'd rather not rip out code we don't understand. So I'm in favor of continuing the research, and for now, raising the limit if it seems appropriate.

Changed 2 weeks ago by Zlatin Balevsky

comment:7 Changed 2 weeks ago by jogger

Found two more quirks with this:

The actual limiting takes place in allocateSendingBytes():

if ( (messagePushCount == 0) && (_concurrentMessagesActive > _concurrentMessagesAllowed) ) {

First, why does it not read "≥"?

Also, by the time we get here, we have already notified the packet pusher, done a round robin, updated the send window, and updated stats, and only then do we break out of the whole thing.

A correct solution would be to move the messagePushCount param to getSendWindowBytesRemaining() and start with

if ( (messagePushCount == 0) && (_concurrentMessagesActive >= _concurrentMessagesAllowed) )
   return 0;
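
Expanded a bit, the proposal might look like this; a sketch only, with field and method names taken from this ticket rather than verified against the actual PeerState source:

// sketch: reject up front, before the packet pusher is notified
// and the round robin runs
public int getSendWindowBytesRemaining(int messagePushCount) {
    if ((messagePushCount == 0) && (_concurrentMessagesActive >= _concurrentMessagesAllowed))
        return 0;
    return _sendWindowBytesRemaining;
}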

comment:8 Changed 8 days ago by zzz

@zab any comments on your spreadsheet? Higher is better, 8 is best, but it's relatively insensitive to change, if I'm reading it right?

@jogger all the important spots use the _sendWindowBytesRemaining field directly rather than calling sendWindowBytesRemaining(), so we can't just move the code there without additional changes.

comment:9 Changed 8 days ago by Zlatin Balevsky

The readings in the spreadsheet are without any artificial delay or loss. I will repeat the experiment against current mtn head and with delay of 75ms at each node.

comment:10 Changed 8 days ago by jogger

It happens infrequently and will happen less often if the minimum is raised. My point: if we know beforehand that we will block the message, why can't we do it before the packet pusher is notified through add() and a possibly useless round robin starts?

comment:11 Changed 8 days ago by zzz

@OP @jogger looking for reactions to the issues raised in comment 6. If we keep this limit, should it be changed to a packet (fragment) count instead of a message count? To what extent does network equipment throttle, overflow, or drop based on UDP packet counts in addition to (or instead of) bytes? What are the best practices here for a UDP transport? What are your guesses as to why this change was made in early 2006?

Changed 8 days ago by Zlatin Balevsky

Repeated the experiment with delay of 75ms and NTCP disabled. I can't explain why 4 performs better than 8; the big jump occurs somewhere between 16 and 32.

Changed 8 days ago by Zlatin Balevsky

Zoom-in on 16-32 range

comment:12 Changed 8 days ago by Zlatin Balevsky

See the third attachment for a zoom-in on the 16-32 range of values for this variable. There is a sharp jump in throughput at 20, followed by another at 24, so I think those two are worth examining on the live net, similar to what was done in comment 4.

Regarding the other questions, I have to guess that network equipment maintains its queues per packet, not per byte. So if we are considering getting rid of one of the windows, we should probably get rid of the byte window.

comment:13 Changed 7 days ago by zzz

@jogger re: comment 10, not disagreeing with your point, but it's not as simple as the change in comment 7

@OP re: comment 12, we need to do more than guess, so time to do some research. Note as I said in comment 6, if we're going to count packets, that means fragments not messages.

I propose to try 16 or 20 for the initial _concurrentMessagesAllowed, and leave other changes to after .42?

comment:14 in reply to: 13 Changed 7 days ago by Zlatin Balevsky

Replying to zzz:

@OP re: comment 12, we need to do more than guess, so time to do some research. Note as I said in comment 6, if we're going to count packets, that means fragments not messages.

There has to be some kind of per-packet queue; otherwise the equipment would not be able to do QoS.

I propose to try 16 or 20 for the initial _concurrentMessagesAllowed, and leave other changes to after .42?

Sounds good to me; it would be nice to have a study of 8 vs 16 vs 20 in a similar fashion to what you did in comment 4.

comment:15 Changed 6 days ago by zzz

Final stats with init=16, 11 day test:

rejectConcurrentActive, 92 per 10 minutes, 1 every 6 seconds
allowConcurrentActive, 22,315 per 10 minutes
rejecting 0.4% of transmissions
vast majority of windows are 14-22
highest: 208 on a connection up 6 days
lowest: four 9's and two 8's out of 993 peers

will retest with 20

comment:16 Changed 6 days ago by zzz

with 20:

rejectConcurrent / (reject+allow) = 0.19% (was 0.4% for 16)

congestionOccurred 2.65% (was 2.88% for 16)

most conns are 17-24

will retest with 8

Last edited 4 days ago by zzz

comment:17 Changed 4 days ago by zzz

with 8:

rejectConcurrent 0.39%
congestionOccurred 2.39%

corrected congestion% for 16 in comment 16

most conns are 8-17
highest seen 157

Last edited 4 days ago by zzz

comment:18 Changed 4 days ago by zzz

Based on above tests, changed to 20 in f0b959c74f3dfefa6cd557887c8456158f4e58b5 0.9.41-10-rc.
Leaving open for further research and changes for .43 or later.

Last edited 4 days ago by zzz