Opened 7 months ago

Last modified 7 months ago

#2255 new defect

NTCP _currentOutbound

Reported by: zzz Owned by: zzz
Priority: minor Milestone: 0.9.36
Component: router/transport Version: 0.9.34
Keywords: Cc: zab
Parent Tickets:


NTCPConnection allows only one outbound message (OutNetMessage) in flight at a time, stored in _currentOutbound.

In prepareNextWriteFast() if this field is set, we just return. This happens a LOT.

In removeWriteBuf() the field is cleared. The logic is wonky: it tries to make sure the field is only cleared if the message wasn't a Meta message (timestamp).

I have at least some evidence from tooBacklogged() that things get stuck sometimes. Not sure if bug or if the TCP really is throttled.

I have local mods to do more logging for further investigation.

Also, my local NTCP2 mods change _currentOutbound to List<OutNetMessage> so we may put multiple I2NP messages in a single outbound payload for NTCP2. But this isn't a true fix for the above issues because it's still one outbound buffer in flight.

The current situation is wonky, inefficient, and error-prone. We need a callback from the pumper that calls transport.sendComplete() and updates stats for the particular I2NP messages sent in that buffer, if we can.

This can't be fixed until NTCP2 is propped, because of conflicts. But it could be investigated now, and I've started, since I'm staring at NTCPConnection logs all day anyway.

Subtickets

Change History (7)

comment:1 Changed 7 months ago by zzz

To be clear, we probably do want to continue to have _outbound be our main queue. You can't push everything to the pumper right away. But we need to do it efficiently and not let _currentOutbound get stuck.

Implicit in, or a byproduct of, our current code is that things get round-robined across connections. Especially on the outbound side, we must be careful that we have fair usage across connections when outbound is constrained. A quest for efficiency could break that. Should be that way now, but I haven't looked at it in a long long time. This goes for #2247 #2252 also.

comment:2 Changed 7 months ago by zzz

  • Milestone changed from undecided to 0.9.36

Even if nothing w.r.t. the OP changes, there's an opportunity to batch outbound messages for NTCP 1 using the NTCP 2 mechanism (_currentOutbound as a List), so I'm setting 0.9.36 as the target for at least that.

comment:3 Changed 7 months ago by zab

> Implicit in, or a byproduct of, our current code is that things get round-robined across connections. Especially on the outbound side, we must be careful that we have fair usage across connections when outbound is constrained.

That is semi-explicitly (heh) achieved by having a Writer._pendingConnections be a LinkedHashSet. Complete fairness is really up to the OS and how Selectors operate.
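A minimal sketch of why a LinkedHashSet gives that fairness (names here are illustrative, not the real Writer code): iteration order equals insertion order, so connections are served FIFO by the order in which they asked to write, rather than by hash order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FairnessSketch {
    // _pendingConnections analogue: a LinkedHashSet preserves insertion
    // order, so draining it serves connections FIFO
    static List<String> drainInOrder(Set<String> pending) {
        List<String> served = new ArrayList<>();
        for (String conn : pending)   // iteration order == insertion order
            served.add(conn);
        pending.clear();
        return served;
    }

    public static void main(String[] args) {
        Set<String> pending = new LinkedHashSet<>();
        pending.add("connB");
        pending.add("connA");
        pending.add("connC");
        // Served in the order they asked to write, regardless of hash order
        System.out.println(drainInOrder(pending)); // [connB, connA, connC]
    }
}
```

As noted, this only orders the handoff; once buffers reach the Selector, scheduling is up to the OS.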

comment:4 Changed 7 months ago by zab

I think I have a handle on why the "multiple outbound messages" issue is happening:

  1. Some thread calls Writer.wantsWrite and puts the connection in _pendingConnections and notify()-es a Writer thread.
  2. Writer thread wakes up, and moves the connection from _pendingConnections to _liveWrites and proceeds to do the write preparation, setting _currentOutbound to whatever came out of the queue
  3. Another thread calls Writer.wantsWrite with the same connection, but because it is already in _liveWrites it goes into the _writeAfterLive collection. The monitor is notify()-ed, HOWEVER:
  4. The same writer thread acquires the monitor, sees that the same connection is in _writeAfterLive and continues to write on it, calling prepareNextWrite on the same connection while _currentOutbound is still set.

Note that after step 2 the writer thread will have called selector.wakeup(), but there is no guarantee the pumper will manage to send the message before steps 3 and 4 happen.
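The steps above can be sketched as a single-threaded simulation (a simplification with the ticket's field names, not the real Writer logic): the second wantsWrite() during a live write parks the connection in _writeAfterLive, so the same writer thread re-enters the prepare while _currentOutbound is still set, and the prepare is a no-op.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RaceSketch {
    static Set<String> pending = new LinkedHashSet<>();
    static Set<String> liveWrites = new LinkedHashSet<>();
    static Set<String> writeAfterLive = new LinkedHashSet<>();
    static String currentOutbound = null; // per-connection in the real code
    static int skippedPrepares = 0;

    static void wantsWrite(String conn) {
        // step 3: the connection is already live, so it is parked aside
        if (liveWrites.contains(conn)) writeAfterLive.add(conn);
        else pending.add(conn);
    }

    static void prepareNextWriteFast(String conn) {
        // "if this field is set, we just return" (see the ticket description)
        if (currentOutbound != null) { skippedPrepares++; return; }
        currentOutbound = "msg for " + conn;
    }

    public static void main(String[] args) {
        wantsWrite("conn1");                         // step 1
        pending.remove("conn1");
        liveWrites.add("conn1");
        prepareNextWriteFast("conn1");               // step 2: sets _currentOutbound
        wantsWrite("conn1");                         // step 3: lands in _writeAfterLive
        // step 4: same writer thread sees conn1 in _writeAfterLive and
        // re-enters the prepare while _currentOutbound is still set
        if (writeAfterLive.remove("conn1")) prepareNextWriteFast("conn1");
        System.out.println("skipped prepares: " + skippedPrepares); // 1
    }
}
```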

comment:5 Changed 7 months ago by zzz

I think NTCP2 is exposing problems here. For outbound, we save a RTT, so I2NP can go out right after msg 3. But _currentOutbound isn't set by msg 3; I think we might be getting things out of order, or clearing _currentOutbound after msg 3, or both. This didn't happen in NTCP1 because we had to wait for msg 4 to come back.

For further investigation. Either a callback, an attachment to the ByteBuffer, or another object wrapping the ByteBuffer may be necessary so we know what to do after completion in the pumper.

Re: fairness, ok, just something to keep in mind that we need to maintain round-robin on our side.

comment:6 Changed 7 months ago by zzz

So instead of passing a byte[] from NTCPConnection to the pumper in wantsWrite() and then NTCPConnection queueing a ByteBuffer in _writeBufs, we'd pass back and forth an object with the buffer and a callback, something like this maybe:

interface WriteBuf {
    public ByteBuffer getBuffer();
    public void sendComplete(boolean success);
}
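A self-contained sketch of that idea (all names here are hypothetical stand-ins, not the real NTCP API): the connection hands the pumper an object carrying both the buffer and the completion callback, and the pumper fires sendComplete() once the buffer has been fully written, which is where the connection could update stats and clear _currentOutbound for exactly the messages in that buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

public class WriteBufSketch {
    // The interface from the comment above, completed
    interface WriteBuf {
        public ByteBuffer getBuffer();
        public void sendComplete(boolean success);
    }

    // Stand-in for the pumper's write path: drain each buffer, then fire
    // the callback so the connection knows that specific send finished
    static void pump(Queue<WriteBuf> writeBufs) {
        WriteBuf wb;
        while ((wb = writeBufs.poll()) != null) {
            ByteBuffer buf = wb.getBuffer();
            buf.position(buf.limit());    // pretend the channel wrote it all
            wb.sendComplete(true);        // connection updates stats and
                                          // clears _currentOutbound here
        }
    }

    public static void main(String[] args) {
        Queue<WriteBuf> writeBufs = new ArrayDeque<>();
        final boolean[] completed = { false };
        final ByteBuffer payload = ByteBuffer.wrap(new byte[] { 1, 2, 3 });
        writeBufs.add(new WriteBuf() {
            public ByteBuffer getBuffer() { return payload; }
            public void sendComplete(boolean success) { completed[0] = success; }
        });
        pump(writeBufs);
        System.out.println("completed=" + completed[0]); // completed=true
    }
}
```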

comment:7 Changed 7 months ago by zzz

I'm going to try something simpler first, where we don't clear _currentOutbound while we're in the establishment phase.
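A trivial sketch of that simpler approach (isEstablished() is a stand-in name, not the real NTCPConnection API): gate the clear on an "established" flag, so messages queued right after msg 3 can't race the clear during the handshake.

```java
public class ClearGuardSketch {
    private Object currentOutbound = new Object();
    private boolean established = false;

    boolean isEstablished() { return established; }
    void setEstablished(boolean e) { established = e; }

    // removeWriteBuf() analogue: only clear _currentOutbound once the
    // connection is past the establishment phase
    void maybeClearCurrentOutbound() {
        if (isEstablished())
            currentOutbound = null;
    }

    boolean hasCurrentOutbound() { return currentOutbound != null; }
}
```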
