Opened 6 years ago

Last modified 3 years ago

#689 open defect

NTCP limit inbound queues

Reported by: zzz Owned by: zzz
Priority: minor Milestone:
Component: router/transport Version: 0.9.1
Keywords: Cc: zab@…
Parent Tickets:

Description

per-con NTCPConnection _readbufs and global Reader _pendingConnections.

Can't block in NTCPConnection.recv() or it blocks the whole pumper.

From IRC:

<zab_> zzz is there any chance I could convince you to apply the same queue limits you introduced to SSU in -4 to NTCP? It would be based entirely on intuition since aargh seems awol
<zab_> the method is NTCPConnection.recv(ByteBuffer?)

<zzz> zab_work it's not so simple, we can't do it like SSU...
<zzz> for both the per-conn queue in NTCPConnection.recv() and the global queue in Reader, we can't block...
<zzz> we have to push back by disabling the KeySelector?'s interest ops...
<zzz> Agreed it should be fixed, but in the absence of identified serious problems, I'd rather not tackle it this cycle

<iRelay> <zab_work@freenode> ok, should be easy to reproduce when KillYourTV becomes update host
<iRelay> <zab_work@freenode> if it is a problem in the first place, that is
<iRelay> <zab_work@freenode> I'd still like if some busy router could experiment with naive blocking approach just to see how it impacts throughput, ram & object counts
<zzz> blocking may be too fugly to get good data from though... as it blocks all 4 NIO operations globally (reads, writes, accepts, connections)

<zzz> zab_work I'm going to open a NTCP ticket and leave the tough analysis for another day, too much going on atm
<iRelay> <zab_work@freenode> thought #674 was just that?
<zzz> oh, ok, will update it
<zzz> no wait, that's the byte buffer cache
<zzz> not the receive queues

Subtickets (add)

Change History (8)

comment:1 Changed 6 years ago by zab

Writing down some notes so they don't get lost. The proactor pattern makes a pushback to the interest ops complicated:

  • The NTCP Reader thread needs to remember to turn interest back on after it drains the q.
  • Changing the interest locks the same monitors that are locked during select(), so the Reader needs to enqueue a "turn interest on" request and wakeup() the selector.
  • It is not acceptable to let the reader just block until the select() call returns because due to tcp pushback this will essentially kill throughput. This may be unclear, but step through it and it will make sense.
  • Implementing "interest requests" requires adding additional logic to the pumper event loop
  • Add in the various close and cleanup states and the number of execution paths explodes...

(p.s. the proper term for "execution path" is "linearization")

comment:2 Changed 6 years ago by zab

There may be a much simpler way to do this - have the pumper thread check the queue sizes of all "choked" connections and turn interest back on itself. It already does something like this but for writing only, we just need to make it do it for reading & connecting as well.

The SELECTOR_LOOP_DELAY may have to be reduced as to not kill throughput, but this sounds promising.

comment:3 Changed 6 years ago by zab

The more I play around with #697 the more I'm convinced that a complete redesign of the NTCP engine will have a huge impact on throughput. Every time something needs to be passed between threads we're talking hundreds of microseconds delay at least. Make that orders of magnitude more if there aren't any cpu cores available which may often be the case when running in virtual machines. Even a 500 microsecond delay will have noticeable impact on throughput on fiber-to-the-home or datacenter deployments.

comment:4 Changed 6 years ago by zab

Some "corporate" names and descriptions for various patterns:

Proactor the existing one pt.1, (sort of)
Half-sync half-async the existing one pt.2, (sort of)
Reactor much simpler, definitely best for mobile & low-power devices
Leader-Followers may be the best choice when combined with Reactor

comment:5 Changed 6 years ago by zab

  • Cc zab removed

comment:6 Changed 6 years ago by zab

  • Cc zab@… added

comment:7 Changed 6 years ago by str4d

  • Milestone 0.9.3 deleted

comment:8 Changed 3 years ago by str4d

  • Status changed from new to open
Note: See TracTickets for help on using tickets.