Opened 7 months ago

Last modified 6 months ago

#2251 accepted enhancement

Remove sleep from EventPumper

Reported by: zab Owned by: zzz
Priority: minor Milestone: 0.9.36
Component: router/transport Version: 0.9.34
Keywords: ntcp nio Cc:
Parent Tickets:

Description (last modified by zzz)

Just opening a discussion on this one. I've been running a router with patches from #2247 + removed sleep() for a few hours with YourKit profiling enabled on OpenJDK 8 and cpu usage looks good. Bandwidth usage also looks higher than normal, but that could be due to it being Friday night, so I will wait for a few days and post a graph.

Subtickets (add)

Attachments (1)

bw-no-sleep.png (677.6 KB) - added by zab 7 months ago.
BW for 10 days with patches from 2247 and no sleep()

Download all attachments as: .zip

Change History (9)

comment:1 Changed 7 months ago by zab

Er, I meant ticket #2247

comment:2 Changed 7 months ago by zzz

  • Description modified (diff)
  • Milestone changed from undecided to 0.9.36

comment:3 Changed 7 months ago by zzz

We put that sleep in there out of desperation, but it is effective.

Unfortunately, we have no easy way to know or find out if 100% CPU issues have been eliminated on all platforms, for all possible JREs.

Perhaps we need a better way to determine if a sleep is necessary, perhaps by counting consecutive loops with no valid keys. We'd have to minimize pointless wakeup()s first.

Following probably applies to #2247 but putting it here re: minimizing wakeup()s:

In Jetty outside the selector (see their SelectChannelEndPoint?.java) they have a local AtomicInteger? for wanted local interest ops. Then they compareAndSet, and if different, queue the change to the selector.

Then in selector thread (see their ManagedSelector?.java) they have a state with an AtomicReference? for a state enum. They compareAndSet that, and then wakeup if changed.

Changed 7 months ago by zab

BW for 10 days with patches from 2247 and no sleep()

comment:4 Changed 7 months ago by zab

Ok, if sleep() is here to stay I'm going to stop my experiment (but doesn't that bandwidth graph look just beautiful?) and move on to #2252

comment:5 Changed 6 months ago by zzz

survey of throttles per day:
medium-speed home router: 2
medium-fast non-ff: 30
fast ff: 3000
so the sleep isn't so harmless on a big router.

obviously loops in 2 sec will go up with more keys.
changing threshold to increase with increased keys, testing now

comment:6 Changed 6 months ago by zab

I didn't count the actual events on my busy router, but from YourKit profiling over a period of 60 seconds, 1.5 seconds were spent in sleep = 1500/25 = 60 such events in that one minute period.

Here's another thought - why don't we declare some platform combinations "fixed" and remove the sleep just for those? For example (x86, OpenJDK/OracleJDK, v1.7+) can be one such combination. It doesn't need to be an exhaustive list but as long as we can cover the most common ones it will still be impactful.

comment:7 Changed 6 months ago by zzz

  • Status changed from new to accepted

Existing stat is ntcp.failsafeThrottle
Threshold changed as in comment 5 in ca3e097fd6f16a2c304ee67a679f87af96e918ff 0.9.35-13 to increases the limit from 512 to max(512, 2*connections)
Test results: reduced throttles-per-day to zero for the fast ff; no significant change for medium and medium-fast routers.
Leaving open for additional criteria, as proposed in comment 6

comment:8 Changed 6 months ago by zab

Here is a patch that disables selector sleep on { x86_64, (Oracle | Open)JDK, Java7+) platform combo


Note: See TracTickets for help on using tickets.