Opened 22 months ago

Last modified 21 months ago

#2251 accepted enhancement

Remove sleep from EventPumper

Reported by: Zlatin Balevsky Owned by: zzz
Priority: minor Milestone: 0.9.36
Component: router/transport Version: 0.9.34
Keywords: ntcp nio Cc:
Parent Tickets: Sensitive: no

Description (last modified by zzz)

Just opening a discussion on this one. I've been running a router with patches from #2247 + removed sleep() for a few hours with YourKit profiling enabled on OpenJDK 8 and cpu usage looks good. Bandwidth usage also looks higher than normal, but that could be due to it being Friday night, so I will wait for a few days and post a graph.


Attachments (1)

bw-no-sleep.png (677.6 KB) - added by Zlatin Balevsky 22 months ago.
BW for 10 days with patches from 2247 and no sleep()

Download all attachments as: .zip

Change History (9)

comment:1 Changed 22 months ago by Zlatin Balevsky

Er, I meant ticket #2247

comment:2 Changed 22 months ago by zzz

Description: modified (diff)
Milestone: undecided0.9.36

comment:3 Changed 22 months ago by zzz

We put that sleep in there out of desperation, but it is effective.

Unfortunately, we have no easy way to know or find out if 100% CPU issues have been eliminated on all platforms, for all possible JREs.

Perhaps we need a better way to determine if a sleep is necessary, perhaps by counting consecutive loops with no valid keys. We'd have to minimize pointless wakeup()s first.

Following probably applies to #2247 but putting it here re: minimizing wakeup()s:

In Jetty outside the selector (see their SelectChannelEndPoint?.java) they have a local AtomicInteger? for wanted local interest ops. Then they compareAndSet, and if different, queue the change to the selector.

Then in selector thread (see their ManagedSelector?.java) they have a state with an AtomicReference? for a state enum. They compareAndSet that, and then wakeup if changed.

Changed 22 months ago by Zlatin Balevsky

Attachment: bw-no-sleep.png added

BW for 10 days with patches from 2247 and no sleep()

comment:4 Changed 22 months ago by Zlatin Balevsky

Ok, if sleep() is here to stay I'm going to stop my experiment (but doesn't that bandwidth graph look just beautiful?) and move on to #2252

comment:5 Changed 21 months ago by zzz

survey of throttles per day:
medium-speed home router: 2
medium-fast non-ff: 30
fast ff: 3000
so the sleep isn't so harmless on a big router.

obviously loops in 2 sec will go up with more keys.
changing threshold to increase with increased keys, testing now

comment:6 Changed 21 months ago by Zlatin Balevsky

I didn't count the actual events on my busy router, but from YourKit profiling over a period of 60 seconds, 1.5 seconds were spent in sleep = 1500/25 = 60 such events in that one minute period.

Here's another thought - why don't we declare some platform combinations "fixed" and remove the sleep just for those? For example (x86, OpenJDK/OracleJDK, v1.7+) can be one such combination. It doesn't need to be an exhaustive list but as long as we can cover the most common ones it will still be impactful.

comment:7 Changed 21 months ago by zzz

Status: newaccepted

Existing stat is ntcp.failsafeThrottle
Threshold changed as in comment 5 in ca3e097fd6f16a2c304ee67a679f87af96e918ff 0.9.35-13 to increases the limit from 512 to max(512, 2*connections)
Test results: reduced throttles-per-day to zero for the fast ff; no significant change for medium and medium-fast routers.
Leaving open for additional criteria, as proposed in comment 6

comment:8 Changed 21 months ago by Zlatin Balevsky

Here is a patch that disables selector sleep on { x86_64, (Oracle | Open)JDK, Java7+) platform combo


Note: See TracTickets for help on using tickets.