Opened 3 weeks ago

Last modified 18 hours ago

#2355 new defect

Single thread goes wild on ARM

Reported by: jogger Owned by:
Priority: major Milestone: undecided
Component: router/general Version: 0.9.37
Keywords: Cc:
Parent Tickets:

Description (last modified by zzz)

This identifies the basic problem touched on in #1411, #1943 and #2314.

The general effect is reduced throughput: one thread maxes out a single core, and several other packet processing threads run at CPU levels too high relative to the reduced traffic.

I have seen this over the last few years on an RPi2 (quad-core ARM32), an Odroid C2 (quad-core ARM64) and an Odroid HC1 (~XU4, octa-core ARM32), with Java 8 and 9. On the quad-core machines total CPU load goes above 90%; on the octa-core machine total CPU stays well under 80%, while the thread going wild clearly stands out.

This is not a high-usage bug: in the latest event, total CPU had been at around 20% for days when the bug struck.

I could identify NTCP Pumper as the runaway thread twice and I2CP Reader once. One has to restart the router to get back to the old traffic levels. A thread dump and top output are attached. The top output shows that the threads consuming the most CPU at the time of the problem had considerably less total CPU run time before; it was a clear, sudden change.

PID USER PR NI S %CPU %MEM P TIME COMMAND

21473 e 3 -17 R 97,1 28,8 4 529:00 I2CP Reader 191
12569 e 14 -6 R 67,1 28,8 7 135:51 I2CP Internal R
12528 e 7 -13 S 54,7 28,8 5 954:38 Tunnel GW pumpe
12527 e 7 -13 S 51,5 28,8 2 955:02 Tunnel GW pumpe
12542 e 14 -6 R 31,3 28,8 5 610:10 NTCP writer 1/1
12551 e 14 -6 R 31,3 28,8 6 16,48 UDP message rec
12540 e 11 -9 S 26,4 28,8 1 16,44 NTCP reader 2/3
12539 e 11 -9 S 25,4 28,8 6 16,46 NTCP reader 1/3
12541 e 11 -9 S 25,4 28,8 0 16,44 NTCP reader 3/3
12538 e 14 -6 S 16,0 28,8 6 18,05 NTCP Pumper
12552 e 17 -3 S 12,7 28,8 0 23,41 UDP packet push

edit zzz: dump moved to attachment

Subtickets

Attachments (1)

2355-dump.txt (83.1 KB) - added by zzz 9 days ago.
dump moved from OP


Change History (11)

comment:1 Changed 3 weeks ago by jogger

  • Description modified (diff)

comment:2 Changed 3 weeks ago by zzz

  • Status changed from new to infoneeded_new

which thread is the problem?

comment:3 Changed 10 days ago by jogger

  • Status changed from infoneeded_new to new

Finally nailed it down. The thread going wild varies. I have seen it with NTCP Pumper several times; in the documentation below it is NTCP Writer running far above its long-term average and Tunnel GW Pumper running high during the entire shutdown. The problem is that i2p (or the JVM) starts thrashing between threads and traffic drops. Basically I believe it is the combination of ARM not being the fastest and Linux CFS handing out relatively short time slices of 6 ms max, which get very short when many processes (threads) want to run. So the packet processing threads preempt each other multiple times before they get their work done. A solution might be to cut down on the number of threads. I have done this in the source; with more threads the bug occurred more frequently and with sharper traffic drops. Proof of this is below (it happened after 6 days of uptime).

Total system was at 80 to > 90% CPU with traffic levels (60 min average) far below the max.

Top output:

PID USER PR NI S %CPU %MEM P TIME COMMAND

30981 e 9 -11 R 83,0 31,6 4 69,28 UDP packet push
30980 e 10 -10 S 74,3 31,6 4 48,00 UDP message rec
30970 e 12 -8 R 33,4 31,6 2 31,00 NTCP Pumper
29374 e 10 -10 R 27,2 31,6 7 30,47 Tunnel GW pumpe
29375 e 10 -10 R 31,6 31,6 2 30,46 Tunnel GW pumpe
30971 e 8 -12 R 57,3 31,6 0 30,25 NTCP reader 1/3
30972 e 8 -12 R 60,1 31,6 6 30,25 NTCP reader 2/3
30973 e 8 -12 R 55,7 31,6 3 30,25 NTCP reader 3/3
30974 e 10 -10 R 58,5 31,6 5 19,43 NTCP writer 1/1
30978 e 15 -5 S 19,5 31,6 1 931:20 UDP Packet hand
14559 e 20 0 S 9,6 31,6 6 832:04 I2CP Reader 45
29367 e 20 0 S 13,0 31,6 2 558:37 SimpleTimer2 1/
29368 e 20 0 R 11,8 31,6 2 557:48 SimpleTimer2 2/
30976 e 20 0 S 8,0 31,6 4 434:24 UDPSender 2
30979 e 20 0 R 9,3 31,6 3 433:29 UDP ACK sender

Averages from stats page:
bw.receiveBps: Message receive rate (bytes/sec)
60 sec rate: Average: 996.137,75; Highest average: 1.762.161,75. There was 1 event in this period. The period ended 86 sec ago. (Average event count: 1,199; Events in peak period: 1)
5 min rate: Average: 844.344,5; Highest average: 1.356.251,0. There were 6 events in this period. The period ended 4 min ago. (Average event count: 5,999; Events in peak period: 6)
60 min rate: Average: 650.497,062; Highest average: 995.977,312. There were 72 events in this period. The period ended 29 min ago. (Average event count: 72,238; Events in peak period: 72)
Lifetime average value: 516.707,844 (10,619 events)

bw.recvRate: Low-level receive rate (bytes/sec)
60 sec rate: Average: 1.973.561,875; Highest average: 3.487.543,25. There was 1 event in this period. The period ended 86 sec ago. (Average event count: 1,199; Events in peak period: 1)
5 min rate: Average: 1.751.331,125; Highest average: 2.812.168,25. There were 6 events in this period. The period ended 4 min ago. (Average event count: 5,999; Events in peak period: 6)
60 min rate: Average: 1.391.899,375; Highest average: 2.341.244,25. There were 72 events in this period. The period ended 29 min ago. (Average event count: 72,238; Events in peak period: 72)
Lifetime average value: 1.013.717,062 (10,619 events)

bw.sendBps: Message send rate (bytes/sec)
60 sec rate: Average: 1.635.342,625; Highest average: 2.882.780,0. There was 1 event in this period. The period ended 86 sec ago. (Average event count: 1,199; Events in peak period: 1)
5 min rate: Average: 1.446.055,0; Highest average: 2.283.735,0. There were 6 events in this period. The period ended 4 min ago. (Average event count: 5,999; Events in peak period: 6)
60 min rate: Average: 1.174.283,75; Highest average: 1.907.248,5. There were 72 events in this period. The period ended 29 min ago. (Average event count: 72,238; Events in peak period: 72)
Lifetime average value: 1.100.247,25 (10,619 events)

bw.sendRate: Low-level send rate (bytes/sec)
60 sec rate: Average: 2.204.005,75; Highest average: 3.739.685,75. There was 1 event in this period. The period ended 86 sec ago. (Average event count: 1,199; Events in peak period: 1)
5 min rate: Average: 1.974.525,375; Highest average: 3.011.623,75. There were 6 events in this period. The period ended 4 min ago. (Average event count: 5,999; Events in peak period: 6)
60 min rate: Average: 1.597.892,5; Highest average: 2.565.552,0. There were 72 events in this period. The period ended 29 min ago. (Average event count: 72,238; Events in peak period: 72)
Lifetime average value: 1.325.021,125 (10,619 events)

I analysed further: the additional CPU was spent in user space; kernel functions were at around 13%, less than at full traffic speed.

I came up with a script that reports per-thread context switches (non-voluntary and voluntary) over 10-second intervals:

Thread nonvol vol
SimpleTim 2634 2735
Tunnel_GW 1640 3147
I2CP_Read 1235 284
NTCP_read 1195 3272
UDP_messa 1141 964
UDP_Packe 967 2894
NTCP_Pump 936 5095
NTCP_writ 914 1686
UDP_ACK_s 830 136
UDPSender 818 2976
UDP_packe 785 688
BuildHand 660 56
UDPReceiv 450 4437
G1_Refine 398 138
GC_Thread 382 278
PRNG 326 26
BuildExec 251 56
UDP_Estab 223 88
DH_Precal 220 95
JobQueue_ 182 484
I2CP_Inte 94 323
NTCPSendF 92 5425

100 non-voluntary switches per second for many threads, with the associated waiting time, is just too high and slows the router down.
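
For reference, here is a minimal sketch (in Python, not the exact script used above) of how such per-thread numbers can be collected: it reads the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters from /proc/<pid>/task/*/status twice, 10 seconds apart, and prints the deltas per thread name. The router PID and the script name are assumptions for illustration.

#!/usr/bin/env python3
# Sketch: per-thread context-switch deltas for a JVM over a fixed interval.
# Usage: ctxswitches.py <pid>   (hypothetical name; <pid> is the router's JVM PID)
import glob, sys, time

INTERVAL = 10  # seconds between the two samples

def sample(pid):
    """Return {tid: (thread name, nonvoluntary, voluntary)} read from /proc."""
    counts = {}
    for status in glob.glob('/proc/%s/task/*/status' % pid):
        tid = status.split('/')[4]
        try:
            fields = {}
            with open(status) as f:
                for line in f:
                    key, _, value = line.partition(':')
                    fields[key] = value.strip()
            counts[tid] = (fields.get('Name', '?'),
                           int(fields.get('nonvoluntary_ctxt_switches', 0)),
                           int(fields.get('voluntary_ctxt_switches', 0)))
        except (IOError, OSError):
            pass  # thread exited between the glob and the read
    return counts

def main():
    pid = sys.argv[1]
    before = sample(pid)
    time.sleep(INTERVAL)
    after = sample(pid)
    totals = {}  # thread name -> [nonvol delta, vol delta], summed over same-named threads
    for tid, (name, nonvol, vol) in after.items():
        if tid not in before:
            continue  # thread started during the interval
        _, old_nonvol, old_vol = before[tid]
        d = totals.setdefault(name, [0, 0])
        d[0] += nonvol - old_nonvol
        d[1] += vol - old_vol
    print('%-15s %7s %7s' % ('Thread', 'nonvol', 'vol'))
    for name, (nonvol, vol) in sorted(totals.items(), key=lambda kv: -kv[1][0]):
        print('%-15s %7d %7d' % (name, nonvol, vol))

if __name__ == '__main__':
    main()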

comment:4 Changed 9 days ago by zzz

good research! But I still don't have the info I need...
Which threads did you reduce the count for?

Changed 9 days ago by zzz

dump moved from OP

comment:5 Changed 9 days ago by zzz

  • Description modified (diff)

moved dump from OP to attachment

comment:6 Changed 6 days ago by jogger

I am running with 3 NTCP readers and build handlers, 2 SimpleTimers, Job Queue runners and Tunnel GW pumpers, and 1 of everything else. As said before: more threads, more frequent lockups.

The same happened again today after 4 days of uptime, a more extreme case: 98% total CPU, traffic drop > 50%. I2CP Reader went wild, staying above 80% through the entire shutdown phase, with Tunnel GW Pumpers and NTCP Writer also very high. top and context-switch output below. Additional info: garbage collection frequency is down, as expected with fewer packets processed. Nothing unusual in jconsole, no deadlocks. Time spent in kernel calls is also down 50% (see top below), also expected with less traffic.

Interesting: I created a heap dump using jmap twice. jmap caused a full GC cycle, and then CPU levels dropped for a few seconds. Is it possible that the JVM is shuffling memory around without this showing up as GC? According to jconsole, 90% of heap memory is G1 old gen, but there are no G1 old gen collections, only G1 new gen. Nevertheless, G1 old gen varies between 150 and > 300 MB. Where should I post the 56 MB gzipped heap dump, if anybody is interested?
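
One way to check the "GC activity without GC events" question from outside the process is to log timestamped jstat samples and see whether old-gen occupancy (the OU column) moves while the collection counters (YGC/FGC) stay flat. A rough sketch, assuming jstat from the JDK running the router is on the PATH and the router's JVM PID is passed as an argument; the script name is hypothetical.

#!/usr/bin/env python3
# Sketch: timestamped "jstat -gc" samples to correlate old-gen occupancy with
# collection counts. gcwatch.py is a hypothetical name, not part of I2P.
import subprocess, sys, time

def main():
    pid = sys.argv[1]
    interval = 10   # seconds between samples
    samples = 360   # roughly one hour of samples
    for i in range(samples):
        out = subprocess.check_output(['jstat', '-gc', pid]).decode()
        header, values = out.strip().splitlines()[:2]
        if i == 0:
            print('timestamp            ' + header)
        print('%s  %s' % (time.strftime('%Y-%m-%d %H:%M:%S'), values))
        time.sleep(interval)

if __name__ == '__main__':
    main()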

top - 11:10:59 up 25 days, 14:34, 0 users, load average: 16,17, 17,93, 18,52
Threads: 541 total, 16 running, 459 sleeping, 0 stopped, 1 zombie
%Cpu(s): 89,4 us, 6,9 sy, 0,0 ni, 2,0 id, 0,0 wa, 0,0 hi, 1,6 si, 0,0 st
KiB Mem : 2044844 total, 143884 free, 1526864 used, 374096 buff/cache
KiB Swap: 2096124 total, 2094076 free, 2048 used. 429216 avail Mem

PID USER PR NI S %CPU %MEM P TIME COMMAND

12107 e 13 -7 S 35,3 29,4 0 71,57 UDP packet push
12091 e 13 -7 R 29,6 29,4 7 49,21 NTCP Pumper
12106 e 11 -9 R 53,7 29,4 7 49,19 UDP message rec
12093 e 9 -11 R 41,4 29,4 6 40,45 NTCP reader 2/3
12092 e 9 -11 S 39,7 29,4 6 40,45 NTCP reader 1/3
12094 e 9 -11 R 42,7 29,4 4 40,45 NTCP reader 3/3
12080 e 9 -11 R 91,8 29,4 6 33,27 Tunnel GW pumpe
12081 e 9 -11 R 89,3 29,4 2 33,25 Tunnel GW pumpe
12095 e 9 -11 R 85,2 29,4 7 25,39 NTCP writer 1/1
12104 e 20 0 R 9,6 29,4 7 17,16 UDP Packet hand
12073 e 20 0 R 13,2 29,4 3 632:16 SimpleTimer2 1/
12074 e 20 0 R 12,1 29,4 0 632:02 SimpleTimer2 2/
12644 e 10 -10 R 80,8 29,4 5 628:01 I2CP Reader 64

Thread nonvol vol
NTCP_read 4607 1784
GC_Thread 4317 340
I2CP_Read 3149 234
SimpleTim 3053 667
NTCP_Pump 2515 3265
UDP_packe 2379 736
Tunnel_GW 2148 697
NTCP_writ 2083 426
UDP_messa 1779 640
UDP_Packe 1495 520
BuildHand 1094 107
UDP_ACK_s 855 213
I2CP_Inte 705 112
UDP_Estab 451 118
JobQueue_ 376 699
G1_Refine 349 154
DH_Precal 344 176

comment:7 Changed 6 days ago by zzz

I see from the dump that you're using the OpenJDK JRE. Our standing advice for many years has been to use the Oracle JRE for ARM - OpenJDK was much slower and buggier, almost unusable. That was in the early days of Java 8. We still say so on our download page because we have never had anybody say 'I use OpenJDK on RPi and it works great'.

Our advice may or may not still be good - but perhaps you can try the Oracle JRE and see how it works?

comment:8 Changed 6 days ago by zab

Hi Jogger,

In the thread dump you provided, NTCPPumper is stuck in a very unexpected place. That rules out any 100% CPU bugs in select() calls and supports your theory of frequent context switches.

A single thread dump can only tell us so much. The way to use thread dumps is to take, say, 10 thread dumps 10 ms apart (via a script, of course) and then analyze statistically where each thread is spending its time. Can you do that for us and attach the result to this ticket?
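
For what it's worth, a rough sketch of such a dump-and-tally script (a hypothetical helper, nothing that ships with I2P): it calls jstack repeatedly and counts the topmost stack frame seen for each thread, which is usually enough to show where a runaway thread spends its time. Example: stackstats.py <pid> 10 0.01 for 10 dumps roughly 10 ms apart, or a longer interval if the machine cannot keep up.

#!/usr/bin/env python3
# Sketch: take repeated jstack dumps and tally the topmost stack frame per thread,
# as a poor man's sampling profiler. Usage: stackstats.py <pid> <count> <interval_sec>
# (hypothetical name; assumes jstack from the JDK running the router is on the PATH)
import collections, re, subprocess, sys, time

def top_frames(dump):
    """Yield (thread name, topmost 'at ...' frame) pairs from one jstack dump."""
    name = None
    reported = True
    for line in dump.splitlines():
        m = re.match(r'"([^"]+)"', line)
        if m:
            name = m.group(1)
            reported = False
        elif name and not reported and line.strip().startswith('at '):
            yield name, line.strip()
            reported = True

def main():
    pid, count, interval = sys.argv[1], int(sys.argv[2]), float(sys.argv[3])
    hits = collections.defaultdict(collections.Counter)
    for _ in range(count):
        dump = subprocess.check_output(['jstack', '-l', pid]).decode(errors='replace')
        for thread, frame in top_frames(dump):
            hits[thread][frame] += 1
        time.sleep(interval)
    for thread in sorted(hits):
        print(thread)
        for frame, n in hits[thread].most_common(3):
            print('  %2d  %s' % (n, frame))

if __name__ == '__main__':
    main()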

Regarding reducing the number of threads, please take a look at ticket #2252. You can go down to 1 thread for every stage of the pipeline and not lose any throughput (with some caveats).

Based on your recent forum posts I'm guessing you're doing heavy seeding with BiglyBT. Can you also try spinning up an "empty" router on your ARM setup that does nothing but route traffic? I'm curious how the profile will change and if an investigation into the I2CP code is warranted.

comment:9 Changed 4 days ago by jogger

I will try to come up with a number of dumps when the lockup hits again, which will be more than two days away.

Yes, I ran Oracle JDK 8 on ARM64 a year ago and this bug hit more frequently. I have tested Oracle JDK 8 and all of OpenJDK 8 through 11 on ARM32. OpenJDK 9 is definitely a big improvement over Oracle JDK 8, in particular much faster. I currently prefer it because 10 and especially 11 use more memory. The warning in the logs is definitely misleading and wrong.

You cannot go down to 1 thread each on an 8-core system. I will comment on #2252.

The "empty" router will not work; I have only 50 Mbps outbound, which allows just one router at full speed. The bug hits with the internal Snark as well, though.

comment:10 Changed 18 hours ago by jogger

The bug hit again. I took 10 thread dumps 10 sec apart; the machine cannot go faster. Also heap dump histograms before and after. During the shutdown phase, when most threads came down, both I2CP Readers went near 100% CPU. However, this is not a straight I2CP bug; it has been seen several times before, when port 7654 was not in use, with NTCP Pumper or Writer going wild.

I currently cannot attach the files; tried with Firefox and Safari: IndexError: pop from empty list. Where should I put them?
