Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1230 closed defect (fixed)

Restart failure on ARM

Reported by: zzz Owned by: zzz
Priority: minor Milestone: 0.9.12
Component: wrapper Version: 0.9.11
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

see #1228

With following settings in wrapper.config:

wrapper.debug=TRUE
wrapper.request_thread_dump_on_failed_jvm_exit=TRUE

See log in first comment.

Problem appears to be the 30 second setting in ConfigServiceHandler?.UpdateWrapperOrRekeyTask?, although why is profile persistence taking so long. It isn't threaded so we wait a long time for it.

Subtickets

Change History (4)

comment:1 Changed 6 years ago by zzz

wrapper log at shutdown:

xx:00:45 | WrapperManager Debug: Send a packet STOP_PENDING : 30000
xx:00:45 | WrapperManager Debug: Stopped checking for control events.
xx:00:45 | read a packet STOP_PENDING : 30000
xx:00:45 | JVM signaled a stop pending with waitHint of 30000 millis.
xx:00:45 | xxx:INFO:oejs.Server:Graceful shutdown SocketConnector ...
... more jetty shutdown ...
xx:00:51 | CRIT  [hutdown hook] net.i2p.router.Router         : Graceful shutdown progress - no more tunnels, safe to die
xx:00:51 | CRIT  [hutdown hook] net.i2p.router.Router         : Starting final shutdown(5)
xx:01:14 | Shutdown failed: Timed out waiting for signal from JVM.
xx:01:14 | Dumping JVM state.
xx:01:14 | Full thread dump Java HotSpot(TM) Client VM (25.0-b69 mixed mode):
xx:01:14 | 
xx:01:14 | "Shutdown Spinner" #185 prio=7 os_prio=0 tid=0x00c20f50 nid=0x3b5d waiting on condition [0xac9de000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
xx:01:14 | 	at java.lang.Thread.sleep(Native Method)
xx:01:14 | 	at net.i2p.router.tasks.Spinner.run(Spinner.java:20)
xx:01:14 | 
xx:01:14 | "Server /127.0.0.1:xxxx" #121 daemon prio=5 os_prio=0 tid=0x00fa0570 nid=0x3b0b waiting on condition [0xad2ce000]
xx:01:14 |    java.lang.Thread.State: WAITING (parking)
xx:01:14 | 	at sun.misc.Unsafe.park(Native Method)
xx:01:14 | 	- parking to wait for  <0xb25483a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
xx:01:14 | 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
xx:01:14 | 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
xx:01:14 | 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
xx:01:14 | 	at net.i2p.client.streaming.impl.ConnectionHandler.accept(ConnectionHandler.java:134)
xx:01:14 | 	at net.i2p.client.streaming.impl.I2PSocketManagerFull.receiveSocket(I2PSocketManagerFull.java:131)
xx:01:14 | 	at net.i2p.client.streaming.impl.I2PServerSocketFull.accept(I2PServerSocketFull.java:31)
xx:01:14 | 	at net.i2p.i2ptunnel.I2PTunnelServer.run(I2PTunnelServer.java:421)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "Timer-0" #118 daemon prio=4 os_prio=0 tid=0x0104d748 nid=0x3b08 in Object.wait() [0xad51e000]
xx:01:14 |    java.lang.Thread.State: WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb25d4838> (a java.util.TaskQueue)
xx:01:14 | 	at java.lang.Object.wait(Object.java:502)
xx:01:14 | 	at java.util.TimerThread.mainLoop(Timer.java:526)
xx:01:14 | 	- locked <0xb25d4838> (a java.util.TaskQueue)
xx:01:14 | 	at java.util.TimerThread.run(Timer.java:505)
xx:01:14 | 
xx:01:14 | "org.eclipse.jetty.util.RolloverFileOutputStream" #76 daemon prio=5 os_prio=0 tid=0x00f4c408 nid=0x3abf in Object.wait() [0xae45e000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb2390ab0> (a java.util.TaskQueue)
xx:01:14 | 	at java.util.TimerThread.mainLoop(Timer.java:552)
xx:01:14 | 	- locked <0xb2390ab0> (a java.util.TaskQueue)
xx:01:14 | 	at java.util.TimerThread.run(Timer.java:505)
xx:01:14 | 
xx:01:14 | "YK Precalc" #33 daemon prio=1 os_prio=0 tid=0x00daaf28 nid=0x3a96 waiting on condition [0xaf33d000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
xx:01:14 | 	at java.lang.Thread.sleep(Native Method)
xx:01:14 | 	at net.i2p.crypto.YKGenerator$YKPrecalcRunner.run(YKGenerator.java:213)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "RouterWatchdog" #31 daemon prio=6 os_prio=0 tid=0x00d69130 nid=0x3a94 waiting on condition [0xaf43a000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
xx:01:14 | 	at java.lang.Thread.sleep(Native Method)
xx:01:14 | 	at net.i2p.router.tasks.RouterWatchdog.run(RouterWatchdog.java:127)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "Graceful shutdown hook" #30 daemon prio=7 os_prio=0 tid=0x00d682d0 nid=0x3a93 runnable [0xaf48a000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 	at java.util.zip.Deflater.deflateBytes(Native Method)
xx:01:14 | 	at java.util.zip.Deflater.deflate(Deflater.java:442)
xx:01:14 | 	- locked <0xb1006430> (a java.util.zip.ZStreamRef)
xx:01:14 | 	at java.util.zip.Deflater.deflate(Deflater.java:364)
xx:01:14 | 	at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251)
xx:01:14 | 	at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
xx:01:14 | 	at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
xx:01:14 | 	- locked <0xb1002790> (a java.util.zip.GZIPOutputStream)
xx:01:14 | 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
xx:01:14 | 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
xx:01:14 | 	- locked <0xb1002778> (a java.io.BufferedOutputStream)
xx:01:14 | 	at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
xx:01:14 | 	at net.i2p.stat.RateStat.store(RateStat.java:196)
xx:01:14 | 	at net.i2p.router.peermanager.ProfilePersistenceHelper.writeProfile(ProfilePersistenceHelper.java:156)
xx:01:14 | 	at net.i2p.router.peermanager.ProfilePersistenceHelper.writeProfile(ProfilePersistenceHelper.java:87)
xx:01:14 | 	at net.i2p.router.peermanager.PeerManager.storeProfile(PeerManager.java:154)
xx:01:14 | 	at net.i2p.router.peermanager.PeerManager.storeProfiles(PeerManager.java:133)
xx:01:14 | 	at net.i2p.router.peermanager.PeerManagerFacadeImpl.shutdown(PeerManagerFacadeImpl.java:52)
xx:01:14 | 	- locked <0xb1e0be00> (a net.i2p.router.peermanager.PeerManagerFacadeImpl)
xx:01:14 | 	at net.i2p.router.Router.shutdown2(Router.java:840)
xx:01:14 | 	- locked <0xb1da37a8> (a net.i2p.router.Router)
xx:01:14 | 	at net.i2p.router.Router.shutdown(Router.java:779)
xx:01:14 | 	- locked <0xb1da37a8> (a net.i2p.router.Router)
xx:01:14 | 	at net.i2p.router.tasks.GracefulShutdown.run(GracefulShutdown.java:41)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "Tunnel GW pumper 2/2" #28 daemon prio=5 os_prio=0 tid=0x00d50e58 nid=0x3a92 in Object.wait() [0xaf4da000]
xx:01:14 |    java.lang.Thread.State: WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1efe938> (a java.util.LinkedHashSet)
xx:01:14 | 	at java.lang.Object.wait(Object.java:502)
xx:01:14 | 	at net.i2p.router.tunnel.TunnelGatewayPumper.run(TunnelGatewayPumper.java:91)
xx:01:14 | 	- locked <0xb1efe938> (a java.util.LinkedHashSet)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "Tunnel GW pumper 1/2" #27 daemon prio=5 os_prio=0 tid=0x00d53420 nid=0x3a91 in Object.wait() [0xaf52a000]
xx:01:14 |    java.lang.Thread.State: WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1efe938> (a java.util.LinkedHashSet)
xx:01:14 | 	at java.lang.Object.wait(Object.java:502)
xx:01:14 | 	at net.i2p.router.tunnel.TunnelGatewayPumper.run(TunnelGatewayPumper.java:91)
xx:01:14 | 	- locked <0xb1efe938> (a java.util.LinkedHashSet)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "PRNG" #26 daemon prio=2 os_prio=0 tid=0xafd4e4c8 nid=0x3a90 waiting on condition [0xafa5e000]
xx:01:14 |    java.lang.Thread.State: WAITING (parking)
xx:01:14 | 	at sun.misc.Unsafe.park(Native Method)
xx:01:14 | 	- parking to wait for  <0xb1e05a68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
xx:01:14 | 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
xx:01:14 | 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
xx:01:14 | 	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
xx:01:14 | 	at gnu.crypto.prng.AsyncFortunaStandalone.run(AsyncFortunaStandalone.java:134)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "StatLogWriter" #20 daemon prio=5 os_prio=0 tid=0x00cf6a78 nid=0x3a8b in Object.wait() [0xaf77e000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1dc0498> (a [Lnet.i2p.stat.BufferedStatLog$StatEvent;)
xx:01:14 | 	at net.i2p.stat.BufferedStatLog$StatLogWriter.run(BufferedStatLog.java:136)
xx:01:14 | 	- locked <0xb1dc0498> (a [Lnet.i2p.stat.BufferedStatLog$StatEvent;)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "LogWriter" #16 daemon prio=5 os_prio=0 tid=0x00b23850 nid=0x3a88 in Object.wait() [0xaf86e000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1dc1a00> (a net.i2p.util.LogWriter)
xx:01:14 | 	at net.i2p.util.LogWriter.flushRecords(LogWriter.java:112)
xx:01:14 | 	- locked <0xb1dc1a00> (a net.i2p.util.LogWriter)
xx:01:14 | 	at net.i2p.util.LogWriter.flushRecords(LogWriter.java:69)
xx:01:14 | 	at net.i2p.util.LogWriter.run(LogWriter.java:57)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 	at net.i2p.util.I2PThread.run(I2PThread.java:84)
xx:01:14 | 
xx:01:14 | "DestroyJavaVM" #12 prio=5 os_prio=0 tid=0x00a81070 nid=0x3a78 waiting on condition [0x00000000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 
xx:01:14 | "Wrapper-Connection" #11 daemon prio=10 os_prio=0 tid=0x00b16040 nid=0x3a84 runnable [0xafaae000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 	at java.net.SocketInputStream.socketRead0(Native Method)
xx:01:14 | 	at java.net.SocketInputStream.read(SocketInputStream.java:150)
xx:01:14 | Pause reading child process output to share cycles.
xx:01:14 | 	at java.net.SocketInputStream.read(SocketInputStream.java:121)
xx:01:14 | 	at java.net.SocketInputStream.read(SocketInputStream.java:203)
xx:01:14 | 	at java.io.DataInputStream.readByte(DataInputStream.java:265)
xx:01:14 | 	at org.tanukisoftware.wrapper.WrapperManager.handleBackend(WrapperManager.java:5254)
xx:01:14 | 	at org.tanukisoftware.wrapper.WrapperManager.run(WrapperManager.java:5631)
xx:01:14 | 	at java.lang.Thread.run(Thread.java:744)
xx:01:14 | 
xx:01:14 | "Wrapper-Control-Event-Monitor" #9 daemon prio=5 os_prio=0 tid=0x00b82760 nid=0x3a82 waiting on condition [0xafc6c000]
xx:01:14 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
xx:01:14 | 	at java.lang.Thread.sleep(Native Method)
xx:01:14 | 	at org.tanukisoftware.wrapper.WrapperManager$3.run(WrapperManager.java:1034)
xx:01:14 | 
xx:01:14 | "Service Thread" #6 daemon prio=9 os_prio=0 tid=0x00b037b8 nid=0x3a7e runnable [0x00000000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 
xx:01:14 | "C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00afe6a8 nid=0x3a7d waiting on condition [0x00000000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 
xx:01:14 | "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00afd210 nid=0x3a7c waiting on condition [0x00000000]
xx:01:14 |    java.lang.Thread.State: RUNNABLE
xx:01:14 | 
xx:01:14 | "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00adb6c0 nid=0x3a7b in Object.wait() [0xb0251000]
xx:01:14 |    java.lang.Thread.State: WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1cf4c18> (a java.lang.ref.ReferenceQueue$Lock)
xx:01:14 | 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
xx:01:14 | 	- locked <0xb1cf4c18> (a java.lang.ref.ReferenceQueue$Lock)
xx:01:14 | 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:158)
xx:01:14 | 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
xx:01:14 | 
xx:01:14 | "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00ada2c0 nid=0x3a7a in Object.wait() [0xb02a1000]
xx:01:14 |    java.lang.Thread.State: WAITING (on object monitor)
xx:01:14 | 	at java.lang.Object.wait(Native Method)
xx:01:14 | 	- waiting on <0xb1cf4db8> (a java.lang.ref.Reference$Lock)
xx:01:14 | 	at java.lang.Object.wait(Object.java:502)
xx:01:14 | 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
xx:01:14 | 	- locked <0xb1cf4db8> (a java.lang.ref.Reference$Lock)
xx:01:14 | 
xx:01:14 | "VM Thread" os_prio=0 tid=0x00ad64a0 nid=0x3a79 runnable 
xx:01:14 | 
xx:01:14 | "VM Periodic Task Thread" os_prio=0 tid=0x00a89cb0 nid=0x3a7f waiting on condition 
xx:01:14 | 
xx:01:14 | JNI global references: 165
xx:01:14 | 
xx:01:14 | Heap
xx:01:14 |  def new generation   total 13568K, used 7201K [0xb0a00000, 0xb18a0000, 0xb1ca0000)
xx:01:14 |   eden space 12160K,  59% used [0xb0a00000, 0xb1103328, 0xb15e0000)
xx:01:14 |   from space 1408K,   1% used [0xb1740000, 0xb17454b0, 0xb18a0000)
xx:01:14 |   to   space 1408K,   0% used [0xb15e0000, 0xb15e0000, 0xb1740000)
xx:01:14 |  tenured generation   total 29624K, used 21029K [0xb1ca0000, 0xb398e000, 0xb4200000)
xx:01:14 |    the space 29624K,  70% used [0xb1ca0000, 0xb31295b0, 0xb3129600, 0xb398e000)
xx:01:14 |  Metaspace       used 13944K, capacity 14038K, committed 14104K, reserved 14640K
xx:01:14 | 
xx:01:18 | JVM did not exit on request, termination requested.
xx:01:19 | Signal trapped.  Details:
xx:01:19 |   signal number=17 (SIGCHLD), source="unknown"
xx:01:19 | Received SIGCHLD, checking JVM process status.
xx:01:19 | JVM received a signal SIGKILL (9).
xx:01:19 | socket read no code (closed?).
xx:01:19 | Closing backend socket.
xx:01:19 | JVM process is gone.
xx:01:19 | JVM process exited with a code of 1, setting the wrapper exit code to 1.
xx:01:19 | JVM exited after being requested to terminate.
xx:01:19 | <-- Wrapper Stopped

comment:2 Changed 6 years ago by zzz

Milestone: 0.9.130.9.12
Owner: set to zzz
Status: newaccepted

Extended the timeout and the actual shutdown on an RPi with ~2K known routers took about 50 seconds and restarted fine. Will continue testing.

comment:3 Changed 6 years ago by zzz

Resolution: fixed
Status: acceptedclosed

comment:4 Changed 6 years ago by zzz

Fix details:
Default timeout extended from 30s to 4m on ARM and 2m on other platforms.

Also in the fix, the wrapper.config setting wrapper.jvm_exit.timeout=nnn (seconds) will override the timeout if it is longer than the default.

Note: See TracTickets for help on using tickets.