Opened 2 months ago

Closed 6 weeks ago

#2366 closed defect (fixed)

Java-level deadlock in 0.9.37

Reported by: slumlord Owned by: zzz
Priority: blocker Milestone: 0.9.38
Component: router/netdb Version: 0.9.37
Keywords: deadlock, buildexecutor, iterativesearchjob Cc: zzz
Parent Tickets:

Description

2018/12/19 08:14:27 | Found one Java-level deadlock:
2018/12/19 08:14:27 | =============================
2018/12/19 08:14:27 | "BuildExecutor":
2018/12/19 08:14:27 |   waiting to lock monitor 0x00000008a4268148 (object 0x000000080e177d10, a net.i2p.router.networkdb.kademlia.IterativeSearchJob),
2018/12/19 08:14:27 |   which is held by "JobQueue 2/5"
2018/12/19 08:14:27 | "JobQueue 2/5":
2018/12/19 08:14:27 |   waiting to lock monitor 0x0000000891442188 (object 0x000000080e178380, a net.i2p.router.networkdb.kademlia.IterativeSearchJob),
2018/12/19 08:14:27 |   which is held by "JobQueue 4/5"
2018/12/19 08:14:27 | "JobQueue 4/5":
2018/12/19 08:14:27 |   waiting to lock monitor 0x00000008a4268148 (object 0x000000080e177d10, a net.i2p.router.networkdb.kademlia.IterativeSearchJob),
2018/12/19 08:14:27 |   which is held by "JobQueue 2/5"
2018/12/19 08:14:27 | 
2018/12/19 08:14:27 | Java stack information for the threads listed above:
2018/12/19 08:14:27 | ===================================================
2018/12/19 08:14:27 | "BuildExecutor":
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodSearchJob.addDeferred(FloodSearchJob.java:75)
2018/12/19 08:14:27 | 	- waiting to lock <0x000000080e177d10> (a net.i2p.router.networkdb.kademlia.IterativeSearchJob)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:407)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:367)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.lookupBeforeDropping(FloodfillNetworkDatabaseFacade.java:545)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.fail(KademliaNetworkDatabaseFacade.java:1136)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.lookupRouterInfoLocally(KademliaNetworkDatabaseFacade.java:649)
2018/12/19 08:14:27 | 	at net.i2p.router.transport.TransportImpl.wasUnreachable(TransportImpl.java:929)
2018/12/19 08:14:27 | 	at net.i2p.router.transport.TransportManager.wasUnreachable(TransportManager.java:565)
2018/12/19 08:14:27 | 	at net.i2p.router.transport.CommSystemFacadeImpl.wasUnreachable(CommSystemFacadeImpl.java:179)
2018/12/19 08:14:27 | 	at net.i2p.router.peermanager.ProfileOrganizer.selectPeersLocallyUnreachable(ProfileOrganizer.java:710)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.TunnelPeerSelector.getExclude(TunnelPeerSelector.java:212)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.ClientPeerSelector.selectPeers(ClientPeerSelector.java:62)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.TunnelPool.configureNewTunnel(TunnelPool.java:1155)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.TunnelPool.configureNewTunnel(TunnelPool.java:1118)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.BuildExecutor.run2(BuildExecutor.java:404)
2018/12/19 08:14:27 | 	at net.i2p.router.tunnel.pool.BuildExecutor.run(BuildExecutor.java:309)
2018/12/19 08:14:27 | 	at java.lang.Thread.run(Thread.java:748)
2018/12/19 08:14:27 | 	at net.i2p.util.I2PThread.run(I2PThread.java:103)
2018/12/19 08:14:27 | "JobQueue 2/5":
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodSearchJob.addDeferred(FloodSearchJob.java:75)
2018/12/19 08:14:27 | 	- waiting to lock <0x000000080e178380> (a net.i2p.router.networkdb.kademlia.IterativeSearchJob)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:407)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:367)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.lookupBeforeDropping(FloodfillNetworkDatabaseFacade.java:545)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.fail(KademliaNetworkDatabaseFacade.java:1136)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.lookupRouterInfoLocally(KademliaNetworkDatabaseFacade.java:649)
2018/12/19 08:14:27 | 	at net.i2p.router.util.MaskedIPSet.<init>(MaskedIPSet.java:39)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeSearchJob.retry(IterativeSearchJob.java:293)
2018/12/19 08:14:27 | 	- locked <0x000000080e177d10> (a net.i2p.router.networkdb.kademlia.IterativeSearchJob)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeSearchJob.failed(IterativeSearchJob.java:503)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeTimeoutJob.runJob(IterativeTimeoutJob.java:26)
2018/12/19 08:14:27 | 	at net.i2p.router.JobQueueRunner.runCurrentJob(JobQueueRunner.java:135)
2018/12/19 08:14:27 | 	at net.i2p.router.JobQueueRunner.run(JobQueueRunner.java:78)
2018/12/19 08:14:27 | "JobQueue 4/5":
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodSearchJob.addDeferred(FloodSearchJob.java:75)
2018/12/19 08:14:27 | 	- waiting to lock <0x000000080e177d10> (a net.i2p.router.networkdb.kademlia.IterativeSearchJob)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:407)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.search(FloodfillNetworkDatabaseFacade.java:367)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.FloodfillNetworkDatabaseFacade.lookupBeforeDropping(FloodfillNetworkDatabaseFacade.java:545)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.fail(KademliaNetworkDatabaseFacade.java:1136)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.KademliaNetworkDatabaseFacade.lookupRouterInfoLocally(KademliaNetworkDatabaseFacade.java:649)
2018/12/19 08:14:27 | 	at net.i2p.router.util.MaskedIPSet.<init>(MaskedIPSet.java:39)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeSearchJob.retry(IterativeSearchJob.java:293)
2018/12/19 08:14:27 | 	- locked <0x000000080e178380> (a net.i2p.router.networkdb.kademlia.IterativeSearchJob)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeSearchJob.failed(IterativeSearchJob.java:503)
2018/12/19 08:14:27 | 	at net.i2p.router.networkdb.kademlia.IterativeTimeoutJob.runJob(IterativeTimeoutJob.java:26)
2018/12/19 08:14:27 | 	at net.i2p.router.JobQueueRunner.runCurrentJob(JobQueueRunner.java:135)
2018/12/19 08:14:27 | 	at net.i2p.router.JobQueueRunner.run(JobQueueRunner.java:78)
2018/12/19 08:14:27 | 
2018/12/19 08:14:27 | Found 1 deadlock.
2018/12/19 08:14:27 | 
2018/12/19 08:14:27 | Heap
2018/12/19 08:14:27 |  PSYoungGen      total 19456K, used 4788K [0x0000000860e00000, 0x0000000862200000, 0x000000088b900000)
2018/12/19 08:14:27 |   eden space 18432K, 22% used [0x0000000860e00000,0x0000000861205030,0x0000000862000000)
2018/12/19 08:14:27 |   from space 1024K, 65% used [0x0000000862100000,0x00000008621a8000,0x0000000862200000)
2018/12/19 08:14:27 |   to   space 1024K, 0% used [0x0000000862000000,0x0000000862000000,0x0000000862100000)
2018/12/19 08:14:27 |  ParOldGen       total 63488K, used 62669K [0x000000080b900000, 0x000000080f700000, 0x0000000860e00000)
2018/12/19 08:14:27 |   object space 63488K, 98% used [0x000000080b900000,0x000000080f633520,0x000000080f700000)
2018/12/19 08:14:27 |  PSPermGen       total 29696K, used 29361K [0x0000000806700000, 0x0000000808400000, 0x000000080b900000)
2018/12/19 08:14:27 |   object space 29696K, 98% used [0x0000000806700000,0x00000008083ac680,0x0000000808400000)
2018/12/19 08:14:27 | 
2018/12/19 08:14:29 | CRIT  [uterWatchdog] 2p.router.tasks.RouterWatchdog: Router appears hung, or there is severe network congestion.  Watchdog starts barking!
2018/12/19 08:14:29 | CRIT  [uterWatchdog] et.i2p.router.tasks.ThreadDump: Threads dumped to /home/i2puser/.i2p/wrapper.log

Subtickets

Change History (2)

comment:1 Changed 2 months ago by zzz

  • Status changed from new to accepted

One in a billion, never seen before.
Two ISJ's deadlock on each other, then the tunnel builder gets caught in it and we're done.

Perhaps unexpectedly, MaskedIPSet constructor calls lookupRouterInfoLocally() which found an expired RI and thus calls fail() which starts a remote lookup anyway and finds the pending lookup. This is going to be tricky to figure out the best way to fix and how not to break anything else.

comment:2 Changed 6 weeks ago by zzz

  • Resolution set to fixed
  • Status changed from accepted to closed

In 18316867fddaf82e7529c44fabcf7a3722d5aa71 0.9.37-14

Note: See TracTickets for help on using tickets.