Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#544 closed defect (duplicate)

"Restart imminent", but not restarting

Reported by: DISABLED Owned by: sponge
Priority: minor Milestone:
Component: apps/plugins Version: 0.8.10
Keywords: restart, router not restarting Cc: killyourtv@…
Parent Tickets: Sensitive: no

Description

clicked restart in the sidepanel after having downloaded a router update. after about 19hrs of having the message "restart imminent" with the router still running

Subtickets

Attachments (7)

wrapper.log (135.7 KB) - added by DISABLED 8 years ago.
wrapper.log.1.1 (216.0 KB) - added by DISABLED 8 years ago.
wrapper.log.1.tar.gz (40.8 KB) - added by DISABLED 8 years ago.
wrapper.log.2.tar.gz (45.0 KB) - added by DISABLED 8 years ago.
wrapper.log_and_wrapper.log.1.tar.gz (23.9 KB) - added by sponge 8 years ago.
fullthreadlist (12.6 KB) - added by sponge 8 years ago.
threadsthatmatter (783 bytes) - added by sponge 8 years ago.

Download all attachments as: .zip

Change History (28)

Changed 8 years ago by DISABLED

Attachment: wrapper.log added

Changed 8 years ago by DISABLED

Attachment: wrapper.log.1.1 added

comment:1 Changed 8 years ago by DISABLED

I2P version: 0.8.10-5
Java version: Sun Microsystems Inc. 1.6.0_22 (OpenJDK Runtime Environment 1.6.0_22-b22)
Wrapper version: 3.5.12
Platform: Linux amd64 2.6.37.6-0.7-desktop
Processor: AMD Turion X2/Athlon X2/Sempron (Lion/Sable?, 65 nm) (athlon64)
Jbigi: Locally optimized native BigInteger? library loaded from file
Encoding: UTF-8

Critical Logs

5/11/11 18:27:35 CRIT [hutdown hook] net.i2p.router.Router : Graceful shutdown progress - no more tunnels, safe to die
3/11/11 23:45:48 CRIT [signedUpdate] t.i2p.router.web.UpdateHandler?: Update was downloaded, will be installed at next restart

comment:2 Changed 8 years ago by zzz

Component: router/generalapps/plugins
Owner: set to sponge
Status: newassigned

The shutdown thread is stuck in seedless at net.i2p.seedless.console.SeedlessSeedFinderContextListener?.contextDestroyed(SeedlessSeedFinderContextListener?.java:148)

See line 572 in the first attachment. Perhaps the info in the 2nd attachment is a clue, I don't know. But I don't think a potentially infinite wait() inside a listener callback is a good idea.

Assigning to sponge.

Now we - perhaps - should not wait forever for a plugin shutdown request to finish. If you think it's a good idea to implement that change, please reassign the ticket back to me after you resolve the immediate problem in seedless.

comment:3 Changed 8 years ago by sponge

This happens because the eepsite isn't shut down. Since the database is shut down, the war has no way to get any sort of 'status' feedback from the database. If the router does a shutdown on the eepsite, the code will exit. It does a constant retry because neodatis may be ran outside of the same JVM as the eepsite… I.E. Jetty might be running stand-alone, and the router can be restarted without it.

Shut down the Jetty instance, and it will work as expected. The plugin IS shut down, but the eepsite is NOT! That is what is causing the loop.

This all goes back to the real need to be able to actually shut down stuff like we do for the plugins.

I can confirm that it is not the plugin at fault. net.i2p.seedless.reuse.scrape.Scraper is used by the cache too, but that has been shut down. If you look carefully, you'll see that it's coming from net.i2p.seedlessServer.Seedless, which is ran from the eepsite. You'll see the eepannouncer thread too if that were to be installed. If it was the plugin, the result would be different, you would see the seedless cache class instead, and I don't see it.

comment:4 Changed 8 years ago by DISABLED

no stand-alone jetty, unless it comes with the default installation.
and eepsite announcer is not installed.

comment:5 Changed 8 years ago by zzz

Just for my reference, here's the whole traceback:

569	2011/11/06 15:29:49 | "Graceful shutdown hook" daemon prio=10 tid=0x000000000077f000 nid=0x8fe sleeping[0x00007f596d7a8000]
570	2011/11/06 15:29:49 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
571	2011/11/06 15:29:49 |   at java.lang.Thread.sleep(Native Method)
572	2011/11/06 15:29:49 |   at net.i2p.seedless.console.SeedlessSeedFinderContextListener.contextDestroyed(SeedlessSeedFinderContextListener.java:148)
573	2011/11/06 15:29:49 |   at org.mortbay.jetty.servlet.WebApplicationContext.doStop(WebApplicationContext.java:558)
574	2011/11/06 15:29:49 |   at org.mortbay.util.Container.stop(Container.java:163)
575	2011/11/06 15:29:49 |   - locked <0x00000000e0ac6540> (a org.mortbay.jetty.servlet.WebApplicationContext)
576	2011/11/06 15:29:49 |   at org.mortbay.http.HttpContext.stop(HttpContext.java:1733)
577	2011/11/06 15:29:49 |   at net.i2p.router.web.WebAppStarter.stopWebApp(WebAppStarter.java:121)
578	2011/11/06 15:29:49 |   at net.i2p.router.web.PluginStarter.stopPlugin(PluginStarter.java:245)
579	2011/11/06 15:29:49 |   at net.i2p.router.web.PluginStopper.stopPlugins(PluginStopper.java:33)
580	2011/11/06 15:29:49 |   at net.i2p.router.web.PluginStopper.run(PluginStopper.java:20)
581	2011/11/06 15:29:49 |   at net.i2p.router.Router.shutdown2(Router.java:1040)
582	2011/11/06 15:29:49 |   at net.i2p.router.Router.shutdown(Router.java:1016)
583	2011/11/06 15:29:49 |   at net.i2p.router.Router$GracefulShutdown.run(Router.java:1214)
584	2011/11/06 15:29:49 |   at java.lang.Thread.run(Thread.java:679)
585	2011/11/06 15:29:49 |   at net.i2p.util.I2PThread.run(I2PThread.java:85)

So this is the plugin shutdown hook.

Shut down the Jetty instance, and it will work as expected. The plugin IS shut down, but the eepsite is NOT! That is what is causing the loop.

This all goes back to the real need to be able to actually shut down stuff like we do for the plugins.

There _is_ also a Jetty shutdown hook.

Shutdown hooks do not run in any particular order and you cannot impose any order. In any case, the eepsite could have been shut down manually. Or, as you said, the eepsite could be in a different JVM. Perhaps you are getting called twice? Once from Jetty shutdown and once from plugin shutdown?

So you need to return from contextDestroyed() promptly no matter what the state of the eepsite Jetty or the console Jetty. If you need something that will take a while, spawn a thread… and of course the JVM may then get shut down out from under you if you take too long.

No matter what the state of anything else, you can't hang in that method for long.

comment:6 Changed 8 years ago by sponge

Not getting called twice, and what I meant by a seperate JVM is that someone can use jetty outside of i2p's JVM _manually_. This is the case and noted by the reporter. That aside…

The context destroyer should be able to cause it to exit almost immediately, as it sends signals up the thread chains to shut down. The order does not matter (or should not) at all. The Server and Announcer code tries 10 times to get status from the database, and fails (as it shows in the logs). Upon fail it does check if we are asked to shut down, and will do so. Apparently it did not get this signal.

@ reporter… can you reproduce this consistently? If so, I will double check the seedless core code that is executed, and if there is a problem there, fix it, but I have never seen this happen.

comment:7 Changed 8 years ago by sponge

From the source code, with comments. This is going to be a long technical post.

            while(spin) {
..
                if(announcer.getMeta().contains("torrent")) {
..
Line 330:
                    try {
                        db = new DB(this.getClass().getName());
..
which is caught 
                    } catch(Exception ex) {
                        ex.printStackTrace();
                    } finally {
                        odb = null;
                        try {
                            if(db != null && !db.isClosed()) {
                                db.close();
                            }
                        } catch(Exception ex) {
                        }
                        db = null;
                    }
                }

                // Sleep for 1 minute and loop.
                try {
                    Thread.sleep(60 * 1000); // Try again in 1 minute.
                } catch(InterruptedException ex) {
                }

            }

Here we go back to the top, while(spin) after a one minute delay. Perhaps I need to do a snooze here instead like in other places, but that should not matter much.

Since spin is still true and

    /**
     * Kill it.
     */
    public void die() {
        spin = false;
    }

did not get executed, it won't exit.
This code is hit from the context destroy code.

    public void contextInitialized(ServletContextEvent sce) {
..
                    pop = new SeedlessServerAnnouncer(tunnel.outToI2P, net.i2p.seedlessServer.Version.VERSION);
..
                    t = new Thread(pop, "Seedless.ServerAnnouncer");
                    t.setDaemon(true);
                    t.start();
..
    }

and:

    public void contextDestroyed(ServletContextEvent sce) {
        if(t != null) {
..
Here is where it is told to go away.
            pop.die();
            try {
                Thread.sleep(2);
            } catch(InterruptedException ex) {
            }
..
Here we send interrupts to break out of any sleep or IO operation every 2 seconds
            while(t.isAlive()) {
                t.interrupt();
                try {
                    Thread.sleep(2);
                } catch(InterruptedException ex) {
                }
            }
        }
        System.out.println("Seedless server context destroyed");
    }

Since the contextDestroyed method is never called, it's not shutting down on the eepsite.

The same code is used in the console war as well as the same die method and while loop.
In this case the die method was called, but the thread is still alive.
Here it is:

   public void contextDestroyed(ServletContextEvent sce) {
        if(t != null) {
            ssf.die();
            try {
                Thread.sleep(2);
            } catch(InterruptedException ex) {
            }
            while(t.isAlive()) {
                t.interrupt();
                try {
..
Line 148
                    Thread.sleep(2);
                } catch(InterruptedException ex) {
                }
            }
        }
        System.out.println("SeedlessSeedFinder server context destroyed");
    }
}

In the SeedlessSeedFinder? thread, we are in the exit loop.

2011/11/06 15:29:49 | "SeedlessSeedFinder" daemon prio=10 tid=0x00007f596402b000 nid=0xa4f waiting on condition [0x00007f5968136000]
2011/11/06 15:29:49 |    java.lang.Thread.State: TIMED_WAITING (sleeping)
2011/11/06 15:29:49 |   at java.lang.Thread.sleep(Native Method)
2011/11/06 15:29:49 |   at net.i2p.seedless.console.SeedlessSeedFinder.run(SeedlessSeedFinder.java:456)
2011/11/06 15:29:49 |   at java.lang.Thread.run(Thread.java:679)

Which is doing this to shut down in a finally

            while(!oK) {
                oK = true;
                if(seedlessCacheThread != null) {
                    if(seedlessCacheThread.isAlive()) {
                        oK = false;
                        seedlessCache.die();
                        seedlessCacheThread.interrupt();
                    }
                }
                if(scraperThread != null) {
                    if(!scraperThread.isAlive()) {
                        oK = false;
                        scraper.die();
                        scraperThread.interrupt();
                    }
                }
                if(tunnelMonitorThread != null) {
                    if(!tunnelMonitorThread.isAlive()) {
                        oK = false;
                        tunnelMonitor.die();
                        tunnelMonitorThread.interrupt();
                    }
                }
                if(!oK) {
                    try {
..
Line 456
                        Thread.sleep(1000); // Try again in 1 second.
                    } catch(InterruptedException ex) {
                    }

                }
            }
            System.out.println("SeedlessSeedFinder Exited.");

OK… so which one(s) are still running?
seedlessCache?
scraperThread?
tunnelMonitorThread?
I don't see any of them in the thread dump. Perhaps they are at the top part which got cut off.
Here's all the threads:

I2CP Internal Reader 1
SeedlessSeedFinder
StreamingTimer 4/4
StreamingTimer 3/4
StreamingTimer 2/4
StreamingTimer 1/4
SimpleTimer2 4/4
SimpleTimer2 3/4
SimpleTimer2 2/4
SimpleTimer2 1/4
BOB application bridge
SessionScavenger
SessionScavenger
StreamConsumer
StreamConsumer
process reaper
CommandThread
Timer-0
NewsFetcher
Acceptor ServerSocket[addr=/127.0.0.1,port=0,localport=7657]
Acceptor ServerSocket[addr=/0:0:0:0:0:0:0:1,port=0,localport=7657]
SessionScavenger
ClientListener:7654
SessionScavenger
UDP packet pusher
Addressbook
UDP message receiver 5/5
UDP message receiver 4/5
UDP message receiver 3/5
UDP message receiver 2/5
UDP message receiver 1/5
SessionScavenger
UDP ACK sender
UDP Packet handler 5/5
UDP Packet handler 4/5
UDP Packet handler 3/5
UDP Packet handler 2/5
UDP Packet handler 1/5
UDP Establisher
UDPReceiver.1
UDPSender
SessionScavenger
SessionScavenger
Snark DirMonitor
SessionScavenger
UPnP-Disposer
DBWriter
JobQueue 5/5
JobQueue 4/5
JobQueue 3/5
JobQueue 2/5
YK Precalc
DH Precalc
JobQueue 1/1
RouterWatchdog
Graceful shutdown hook
Tunnel GW pumper 4/4
Tunnel GW pumper 3/4
Tunnel GW pumper 2/4
Tunnel GW pumper 1/4
BuildHandler 1/1
PRNG
SimpleTimerExecutor 4/4
SimpleTimerExecutor 3/4
SimpleTimerExecutor 2/4
SimpleTimerExecutor 1/4
SimpleTimer
BuildExecutor
BWRefiller
Job Queue Pumper
LogWriter
DestroyJavaVM
Wrapper-Connection
Wrapper-Control-Event-Monitor
Low Memory Detector
C2 CompilerThread1
C2 CompilerThread0
Signal Dispatcher
Finalizer
Reference Handler
VM Thread
GC task thread#0 (ParallelGC)
GC task thread#1 (ParallelGC)
VM Periodic Task Thread

Are they in wrapper.log.1?

Changed 8 years ago by DISABLED

Attachment: wrapper.log.1.tar.gz added

Changed 8 years ago by DISABLED

Attachment: wrapper.log.2.tar.gz added

comment:8 Changed 8 years ago by sponge

Extra notes:
The seedlessCache:

            System.out.println("SeedlessCache Exited.");

The scraperThread:
Does not print on exit. I'll add that for debugging if used within the seedfinder.

The tunnelMonitorThread:
Does not print on exit. I'll add that for debugging.

If the reporter can reproduce the problem and all three messages appear, then we know it is the OpenJDK JVM, and not my code. Stay tuned here for an update to the core that will include the new exit messages.

comment:9 Changed 8 years ago by zzz

It's too complicated for me to follow. All I know is you have a (potentially) infinite loop in contextDestroyed().

You should, imho, wait at most a second or two in contextDestroyed(). After that, give up and return.

Of course, you want to chase down the root cause. In the meantime however, and in case of other similar problems, I suggest you remove the infinite loops.

Also, you should look at Thread.join(maxWait) which will return faster than doing a wait(maxWait) after an alive check in a loop.

I also note that the OP posted additional logs for you.

Also, in response to your IRC comment, I very much doubt it has anything to do with OpenJDK vs. Oracle JVMs.

comment:10 Changed 8 years ago by DISABLED

This is not caused by someone using jetty outside of i2p's JVM _manually_.

comment:11 Changed 8 years ago by sponge

Thanks, was able to paste together the entire dump.

Changed 8 years ago by sponge

Changed 8 years ago by sponge

Attachment: fullthreadlist added

Changed 8 years ago by sponge

Attachment: threadsthatmatter added

comment:12 Changed 8 years ago by sponge

monitor_scrape is snoozing, waiting to be woken up @ line 220.

    private void snooze() {
        if(testing) {
            System.out.println("Scrape snooze " + type);
        }

        isAlive.set(false);
        synchronized(isAlive) {
            while(!isAlive.get()) {
                try {
Line 220
                    isAlive.wait();
                } catch(InterruptedException ex) {
                    if(!spin) {
                        return;
                    }
                }
            }
        }

Seedless.Scrape-* are all waiting for http @ java.net.HttpURLConnection.getResponseCode
Seedless.Scraper-* all waiting on the above to complete or are snoozing
cache_scrape-* all snoozing
Seedless.ServerAnnouncer? waiting for signal to die
seedless server scraper snoozing

i2p-bote:
SeedlsAnounc?
SeedlsScpSvr?
SeedlsScpPrs?
SeedlsReqPrs?

Basically the event is not getting detected. the interrupt is not interrupting the sleeps or blocked I/O. The code IS correct.

@reporter please tell us here if you can reproduce the problem. If you can, I am suspecting something is going wrong with the installed java, and recommend that you try to upgrade or downgrade java and see if it still happens.

comment:13 Changed 8 years ago by sponge

Update to core available. Just update Seedless, Neodatis is the same, no war files changed…
and don't forget to restart the router after the update.

seedlessversions is Console: 00.00.33, Core: 00.00.3B,
EepAnnouncer: 00.00.0B, Server: 00.00.15, Neodatis: 2.1.beta14-209
(21-02-2010-20-30-42) PatchLevel? 9. Check your wrapper.log and http://127.0.0.1:7657/SeedlessConsole/ to verify versions.

Good luck.

comment:14 Changed 8 years ago by killyourtv

This has also happened to me in the past, ""Restart imminent", but not restarting" (and I also run seedless & the server), so it's not just a fluke.

comment:15 Changed 8 years ago by killyourtv

…but it's been awhile…and I don't have any dumps to contribute.

comment:16 Changed 8 years ago by sponge

May have been related to the recent network shit-storm. I'm guessing that you were overloaded, or something like that. I'll keep this open for about 1 more month, let me know if it happens again. If not, then we know it was related to the networking issues.

comment:17 in reply to:  16 Changed 8 years ago by killyourtv

Cc: killyourtv@… added

Replying to sponge:

May have been related to the recent network shit-storm.


HUH?

"Firefox won't restart when I asked it to after installing a new addon!"

"Uh…are you connected to the net when you try to close it? Call your ISP, it's their fault that your browser isn't restarting."

The process of shutting down should _never_ be able to get hung up by "network shitstorms" :| It should be able to see that "crap, we're holding up the show and the user wants to restart/shutdown the router. Let's not wait for that torrent result any longer and do the right thing and just stop."


I'm guessing that you were overloaded, or something like that.

I see the buck passed *a lot* but problems are not always caused by 'user error'. :|

Coming up: a bug related to this one but has nothing to do with being overloaded, user error, nor OpenJDK.

comment:18 Changed 8 years ago by killyourtv

comment:19 Changed 7 years ago by str4d

Milestone: 0.8.120.8.14

comment:20 Changed 7 years ago by sponge

Resolution: duplicate
Status: assignedclosed

duplicate of #566, continuing discussion there

comment:21 Changed 7 years ago by zzz

Milestone: 0.8.14

Milestone 0.8.14 deleted

Note: See TracTickets for help on using tickets.