Opened 6 years ago

Last modified 4 years ago

#1014 needs_work defect

I2P doesn't recover after hibernation

Reported by: DISABLED Owned by: dg
Priority: minor Milestone: 0.9.23
Component: router/general Version: 0.9.7.1
Keywords: hang time Cc:
Parent Tickets: Sensitive: no

Description

Hello!

In my experience, I2P does not begin working again after I have paused a virtual machine (even for a few seconds or minutes), hibernated, or used the standby options on my computer.

Sometimes this involves a series of time-related error messages in /logs and an error in the sidebar saying "CHECK YOUR NETWORK CONNECTION". My network connection is fine. The only remedy, seemingly, is to restart I2P entirely, which is painful on a laptop and bad for usability.

I2P will do nothing and remain in a comatose state until I restart it, even hours/days after the hibernate or pause or sleep.

Change History (9)

comment:1 Changed 6 years ago by zzz

Component: unspecified → router/general
Milestone: 0.9.8 → 0.9.9

This could be due to clock skew (although I've taken several stabs at fixing it over the years) or something else. In the past I tested by just running Java in the foreground (no wrapper) and then pressing ^Z. I think the 'check network' message is just a result of zero connected peers.

If you have any relevant logs please include them here.

comment:2 in reply to:  1 Changed 6 years ago by DISABLED

Replying to zzz:

This could be due to clock skew (although I've taken several stabs at fixing it over the years) or something else. In the past I tested by just running Java in the foreground (no wrapper) and then pressing ^Z. I think the 'check network' message is just a result of zero connected peers.

If you have any relevant logs please include them here.

I stopped the VM for a while. This time I2P restarted once the VM came up; it doesn't usually do this and normally remains in a dead state.

13/09/13 16:37:51 CRIT  [leTimer2 3/4] net.i2p.util.Clock            : Large clock shift forward by 36m
13/09/13 16:37:51 ERROR [uterWatchdog] client.ClientManagerFacadeImpl: Client XXXX has a leaseSet that expired 27m
13/09/13 16:37:51 ERROR [uterWatchdog] client.ClientManagerFacadeImpl: Client XXXX has a leaseSet that expired 29m
13/09/13 16:37:51 ERROR [uterWatchdog] client.ClientManagerFacadeImpl: Client XXXX has a leaseSet that expired 33m
13/09/13 16:37:51 ERROR [uterWatchdog] client.ClientManagerFacadeImpl: Client XXXX has a leaseSet that expired 32m
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: Ready and waiting jobs: 90
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: Job lag: 2202476
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: Participating tunnel count: 2438
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: 1minute send processing time: 175.88627912006055
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: Outbound send rate: 309949.405059494 Bps
13/09/13 16:37:51 ERROR [uterWatchdog] 2p.router.tasks.RouterWatchdog: Memory: 123.76M/328.69M
13/09/13 16:37:51 CRIT  [uterWatchdog] 2p.router.tasks.RouterWatchdog: Router appears hung, or there is severe network congestion.      Watchdog starts barking!
13/09/13 16:37:51 ERROR [leTimer2 3/4] net.i2p.router.Router         : Restarting after large clock shift forward by 36m
13/09/13 16:37:52 WARN  [ handler 1/1] er.transport.udp.PacketHandler: NTP failure, UDP adjusting clock by 36m
13/09/13 16:37:52 ERROR [uter Restart] net.i2p.router.Router         : Stopping the router for a restart...
13/09/13 16:37:52 WARN  [uter Restart] net.i2p.router.Router         : Stopping the client manager
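
For readers following the log above: the "Large clock shift forward by 36m" line and the job lag of 2202476 (about 36.7 minutes, if that figure is in milliseconds) both point at wall-clock time jumping while the process was paused. A minimal sketch of how such a jump can be noticed from a periodic timer follows. ClockShiftDetector and its parameters are hypothetical names chosen for illustration; this is not the code in net.i2p.util.Clock.

```java
/** Hypothetical helper (not part of I2P): flags a wall-clock jump between periodic ticks. */
final class ClockShiftDetector {
    private final long expectedIntervalMs; // how often tick() is supposed to be called
    private final long thresholdMs;        // deviations up to this size are ignored
    private long lastWallMs;               // wall-clock time seen at the previous tick

    ClockShiftDetector(long startWallMs, long expectedIntervalMs, long thresholdMs) {
        this.lastWallMs = startWallMs;
        this.expectedIntervalMs = expectedIntervalMs;
        this.thresholdMs = thresholdMs;
    }

    /**
     * Call once per tick with the current wall-clock time in milliseconds.
     * Returns the unexpected jump (positive = forward, negative = backward),
     * or 0 if the tick arrived within the threshold of the expected interval.
     */
    long tick(long nowWallMs) {
        long shift = (nowWallMs - lastWallMs) - expectedIntervalMs;
        lastWallMs = nowWallMs;
        return Math.abs(shift) > thresholdMs ? shift : 0L;
    }

    public static void main(String[] args) {
        // Tick every second; treat anything beyond one minute as a shift.
        ClockShiftDetector detector = new ClockShiftDetector(0L, 1_000L, 60_000L);
        // A tick that arrives on time.
        System.out.println(detector.tick(1_000L));
        // The next tick arrives 36 minutes late, as after a VM pause.
        System.out.println(detector.tick(2_000L + 36 * 60_000L));
    }
}
```

In a real timer thread the argument to tick() would come from System.currentTimeMillis(). The one-minute threshold here is arbitrary.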

comment:3 in reply to:  1 Changed 6 years ago by DISABLED

Replying to zzz:

I think the 'check network' message is just a result of zero connected peers.

Maybe. It stays this way indefinitely (hours, days) until I hit restart. It's annoying for users on laptops who hibernate and wish to use their client tunnels immediately or shortly (1-3 minutes) after waking up. In some cases, it recovers. I'm told in some instances (20-30s), it won't. The instance above was +30 minutes.

comment:4 Changed 5 years ago by str4d

Keywords: hang time added
Milestone: 0.9.9

comment:5 Changed 4 years ago by dg

This is still an issue on Windows 8.1 with 0.9.20. A lot of laptops will hibernate when their lid is shut or they are left on overnight.
The console shows "Check network connection and NAT/firewall".

Relevant (and only) logs:

19/06/15 16:07:12 CRIT  [NTCP Pumper ] net.i2p.util.Clock            : Large clock shift forward by 13h 
18/06/15 16:01:42 CRIT  [uildExecutor] net.i2p.util.Clock            : Large clock shift forward by 29m
18/06/15 05:10:34 CRIT  [uterWatchdog]  2p.router.tasks.RouterWatchdog: Router appears hung, or there is severe  network congestion.  Watchdog starts barking! 
18/06/15 05:09:57 CRIT  [NTCP Pumper ] net.i2p.util.Clock            : Large clock shift forward by 2h
17/06/15 16:03:04 CRIT  [NTCP Pumper ] net.i2p.util.Clock            : Large clock shift forward by 7h
17/06/15 05:28:20 CRIT  [uterWatchdog]  2p.router.tasks.RouterWatchdog: Router appears hung, or there is severe  network congestion.  Watchdog starts barking!
17/06/15 05:27:45 CRIT  [ Establisher] net.i2p.util.Clock            : Large clock shift forward by 2h 

comment:6 Changed 4 years ago by dg

Owner: set to dg
Status: new → accepted

Attempting a fix for this.

<+dg> zzz: Would you be able to look at #1014 at some point? Or any thoughts so that I can have a stab. It's a recurring problem that I come across
<&zzz> dg all I can give you is some debugging ideas
<&zzz> set default log level to warn
<&zzz> try to figure out if it's a transport issue or a tunnel issue or a timer issue or a soft-restart issue or ...
<&zzz> forward clock shifts are generally an easier problem than backwards
<&zzz> see #1634 for a backwards problem, and how far i got before I got stuck
<&zzz> suspend problems may be easier to debug on linux if you can just suspend (^Z) the process (all da threads) somehow

comment:7 Changed 4 years ago by dg

Milestone: 0.9.23

comment:8 Changed 4 years ago by dg

Status: accepted → testing

Committed in 0.9.22-11 8dec222a2c0e619fd455367b3647b87bf349c6dc.

Needs some testing, especially on Windows.

comment:9 Changed 4 years ago by zzz

Status: testing → needs_work

As discussed on IRC:

The commenting out of the active peer count check is a good fix.

The uncommenting of the peer manager and job queue restarts is troublesome.

The peer manager restart saves and reloads all the profiles, which is pointless. Let's figure out what the real change required to peer manager is. Adjusting the time in the stats? Rerunning the organizer? or?

The job queue restart removes all pending jobs, including ones like changing the netdb rotation key at midnight. As with peer manager, it seems like some adjustment to the job queue or something is required.

I said that I thought the job queue was a ClockShiftListener, but it's not, it's a ClockUpdateListener. That should be investigated; perhaps that's how we can fix it.

tl;dr NACK the changes, need to come up with something more fine-grained, not the restart sledgehammer. Holler if you need help.
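
One possible shape for the "more fine-grained" approach, sketched under stated assumptions: instead of restarting the queue and dropping everything, pending jobs are adjusted when a clock shift is reported. ShiftAwareJobQueue, PendingJob, and clockShifted() are hypothetical names and do not reflect I2P's actual JobQueue or listener interfaces. Whether delay-based jobs should be moved by the shift at all is also an assumption that would need testing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not I2P's JobQueue): adjust pending jobs on a clock shift instead of dropping them. */
final class ShiftAwareJobQueue {
    /** A pending job; absoluteTime marks jobs tied to wall-clock time, such as a midnight key rotation. */
    static final class PendingJob {
        final String name;
        final boolean absoluteTime;
        long startAtMs;

        PendingJob(String name, long startAtMs, boolean absoluteTime) {
            this.name = name;
            this.startAtMs = startAtMs;
            this.absoluteTime = absoluteTime;
        }
    }

    private final List<PendingJob> pending = new ArrayList<>();

    void add(PendingJob job) {
        pending.add(job);
    }

    int size() {
        return pending.size();
    }

    /** Called with the size of a detected clock shift in ms (positive = forward). */
    void clockShifted(long deltaMs) {
        for (PendingJob job : pending) {
            if (!job.absoluteTime) {
                // Delay-based job: move it with the clock so its remaining delay is unchanged.
                job.startAtMs += deltaMs;
            }
            // Wall-clock jobs keep their start time and are never removed.
        }
    }

    /** Names of the jobs whose start time has been reached at the given wall-clock time. */
    List<String> dueAt(long nowMs) {
        List<String> due = new ArrayList<>();
        for (PendingJob job : pending) {
            if (job.startAtMs <= nowMs) {
                due.add(job.name);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        ShiftAwareJobQueue queue = new ShiftAwareJobQueue();
        queue.add(new PendingJob("peer-test", 60_000L, false));     // delay-based
        queue.add(new PendingJob("rotate-key", 3_600_000L, true));  // wall-clock based
        queue.clockShifted(36 * 60_000L);                           // 36-minute forward shift
        System.out.println(queue.size() + " jobs kept, due now: " + queue.dueAt(30_000L + 36 * 60_000L));
    }
}
```

The point of the sketch is only that no job is discarded: the wall-clock job survives the shift untouched, and the delay-based job does not fire in a burst the moment the router wakes up.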
