Opened 6 years ago

Closed 3 years ago

#815 closed defect (fixed)

i2ptunnel locking

Reported by: zzz Owned by: zzz
Priority: minor Milestone: 0.9.20
Component: apps/i2ptunnel Version: 0.9.3
Keywords: usability race hang Cc:
Parent Tickets:

Description

We've always had trouble with locking in i2ptunnel. When a tunnel takes a long time to build (initially) a lock can be held for quite a while. It's worse for the static singleton I2PTunnelClientBase.socketManager used for shared clients. It's easy to test this by unplugging the network cable or running with VMCommSystem. Clients that are configured for close-on-idle are also much more vulnerable since they may get reopened frequently.

This may also be the cause of dup dests. It also used to cause hung shutdowns (although the shutdown hook is now threaded so it doesn't?)

Synchronization "improvements" in TunnelControllerGroup? in 0.9.4 have made things worse. The UI is now blocked by a pending tunnel build.

There may be related tickets on dup destinations and hangs. There may also be a ticket or zzz.i2p post on fixing and generalizing the shared clients singleton (i.e. "shared client groups").

TCG and I2PTunnelClientBase may need significant rework. Some method to queue pending sockets and return may be better. However, we must be very careful to not break existing apps and tunnels that expect the initial socket open to block. Maybe we can have it block but without holding the TCG or shared clients lock. For further study.

Following dump of a UI hang is with network disconnected:

"RouterConsole Jetty" daemon prio=10 tid=0x00007f391c009000 nid=0x7efc waiting for monitor entry [0x00007f39c8ee2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at net.i2p.i2ptunnel.TunnelControllerGroup.clearAllMessages(TunnelControllerGroup.java:353)
	- waiting to lock <0x00000000ff51a340> (a net.i2p.i2ptunnel.TunnelControllerGroup)
	at net.i2p.i2ptunnel.web.IndexBean.getMessages(IndexBean.java:410)
	at net.i2p.i2ptunnel.jsp.index_jsp._jspService(index_jsp.java:99)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)

   Locked ownable synchronizers:
	- <0x00000000ff51a750> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Startup tunnels" daemon prio=10 tid=0x00007f399469b800 nid=0x7eef in Object.wait() [0x00007f39c8de2000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000ff541080> (a java.lang.Object)
	at net.i2p.client.I2PSessionImpl.connect(I2PSessionImpl.java:415)
	- locked <0x00000000ff541080> (a java.lang.Object)
	at net.i2p.client.streaming.I2PSocketManagerFactory.createManager(I2PSocketManagerFactory.java:157)
	at net.i2p.client.streaming.I2PSocketManagerFactory.createManager(I2PSocketManagerFactory.java:81)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.buildSocketManager(I2PTunnelClientBase.java:417)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.buildSocketManager(I2PTunnelClientBase.java:372)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.getSocketManager(I2PTunnelClientBase.java:334)
	- locked <0x00000000ef12daf0> (a java.lang.Class for net.i2p.i2ptunnel.I2PTunnelClientBase)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.getSocketManager(I2PTunnelClientBase.java:292)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.verifySocketManager(I2PTunnelClientBase.java:274)
	- locked <0x00000000ff570350> (a java.lang.Object)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.<init>(I2PTunnelClientBase.java:200)
	at net.i2p.i2ptunnel.I2PTunnelClientBase.<init>(I2PTunnelClientBase.java:147)
	at net.i2p.i2ptunnel.I2PTunnelHTTPClientBase.<init>(I2PTunnelHTTPClientBase.java:90)
	at net.i2p.i2ptunnel.I2PTunnelHTTPClient.<init>(I2PTunnelHTTPClient.java:197)
	at net.i2p.i2ptunnel.I2PTunnel.runHttpClient(I2PTunnel.java:792)
	at net.i2p.i2ptunnel.TunnelController.startHttpClient(TunnelController.java:228)
	at net.i2p.i2ptunnel.TunnelController.doStartTunnel(TunnelController.java:188)
	at net.i2p.i2ptunnel.TunnelController.startTunnel(TunnelController.java:149)
	at net.i2p.i2ptunnel.TunnelControllerGroup$StartControllers.run(TunnelControllerGroup.java:245)
	- locked <0x00000000ff51a340> (a net.i2p.i2ptunnel.TunnelControllerGroup)
	at java.lang.Thread.run(Thread.java:679)
	at net.i2p.util.I2PThread.run(I2PThread.java:85)

   Locked ownable synchronizers:
	- None

Subtickets

Attachments (4)

i2ptunnel-rwl.patch (10.8 KB) - added by str4d 4 years ago.
Use ReadWriteLock? on the list of TunnelControllers? to fix UI hang
i2ptunnel-locking.patch (22.7 KB) - added by str4d 4 years ago.
ReadWriteLock and state synchronization in TunnelControllerGroup, states in TunnelController
i2ptunnel-tcstate.patch (9.2 KB) - added by str4d 4 years ago.
States in TunnelController
blocking-saveConfig.txt (4.0 KB) - added by str4d 4 years ago.
Stacktrace showing how saveConfig is blocked by tunnels starting

Download all attachments as: .zip

Change History (25)

comment:1 Changed 6 years ago by zzz

Related: #650 and #642

comment:2 Changed 6 years ago by zzz

  • Milestone changed from 0.9.6 to 0.9.8
  • Owner set to zzz
  • Status changed from new to accepted

comment:3 Changed 5 years ago by zzz

  • Milestone changed from 0.9.8 to 0.9.10

I2CP and i2ptunnel server-side complete in 0.9.8. Ran out of time for the client side.

comment:4 Changed 4 years ago by str4d

  • Keywords usability race hang added
  • Milestone 0.9.10 deleted

comment:5 Changed 4 years ago by str4d

  • Keywords review-needed added

This patch fixes the UI hang in Android (I have not tested in webapp yet but I expect it does the same there). It does nothing to address the singleton I2PTunnelClientBase.socketManager.

Is there anything else that the synchronized methods were protecting? The only two variables that I can see being vulnerable are the ClientApp state (which is still protected with synchronized and volatile the same way it was before), and the list of TunnelControllers (which this patch moves into a ReentrantReadWriteLock).

comment:6 Changed 4 years ago by zzz

nack (??) first thoughts -

the ones that just go through the list of TC are fine, but if we're changing state we either have to synch the whole thing on this, or else protect the state change better so we aren't stopping and starting simultaneously, or two starts or two stops simultaneously.

e.g. something like:

private void loadControllers(String configFile) {
   synchronized(this) {
        if (_state != INITIALIZED && _state != STOPPED && _state != CRASHED)
            return;
         changeState(STARTING);
    }

    ... rwlock section ...

    synchronized(this) {
        if (_state != STARTING)
            return;
         startupThread = new ...
         startupThread.start();
         changeState(RUNNING);
    }
}

or, do the rwlock stuff first, then synch(this) { change state, start thread, change state }

comment:7 Changed 4 years ago by str4d

ACK, thanks. I wouldn't have a problem having startup and stop methods both synchronizing on this, I don't think it should interfere with the rwlock. I will test and report.

comment:8 Changed 4 years ago by str4d

  • Keywords review-needed removed

Changed 4 years ago by str4d

Use ReadWriteLock? on the list of TunnelControllers? to fix UI hang

comment:9 Changed 4 years ago by str4d

New patch uploaded (and it seems that overwriting the old patch does *not* create a diff >_> ). Added back in synchronizeds for all TCG start/stop/etc. methods. For those methods, the writeLock() calls at a minimum are necessary to ensure that clearAllMessages() and getControllers() are protected. The readLock() calls are basically redundant, but I have left them in there for completeness.

comment:10 Changed 4 years ago by zzz

  • Milestone set to 0.9.19

ack, thanks

comment:11 Changed 4 years ago by zzz

  • Owner changed from zzz to str4d
  • Status changed from accepted to assigned

Changed 4 years ago by str4d

ReadWriteLock and state synchronization in TunnelControllerGroup, states in TunnelController

comment:12 Changed 4 years ago by str4d

The previous patch has at least one hang in it: if tunnels are starting and the user edits a tunnel config (or simply views it in Android, which triggers a config save), the UI hangs while waiting for the lock on TunnelControllerGroup.this. It could be fixed by un-synchronizing saveConfig(), but I was less convinced about the overall effectiveness of the patch.

This new patch instead shores up the state synchronization in TCG, and also adds state to TunnelController, which was previously (mostly) unnecessary because of synchronization everywhere in TCG. I think this is a better approach overall.

comment:13 Changed 4 years ago by str4d

<zzz> no i don't think unsynching loadConfig/saveConfig is wise. Remember that the config for all tunnels is in one file
<str4d> saveConfig() was the issue, and all it is doing is saving the current tunnel state
<zzz> so it can't be fixed via changes in TC alone
<str4d> (loadConfig is a separate issue)
<zzz> it's saving state for all tunnels at once
<str4d> AFAICT everywhere a tunnel's state is changed, a save is issued which saves config for all tunnels.
<zzz> well, not state but config
<zzz> but yes
<str4d> Is there a case where we explicitly don't want that?
<str4d> (where an unrelated tunnel could have changes that aren't matched in config, that we don't want saved?)
<zzz> it can't be helped since it's all in one file
<str4d> I just remembered that the other thing I wanted to do was add a rwlock around TC''s _config
<str4d> (which would protect it against any sync issues)
<str4d> Would that be sufficient to allay your concern?
<str4d> Or is there something I'm missing?
<zzz> not opposed to the TC changes (replacing multiple state variables with an enum is always good), although you missed several places where you need synching around _state == foo checks
<str4d> Huh, thought I got all the bastards.
<zzz> maybe you can avoid it with the getters, synching there can lead to issues elsewhere, just rely on volatile also, which is sleazy but generally works
<str4d> So make _config volatile?
<zzz> perfect java way; if you synch reads/writes anywhere, synch it everywhere. sleazy way: synch it where it matters, but also make it volatile and don't synch it in the getters to avoid hangs
<str4d> mmm
<str4d> What I'm trying to avoid are hangs between tunnel start/stop and UI-relevant operations like clearAllMessages() and saveConfig()
<zzz> back to TCG, I think you gotta keep the load/save synch and live with UI hangs. If you're saving-on-read in android, gotta fix that. 
<zzz> and the previous patch fixed clearAllMessages, this patch is working on saveConfig, right?
<str4d> Not saving-on-read, saving-on-exit
<str4d> (exit of the "edit tunnel" page)
<str4d> To be precise, saving in onPause(), which is the only place I can guarantee gets called. Anywhere else could be skipped if the app is killed.
<zzz> how about 1) don't save-on-exit if nothing changed and 2) thread the save
<zzz> you don't want file i/o from the UI thread in android anyway
<str4d> No, and I did think about 2) but if it is going to hang anyway then we still run into problems
<str4d> because onPause() is called whenever the app loses focus - notification drawer pulled down, screen locked, FB messenger overlay opened, etc, etc
<str4d> and if I launch a thread in onPause() that hangs, then I can end up with multiple hung threads.
<str4d> (And I figured it was better to fix the hang than to hack around it)
<zzz> you'd need state management for your saver, so you don't get two
<zzz> our locking for i2ptunnel.config must be airtight. if it gets corrupted we are doomed.
<str4d> mmm.
<str4d> Like I said, there's already a bug in the earlier patch :P
<str4d> I do want to get this right.
<zzz> and the index for a controller may change when a controller is removed, which is fun and another potential source of corruption
<str4d> heh, yah
<str4d> What I'm doing in Android is, every time the user opens the edit screen, the corresponding tunnel's config is converted on-the-fly to a SharedPreferences database.
<str4d> Then in onPause(), the SharedPreferences is converted back into a config for the tunnel, and saved.
<zzz> so my thinking is that the saveConfig() hang may not be fixable, or easily fixable, or fixable in TC/TCG.. may just have to be dealt with in UI code. So I recommend checking in the previuos patch and then let's see about further changes. I don't think I'm happy with the new patch but it's getting too much to keep in my head
<str4d> ack

Changed 4 years ago by str4d

States in TunnelController

comment:14 Changed 4 years ago by str4d

  • Keywords review-needed added

attachment:i2ptunnel-rwl.patch pushed in 39127d6610a8c05b34cc61003b63d619fff0bb3d. TunnelController states patch fixed and split out into attachment:i2ptunnel-tcstate.patch for review.

comment:15 Changed 4 years ago by str4d

  • Keywords review-needed removed

TunnelController states pushed with additional fixes in 33b368d75d4b7536dd4a29efa35fba64e6dc409d.

comment:16 Changed 4 years ago by str4d

I started playing with this in I2P Android, and have run into an issue.

Android's lifecycle dictates that at any point after onStop() the application can be killed, so unsaved changes should be saved there. onStop() is called when the app is no longer visible, so it is very likely that the user will not be changing any more settings anytime soon - but it is possible that the user could navigate back from the other app, change tunnel settings and then exit, which would call onStop() again.

The problem I have is that there is no way to kill and restart the first save thread that was created, because synchronized blocks/methods cannot be interrupted. Therefore, no changes after the first onStop() call can be saved, because by the time the thread is at the synchronized block it has already turned the SharedPreferences into a config to be saved.

The problem is worse for pre-Honeycomb devices, because an app can be killed any time after onPause(), which happens far more frequently.

Additionally, if the screen is rotated, the app is destroyed and restarted, and if the save is done asynchronously and it is still blocked, then any previous changes will not be shown, and any new changes will not be saved. I could try to detect this and account for it, but the user could equally exit the settings and re-enter right afterwards, which would show the same problem.

Changed 4 years ago by str4d

Stacktrace showing how saveConfig is blocked by tunnels starting

comment:17 Changed 4 years ago by zzz

  • Milestone changed from 0.9.19 to 0.9.20
  • Owner changed from str4d to zzz

October 2013 local patches for starting clients in background checked in by branching from 0.9.8.1 and then propping from i2p.i2p head, in branch i2p.i2p.zzz.815, untested, bugs likely. iirc the issues were related to close-on-idle and restarting, leading to dup tunnel pools. To be tested. Target 0.9.20.

The checkin noted in comment 15 above does fix the specific issue captured in the OP dump but the larger issues noted in the OP are to be addressed in the 815 branch.

comment:18 Changed 4 years ago by zzz

  • Status changed from assigned to testing

propped in 0.9.19-4, lightly tested, may need additional locking in I2PTunnelClientBase

comment:19 Changed 4 years ago by zzz

dup shared clients (possibly related #1545) hopefully fixed in 66d0919530ea7978a7e6bb2675679e1d458d9888 0.9.19-28-rc

comment:20 Changed 4 years ago by str4d

As of 0.9.20 client tunnels were still connecting to the router in their constructor. Fixed in a652b3a60daed7331a984f6c04d2b5a1379e0982 0.9.20-2

comment:21 Changed 3 years ago by zzz

  • Resolution set to fixed
  • Status changed from testing to closed

presumed fixed

Note: See TracTickets for help on using tickets.