Opened 2 years ago

Last modified 22 months ago

#1893 testing defect

i2psnark: data loss on restarts

Reported by: zzz Owned by: zzz
Priority: minor Milestone: 0.9.30
Component: apps/i2psnark Version: 0.9.28
Keywords: Cc:
Parent Tickets:

Subtickets (add)

#1903: Way to reproduce issues described in 1893defectclosedzzz

Change History (10)

comment:1 Changed 2 years ago by jogger

Add a subticket #1903.

comment:2 Changed 2 years ago by zzz

  • Status changed from new to infoneeded_new

#1903 describes a way to reproduce, where the per-torrent config files under i2psnark.config.d are deleted. Those config files contain state information including completion and whether to remap filenames to "safe" characters. When the files are deleted, we forget that information. Torrents will be rescanned but we dont guess what the file mapping will be in the absence of a config file.

I don't understand why the config files would be deleted or corrupted on shutdown, I'll try to come up with a theory.

Please provide the info requested over in the forum thread: quote:

Please provide the version information from the top of /logs in the console, and the number of torrents you have running.

comment:3 Changed 2 years ago by jogger

  • Status changed from infoneeded_new to new

Found better way to reproduce:

Installed new i2p machine, 0% participating traffic, pointed Snark at data dir with 72 torrents already successfully created and seeded on a different machine.

On first start all torrents were scanned. Files with "unsafe" characters were recreated and redownloaded as described in the forum. Torrents without "unsafe" characters were checked 100% complete.

First restart after some hours: All torrents rescanned - obviously all config files corrupted.

Second restart after some more hours: 19 torrents directly loaded, rest rescanned - obviously most config files corrupted.

Third restart after some more hours: 7 torrents directly loaded, rest rescanned - obviously most config files corrupted.

This has been seen on all my i2p machines on wheezy, jessie, stretch and macOS with 4 different HW platforms, all with journaled file systems. Others also have this problem as there are often leechers appearing being 80 - 95% complete on torrents that have been inactive for weeks. This must be previous seeders exposed to the same problem.

Conclusion 1: The "safe" file name mapping causes leechers to create file names different from the original torrent. If the config file of the original torrent gets corrupted, it can only properly be recreated by recreating the .torrent file. Recreating by rescan loses the original names. Possible solution: "Unsafe" file names get renamed before torrent creation.

Conclusion 2: The config file corruption problem caused by crash or by restart without participating tunnels really exists on a variety of platforms and i2p versions. Possible solution: If a torrent is rescanned, check file content against the torrent file, do not trust the config file. Introduce a way of integrity checking for the config files themselves.

I2P version: 0.9.28-0
Java version: Oracle Corporation 1.8.0_25 (Java(TM) SE Runtime Environment 1.8.0_25-b17)
Wrapper version: 3.5.30
Server version: 8.1.21.v20160908
Servlet version: Jasper JSP 2.1 Engine
JSTL version: standard-taglib 1.2.0
Platform: Mac OS X x86_64 10.12.2
Jcpuid version: 0
Processor: uninitialized (unrecognized)
Jbigi: Native BigInteger? library jbigi not loaded - using pure Java - poor performance may result - see http://i2p-projekt.i2p/jbigi for help
Jbigi version: 0
GMP version: unknown

comment:4 Changed 2 years ago by zzz

  • Milestone changed from undecided to 0.9.29
  • Status changed from new to accepted

OK. Thanks for the detailed information.

I think I was confused by your use of the word "corrupted". If a torrent gets rescanned in this scenario, it's probably not that the config file was corrupted or deleted - it probably just didn't get updated (i.e. rewritten) on shutdown. Since the torrent data will then be more recent than the config file, the torrent will be rescanned at the next startup.

In addition, the config file contains the setting on mapping of unsafe characters. Without that file, we make some assumption. We set mapping if we downloaded the torrent; we don't map if we created the torrent. If the torrent just appears in the directory, that's where we don't know what to do. The code may be able to handle this better, not sure. In any case, it's unrelated to the shutdown issue, and I may make a separate ticket for it.

I'll be looking at ways to speed up and parallelize the shutdown, and wait longer for it to complete.

comment:5 Changed 2 years ago by zzz

  • Status changed from accepted to testing

In 4f859c6acce391a4c0b3bee3bbe96cded077c60f to be 0.9.28-5:

  • Parallelize the context shutdown tasks, so the jetty shutdown starts sooner
  • Increase max time for shutdown tasks from 10 sec to 60 sec (120 on ARM)
  • Change 20 ms delay between unannounces to every 8th unannounce

Even with the previous 20 ms delay in the loop, we should have been able to get almost 500 torrents to shut down cleanly (maybe 450... I think we have Jetty configured to wait a second), so I still don't fully understand what's going on. With this change we should be able to do a clean shutdown with several thousand torrents.

The only thing that's threaded is the unannounces themselves, so even if some of those aren't completed before shutdown, it shouldn't affect the config file.

The torrents are all closing quickly for me, perhaps though it takes longer if a lot of peers are connected.

Opening ticket #1938 for the unsafe character mapping issue.

Logs after change:

xx:50:21.519 WARN  [le Jetty-122] net.i2p.router.Router         : Router state change from RUNNING to GRACEFUL_SHUTDOWN
xx:50:21.519 CRIT  [ShutdownHook] net.i2p.router.Router         : Graceful shutdown progress: No more tunnels, starting final shutdown
xx:50:23.519 WARN  [ShutdownHook] net.i2p.router.Router         : Router state change from GRACEFUL_SHUTDOWN to FINAL_SHUTDOWN_1
xx:50:23.519 WARN  [ShutdownHook] net.i2p.router.Router         : Router state change from FINAL_SHUTDOWN_1 to FINAL_SHUTDOWN_2
xx:50:23.519 CRIT  [ShutdownHook] net.i2p.router.Router         : Starting final shutdown(2)
xx:50:23.521 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.data.SDSCache$Shutdown
xx:50:23.521 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.data.SDSCache$Shutdown
xx:50:23.521 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.data.SDSCache$Shutdown
xx:50:23.522 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.522 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.util.SimpleTimer2$Shutdown
xx:50:23.522 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.time.RouterTimestamper$Shutdown
xx:50:23.522 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.524 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.525 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.web.ConfigServiceHandler$UpdateWrapperOrRekeyTask
xx:50:23.525 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.client.naming.BlockfileNamingService$Shutdown
xx:50:23.525 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.525 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.web.StatSummarizer$Shutdown
xx:50:23.533 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.startup.RouterAppManager$Shutdown
xx:50:23.533 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.533 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.534 WARN  [ShutdownHook] net.i2p.router.Router         : Running shutdown task class net.i2p.util.SimpleTimer2$Shutdown
xx:50:23.535 WARN  [ger$Shutdown] outer.startup.RouterAppManager: Shutting down client Router Console
xx:50:23.535 INFO  [ger$Shutdown] outer.startup.RouterAppManager: Client Router Console is now STOPPING 
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.data.SDSCache$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.data.SDSCache$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.data.SDSCache$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.util.SimpleTimer2$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.time.RouterTimestamper$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.web.ConfigServiceHandler$UpdateWrapperOrRekeyTask
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.client.naming.BlockfileNamingService$Shutdown
xx:50:23.536 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:23.570 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.web.StatSummarizer$Shutdown
xx:50:24.563 WARN  [ger$Shutdown] org.klomp.snark.SnarkManager  : Snark stop() begin
java.lang.Exception: I did it
	at org.klomp.snark.SnarkManager.stop(SnarkManager.java:307)
	at org.klomp.snark.web.I2PSnarkServlet.destroy(I2PSnarkServlet.java:102)
	at org.eclipse.jetty.servlet.ServletHolder.destroyInstance(ServletHolder.java:395)
	at org.eclipse.jetty.servlet.ServletHolder.doStop(ServletHolder.java:368)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.servlet.ServletHandler.doStop(ServletHandler.java:240)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerWrapper.doStop(HandlerWrapper.java:107)
	at org.eclipse.jetty.security.SecurityHandler.doStop(SecurityHandler.java:392)
	at org.eclipse.jetty.security.ConstraintSecurityHandler.doStop(ConstraintSecurityHandler.java:464)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerWrapper.doStop(HandlerWrapper.java:107)
	at org.eclipse.jetty.server.session.SessionHandler.doStop(SessionHandler.java:136)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerWrapper.doStop(HandlerWrapper.java:107)
	at org.eclipse.jetty.server.handler.ContextHandler.doStop(ContextHandler.java:815)
	at org.eclipse.jetty.servlet.ServletContextHandler.doStop(ServletContextHandler.java:160)
	at org.eclipse.jetty.webapp.WebAppContext.doStop(WebAppContext.java:518)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerCollection.doStop(HandlerCollection.java:250)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerCollection.doStop(HandlerCollection.java:250)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at org.eclipse.jetty.server.handler.HandlerWrapper.doStop(HandlerWrapper.java:107)
	at org.eclipse.jetty.server.Server.doStop(Server.java:343)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:89)
	at net.i2p.router.web.RouterConsoleRunner.shutdown(RouterConsoleRunner.java:233)
	at net.i2p.router.startup.RouterAppManager.shutdown(RouterAppManager.java:190)
	at net.i2p.router.startup.RouterAppManager$Shutdown.run(RouterAppManager.java:201)
	at java.lang.Thread.run(Thread.java:745)
	at net.i2p.util.I2PThread.run(I2PThread.java:103)
xx:50:24.566 WARN  [ger$Shutdown] org.klomp.snark.SnarkManager  : SnarkManager final shutdown
xx:50:30.485 WARN  [ger$Shutdown] org.klomp.snark.SnarkManager  : Snark stop() end
xx:50:30.544 INFO  [ger$Shutdown] outer.startup.RouterAppManager: Client Router Console is now STOPPED 
xx:50:30.544 WARN  [ger$Shutdown] outer.startup.RouterAppManager: Shutting down client i2ptunnel
xx:50:30.544 INFO  [ger$Shutdown] outer.startup.RouterAppManager: Client i2ptunnel is now STOPPING 
xx:50:30.545 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.startup.RouterAppManager$Shutdown
xx:50:30.545 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:30.545 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.router.util.DecayingBloomFilter$Shutdown
xx:50:30.545 INFO  [ShutdownHook] net.i2p.router.Router         : Shutdown task complete: Shutdown task net.i2p.util.SimpleTimer2$Shutdown
xx:50:35.196 WARN  [ShutdownHook] net.i2p.router.Router         : Router state change from FINAL_SHUTDOWN_2 to FINAL_SHUTDOWN_3
xx:50:35.197 CRIT  [ShutdownHook] net.i2p.router.Router         : Shutdown(2) complete

comment:6 Changed 2 years ago by jogger

Thanks for your effort. I will test.

However you are clearly describing that a clean shutdown is necessary to avoid torrent rescans at startup. That corresponds to my reports that this bug produces the same symptoms as a machine crash. I propose a real fix for the issue that will survive a "kill -9" by

  • either writing out the config files frequently with every change
  • or separating the config files into two (metadata and status) with one that is never touched once the torrent is complete and thus allows a clean reload after a restart.

comment:7 Changed 2 years ago by zzz

Thanks for your comments. Unfortunately, if you think about it, it's impossible to guarantee a consistent, clean state after a kill -9. If you stop code at an arbitrary time, you're going to get arbitrary results. The best we can do is recognize that state on restart and do a rescan. With certain changes such as rewriting the config file after every change, we might make it better... or could we make it worse, by increasing the chance that the config file partially written? I'll think about possible further improvements but no-rescan-ever isn't possible.

comment:8 Changed 22 months ago by jogger

The fix described above does not work in all cases.

I did a graceful restart upgrading from 0.9.29-3 to 4 and all config files were dumped out successfully at that time.

Subsequently, config files were written upon creation, deletion and completion of torrents.

I then did a graceful restart in order to upgrade from from 0.9.29-4 to 5 (seeding 99 torrents at that time). Not a single config file was written out. I lost the progress data at least.

Again no config files written upon a graceful restart updating to 0.9.29-6.

Last edited 22 months ago by jogger (previous) (diff)

comment:9 Changed 22 months ago by zzz

  • Status changed from testing to needs_work

reopening based on comment 8

comment:10 Changed 22 months ago by zzz

  • Milestone changed from 0.9.29 to 0.9.30
  • Status changed from needs_work to testing

Webapp shutdown was broken by Jetty 9 (0.9.29-4) as reported in comment 8. Good catch!

After that was fixed, all config files for running torrents were being written at shutdown, whether changed or not. This was broken in late 2015, possibly as a temporary thing. Fixed to only write the necessary config files, which will speed up the shutdown. Only write when the storage (i.e. the bitfield) changed, or we uploaded something, so the upload byte count changed.

Also, not directly related to OP's issue, since he's on a Mac, but writing the config files is very slow on Raspberry Pi, made a change in DataHelper? to not sync the file system, which speeds it up by about 2x. I did tests on my macbook, and syncs are very fast, shouldn't be an issue.

All in 0.9.29-14 b2f40838a26c79ba642a60232f1505588783cc12

thanks for testing

Note: See TracTickets for help on using tickets.