Opened 7 years ago

Closed 7 years ago

Last modified 3 years ago

#806 closed defect (not a bug)

B0rked in the tunnel handler: effectively hung the router

Reported by: gonk Owned by:
Priority: critical Milestone: 0.9.4
Component: router/general Version: 0.9.3
Keywords: hung b0rked Cc:
Parent Tickets: Sensitive: no

Description

Repeated exceptions in TunnelGatewayMessage.java cause the router to become unresponsive.

Stacktrace from logs attached.

I got several of these within a few hours. The Local Destinations stars went yellow, and eventually the router wouldn't let me connect to IRC or load eepsites, and all my torrents dropped to 0 KB/s. Reconnecting didn't help; I had to restart the router.

Based on the logs, I got a single one of these yesterday too, but apparently the router recovered by itself.

The router uptime was nearing three days when this happened. Here's the timeline from this morning:

12/6/12 9:56:07 AM CRIT [dHandler 2/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 9:57:38 AM CRIT [dHandler 2/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 10:08:08 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 10:14:41 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 10:22:53 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 10:23:36 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 11:25:56 AM CRIT [dHandler 2/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 11:31:14 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 11:42:58 AM CRIT [dHandler 1/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler
12/6/12 1:27:48 PM CRIT [dHandler 2/2] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler

At this point I could see a few KB of router bandwidth being used but no new I2P connections of my own succeeded.

Attachments (1)

log-router-0.txt (27.9 KB) - added by gonk 7 years ago.
Logs for "B0rked in the tunnel handler" defect

Change History (11)

Changed 7 years ago by gonk

Attachment: log-router-0.txt added

Logs for "B0rked in the tunnel handler" defect

comment:1 Changed 7 years ago by zzz

Component: unspecified → router/general

router version please

comment:2 in reply to:  1 Changed 7 years ago by gonk

Replying to zzz:

router version please

0.9.3-0

comment:3 Changed 7 years ago by zzz

and java version please

comment:4 in reply to:  3 Changed 7 years ago by gonk

Replying to zzz:

and java version please

$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~12.04.1)
OpenJDK Zero VM (build 20.0-b12, mixed mode)
$ uname -srvmp
Linux 3.1.10-l4t.r16.01 #497 SMP PREEMPT Thu Oct 18 17:18:54 IST 2012 armv7l armv7l

This is a fanless ARM system. OpenJDK 1.6 is known to have reached uptimes of 1-2 weeks with I2P this year.

Is there anything else that you need?

comment:5 Changed 7 years ago by zzz

I'm stumped. The error seems "impossible". My guess is this is a trimslice? I occasionally see very strange and unreproducible errors on my trimslice, seen by nobody else. I think the ARM java is buggy on it.

The BuildHandler errors are also not causing the watchdog errors. The handler errors also shouldn't be fatal; the handler is actually catching the errors and moving on.

Notes to self:

  • 8 record TBRM not the VTBRM, which is odd, that would mean some router < 0.7.12 in the tunnel and we have very very few (any?) of those in the network now
  • NPE caused by I2NPMessageImpl.toByteArray() returning null instead of throwing an exception
  • closest I can get to root cause for now - TBRM.writeMessageBody is for some reason writing only 1 record instead of 8, returning 544 instead of 4224.
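
The byte counts in the last note can be checked with a little arithmetic. This is a minimal sketch, assuming a 16-byte standard I2NP header and 528-byte build records; both constants come from the I2NP format as commonly documented, not from this ticket, and the function names are hypothetical.

```python
# Assumed sizes (not stated in this ticket; taken from the I2NP format).
I2NP_HEADER_BYTES = 16  # standard I2NP message header
RECORD_BYTES = 528      # one tunnel build request/reply record


def body_len(records):
    """Length of the message body for the given number of build records."""
    return records * RECORD_BYTES


def message_len(records):
    """Length of the whole message: header plus body."""
    return I2NP_HEADER_BYTES + body_len(records)


if __name__ == "__main__":
    print(body_len(8))     # body of an 8-record TBRM
    print(message_len(1))  # header plus a single record
```

Under those assumptions, 4224 is the body of an 8-record message and 544 is the full size of a message carrying a single record, which fits the "only 1 record instead of 8" reading.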

comment:6 in reply to:  5 Changed 7 years ago by gonk

Replying to zzz:

I'm stumped. The error seems "impossible". My guess is this is a trimslice? I occasionally see very strange and unreproducible errors on my trimslice, seen by nobody else. I think the ARM java is buggy on it.

It's a Trimslice, yes. The Java on trimslice/arm is rumoured to be dubious. Trying openjdk 1.7 earlier this fall was a total disaster, filling my logs with SIGILL deaths; then reverting back to openjdk 1.6 worked. That's why I said I've seen a maximum of 1-2 weeks of uptime under this very same JVM. Either the jvm is working well enough or the previous I2P version that achieved the said uptime simply didn't hit the disastrous codepath.

I'll post more information if similar things happen.

I've run i2p on trimslice for about a year now and it has mostly worked pretty well. It needs occasional restarts every few days but that's probably ok.

For now, I have a cron script that detects if the last N "Router hung" messages in the log have happened within a certain timeframe dT, and if so, it will restart the router. Is there a way to restart the router "gracefully"? I think "i2prouter restart" will execute a hard shutdown.
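
The detection logic described above could be sketched as follows. This is not the actual cron script, which was not posted; the function names, the default values for N and dT, and the exact "Router hung" marker text are assumptions to adjust against your own logs. The timestamp format is taken from the log lines quoted in this ticket.

```python
from datetime import datetime, timedelta

# Matches the timestamps seen in this ticket, e.g. "12/6/12 9:56:07 AM".
TIMESTAMP_FORMAT = "%m/%d/%y %I:%M:%S %p"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    parts = line.split(" ", 3)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(" ".join(parts[:3]), TIMESTAMP_FORMAT)
    except ValueError:
        return None


def should_restart(lines, now, marker="Router hung", n=3,
                   window=timedelta(minutes=30)):
    """True if the last n lines containing marker all fall within window before now.

    marker, n, and window are hypothetical defaults, not values from the ticket.
    """
    hits = []
    for line in lines:
        if marker in line:
            ts = parse_timestamp(line)
            if ts is not None:
                hits.append(ts)
    if len(hits) < n:
        return False
    oldest_of_last_n = hits[-n]
    return now - oldest_of_last_n <= window
```

A cron job would read the router log, call should_restart(lines, datetime.now()), and run the restart command only when it returns True. Comparing against the current time keeps stale entries from an earlier incident from triggering another restart.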

comment:7 Changed 7 years ago by zzz

Resolution: not a bug
Status: new → closed

recent i2prouter scripts support 'i2prouter graceful' but it isn't included in updates so if you don't have it you'll have to reinstall.

I assume the impossible errors we've seen on trimslice are not so much a particular code path as they are a memory coherence bug from SMP. Sometimes, i2p dies right at startup due to some error that appears "impossible" when I track it down. Then I restart and all is good.

I suppose another possibility is hardware problems, i.e. ram corruption.

I'm still on ubuntu 11.04 on my trimslice, perhaps a newer version has a better java, I don't know. Not yet ready to spend a day or two updating it to find out.

Closing this ticket on the assumption it's a Java problem. If you see a new error please open a new ticket.

comment:8 Changed 3 years ago by xana

I got a similar error; echelon told me the problem is already known, but I want to provide the stack trace anyway. Java: openjdk 1.7.0_95, router version: 0.9.24-0.

Stacktrace:

3/18/16 2:47:28 AM CRIT [dHandler 1/4] outer.tunnel.pool.BuildHandler: B0rked in the tunnel handler

java.lang.NullPointerException:
at net.i2p.data.i2np.TunnelGatewayMessage.calculateWrittenLength(TunnelGatewayMessage.java:70)
at net.i2p.data.i2np.I2NPMessageImpl.getMessageSize(I2NPMessageImpl.java:283)
at net.i2p.router.OutNetMessage.<init>(OutNetMessage.java:104)
at net.i2p.router.tunnel.pool.BuildHandler.handleReq(BuildHandler.java:947)
at net.i2p.router.tunnel.pool.BuildHandler.handleRequest(BuildHandler.java:519)
at net.i2p.router.tunnel.pool.BuildHandler.handleInboundRequest(BuildHandler.java:269)
at net.i2p.router.tunnel.pool.BuildHandler.run(BuildHandler.java:220)
at java.lang.Thread.run(Thread.java:745)
at net.i2p.util.I2PThread.run(I2PThread.java:103)

3/18/16 2:47:28 AM CRIT [dHandler 1/4] .i2p.data.i2np.I2NPMessageImpl: Error writing out 4240 (written: 544, msgSize: 4240, writtenLen: 4224) for TunnelBuildReplyMessage

I hope this will help to fix the issue :)

comment:9 Changed 3 years ago by zzz

@xana The stack trace is indeed identical to that reported by the OP.

You didn't say what hardware you are running on. I assume it is ARM. If so, the solution is to switch to Oracle Java 8. OpenJDK on ARM is slow and buggy.

comment:10 Changed 3 years ago by xana

Yes, it is running on an ARM board. I switched to oracle 8 like you said. Thanks for the help.
