Opened 6 months ago

Last modified 5 months ago

#2453 new defect

Fix classloader leaks

Reported by: jogger Owned by:
Priority: minor Milestone: undecided
Component: router/general Version: 0.9.38
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

Today I got a spontaneous restart, together with the following misleading log:

2019/03/06 13:37:25 | CRIT [P reader 2/2] net.i2p.router.Router : Thread ran out of memory, shutting down I2P
2019/03/06 13:37:25 | java.lang.OutOfMemoryError: Metaspace
2019/03/06 13:37:54 | CRIT [P reader 2/2] net.i2p.router.Router : free mem: 138986168 total mem: 268435456
2019/03/06 13:37:54 | CRIT [P reader 2/2] net.i2p.router.Router : To prevent future shutdowns, increase wrapper.java.maxmemory in /home/e/i2p/wrapper.config

The Metaspace OOM results from classloader leaks that I am unable to debug. I noticed at least two sources of leaks:

  • a sudden hike by 500 classes loaded during unattended operation
  • some classes added by repeated starting and stopping of webapps

I had seen those restarts often before, but this time I was monitoring with jconsole the whole time. The workaround is to set -XX:MaxMetaspaceSize to some insane value.
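For anyone who wants to track the same numbers without keeping jconsole attached, here is a minimal sketch that reads the class-loading counters and Metaspace usage through the standard java.lang.management MXBeans. The class name MetaspaceSnapshot is made up for illustration, and the pool name "Metaspace" is an assumption that holds on HotSpot with Java 8 or later; other JVMs may name the pool differently or not have it at all.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the I2P code base.
public class MetaspaceSnapshot {

    /** One-line summary of the JVM's class-loading counters. */
    static String classCounts() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return "classes loaded=" + cl.getLoadedClassCount()
                + " totalLoaded=" + cl.getTotalLoadedClassCount()
                + " unloaded=" + cl.getUnloadedClassCount();
    }

    /** Usage of the memory pool named "Metaspace" (assumed HotSpot naming). */
    static String metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage u = pool.getUsage();
                if (u == null) {
                    return "Metaspace pool is not valid";
                }
                // getMax() is -1 when no maximum is defined
                return "Metaspace used=" + u.getUsed()
                        + " committed=" + u.getCommitted()
                        + " max=" + u.getMax();
            }
        }
        return "Metaspace pool not found on this JVM";
    }

    public static void main(String[] args) {
        System.out.println(classCounts());
        System.out.println(metaspaceUsage());
    }
}
```

Logging these two lines periodically should show whether the loaded-class count returns to its previous level after a hike, and how close usage gets to any configured limit.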

Change History (10)

comment:1 Changed 6 months ago by Zlatin Balevsky

The sudden hike you are mentioning sounds very suspicious and I would like to investigate further. If you can, please run with -verbose:class ( https://stackoverflow.com/questions/10230279/java-verbose-class-loading ) and I will too.

Regarding MetaSpace, IIRC that was introduced in Java 8 and we're still targeting Java 7…

comment:2 Changed 5 months ago by zzz

what's "misleading" about the log?

comment:3 Changed 5 months ago by jogger

The log is misleading because it points to maxmemory as a possible solution. In this case the JVM clearly reports a Metaspace OOM. I suspect the above hint would even be printed if one got an OOM from running with -Xss set too low.

Just read comment 1, will try that. It will take some days for the hike to reappear. The hike is even more significant than I reported: today it was briefly > 700 classes. How visible it is for me depends on the graph scaling in jconsole.

Last edited 5 months ago by jogger

comment:4 Changed 5 months ago by zzz

You have any suggestions on how to improve the log message, or an alternative message if "metaspace" is the problem?

something like this?

String msg = oom.getMessage();  // may be null
if (msg != null && msg.contains("Metaspace"))
    log("Fix your Metaspace!!!");

comment:5 Changed 5 months ago by zzz

btw, case 2 in OP (start/stop of webapps) may be related to the way jetty webapp classloading works, or may be a byproduct of what we do for plugin webapps, where we try to make sure to get the new classes after a plugin is updated. Not sure if the webapp logic forces that for non-plugin webapps or not.

whatever the case, I wouldn't worry about it, as restarting webapps is very rare in normal operation.

comment:6 Changed 5 months ago by jogger

About webapps I agree, fixing this is only important if uptime > 1 year is desired.

Other apps die silently on an OOM and leave it entirely up to the user to figure out what went wrong. We should acknowledge that most users now run Java 11 as a standard part of their OS and print errors accordingly. There are many memory areas Java deals with now, so error messages should reflect that. In my experience I had heap OOMs only through my own coding errors, while Metaspace and stack overflow OOMs were more common and should be reported as such. For direct memory shortage I have only seen performance issues so far. Code cache could also be considered.

We should not log "Fix your Metaspace!!!", but use neutral language there. After all, people with memory restrictions will set memory parameters to allow some headroom above multiday maximums. If we bust those it may be our fault.
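As a rough sketch of what neutrally worded, per-area messages could look like: the code below maps the OutOfMemoryError message to a hint. The class name OomHint, the method hintFor and all hint texts are made up for illustration and are not the router's actual code. The matched substrings are the messages HotSpot is known to use; other JVMs may word them differently, so the fallback keeps the existing maxmemory advice in softened form.

```java
// Hypothetical helper, not part of the I2P code base.
public class OomHint {

    private static final String GENERIC =
        "The JVM ran out of memory. If this was a heap shortage, "
        + "increasing wrapper.java.maxmemory may help.";

    /** Picks a neutral hint based on the OOM message (HotSpot wording assumed). */
    static String hintFor(OutOfMemoryError oom) {
        String msg = oom.getMessage();  // null when the error carries no detail
        if (msg == null) {
            return GENERIC;
        }
        if (msg.contains("Metaspace") || msg.contains("Compressed class space")) {
            return "The JVM ran out of Metaspace (class metadata), not heap. "
                 + "If -XX:MaxMetaspaceSize is set, it may be too low.";
        }
        if (msg.contains("Java heap space") || msg.contains("GC overhead limit exceeded")) {
            return "The JVM ran out of heap. "
                 + "Increasing wrapper.java.maxmemory may help.";
        }
        if (msg.contains("native thread")) {
            return "The JVM could not create a native thread. "
                 + "Check OS thread/process limits and the -Xss setting.";
        }
        if (msg.contains("Direct buffer memory")) {
            return "The JVM ran out of direct buffer memory. "
                 + "If -XX:MaxDirectMemorySize is set, it may be too low.";
        }
        return GENERIC;
    }

    public static void main(String[] args) {
        System.out.println(hintFor(new OutOfMemoryError("Metaspace")));
        System.out.println(hintFor(new OutOfMemoryError("Java heap space")));
    }
}
```

Note that a stack overflow caused by a too-small -Xss is normally thrown as StackOverflowError rather than OutOfMemoryError, so it would need its own handling outside this method.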

re comment 1: Class loader peaks are caused by hundreds of java.lang.invoke.LambdaForm$MH/0x….. with number of loaded classes never returning to previous levels afterwards.

comment:7 Changed 5 months ago by zzz

I don't have any data on Java versions in our userbase (do you?), but I suspect that most of our users are on Windows, and most of them are on Java 8.

"Fix your metaspace" was a placeholder, what's your suggestion for a real error message, and is contains("Metaspace") sufficient to detect when to display it?

comment:8 Changed 5 months ago by Zlatin Balevsky

Class loader peaks are caused by hundreds of java.lang.invoke.LambdaForm$MH/0x….. with number of loaded classes never returning to previous levels afterwards.

Very strange. I get a few of those in my log as well, but not hundreds, and they get unloaded at a later time. I wonder if it could be an artifact of having jconsole attached. Can you correlate the timing of their loading with any other activity on the router? Actually, what is the router doing: is it just routing traffic, seeding torrents, or something else?

comment:9 Changed 5 months ago by jogger

There is no correlation to other activity. Occurs during unattended operation, also not related to any IP change.

comment:10 Changed 5 months ago by zzz

possibly related: #2471
