Opened 2 years ago

Closed 16 months ago

#2453 closed defect (wontfix)

Fix classloader leaks

Reported by: jogger Owned by:
Priority: minor Milestone: n/a
Component: router/general Version: 0.9.38
Keywords: Cc:
Parent Tickets: Sensitive: no


Today I got a spontaneous restart together with the following misleading log:

2019/03/06 13:37:25 | CRIT [P reader 2/2] net.i2p.router.Router : Thread ran out of memory, shutting down I2P
2019/03/06 13:37:25 | java.lang.OutOfMemoryError: Metaspace
2019/03/06 13:37:54 | CRIT [P reader 2/2] net.i2p.router.Router : free mem: 138986168 total mem: 268435456
2019/03/06 13:37:54 | CRIT [P reader 2/2] net.i2p.router.Router : To prevent future shutdowns, increase in /home/e/i2p/wrapper.config

The metaspace OOM results from classloader leaks that I am unable to debug. I noticed at least two sources of leaks:

  • a sudden hike by 500 classes loaded during unattended operation
  • some classes added by repeated starting and stopping of webapps

I had seen those restarts often before, but this time I was monitoring with jconsole the whole time. The workaround is to set MaxMetaSpace to some insanely high value.
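For reference, that workaround can be expressed as a wrapper.config entry along these lines; the property index and the 512m value are illustrative only, pick an index not already used in your file and a cap comfortably above observed usage:

```
# Hypothetical wrapper.config entry: raise the metaspace cap.
# The index (5) must not collide with existing wrapper.java.additional.N lines.
wrapper.java.additional.5=-XX:MaxMetaspaceSize=512m
```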


Change History (13)

comment:1 Changed 2 years ago by Zlatin Balevsky

The sudden hike you are mentioning sounds very suspicious and I would like to investigate further. If you can, please run with -verbose:class, and I will too.
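If the router is started through the service wrapper, the flag can be passed as an additional JVM property, for example (the index here is illustrative, use any unused one):

```
# Hypothetical wrapper.config entry: log every class load/unload to stdout.
wrapper.java.additional.6=-verbose:class
```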

Regarding MetaSpace, IIRC that was introduced in Java 8 and we're still targeting Java 7…

comment:2 Changed 2 years ago by zzz

what's "misleading" about the log?

comment:3 Changed 2 years ago by jogger

The log is misleading because it points to maxmemory as a possible solution. In this case the JVM clearly reports a metaspace OOM. I suspect the above hint would even be printed if one got an OOM because -Xss was set too low.

Just read comment 1; I will try that. It will need some days to reappear. The hike is even more significant: today the short-term jump was > 700 classes. Visibility for me depends on the graph scaling in jconsole.

Last edited 2 years ago by jogger (previous) (diff)

comment:4 Changed 2 years ago by zzz

Do you have any suggestions on how to improve the log message, or an alternative message if "metaspace" is the problem?

something like this?

if (oom.getMessage().contains("Metaspace"))
    log("Fix your Metaspace!!!");

comment:5 Changed 2 years ago by zzz

btw, case 2 in OP (start/stop of webapps) may be related to the way jetty webapp classloading works, or may be a byproduct of what we do for plugin webapps, where we try to make sure to get the new classes after a plugin is updated. Not sure if the webapp logic forces that for non-plugin webapps or not.

whatever the case, I wouldn't worry about it, as restarting webapps is very rare in normal operation.

comment:6 Changed 2 years ago by jogger

About webapps I agree; fixing this only matters if an uptime of more than a year is desired.

Other apps die silently on an OOM and leave it entirely up to the user to figure out what went wrong. We should acknowledge that most users now run Java 11 as a standard part of their OS and report errors accordingly. Java now manages many distinct memory areas, so error messages should reflect that. In my experience, heap OOMs have only come from my own coding errors, while metaspace and stack-overflow OOMs were more common and should be reported as such. For direct-memory shortage I have only seen performance issues so far. The code cache could also be considered.

We should not log "Fix your Metaspace!!!", but use neutral language there. After all, people with memory restrictions will set memory parameters to allow some headroom above multi-day maximums; if we bust those, it may be our fault.
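A neutral, area-specific classification along those lines could be sketched as follows; the class name, method, and message strings are illustrative only, not actual router code:

```java
// Hypothetical sketch: map an OutOfMemoryError's message to a neutral,
// area-specific hint instead of always pointing at wrapper.java.maxmemory.
public class OomHint {

    /** Returns a neutral hint naming the memory area that was exhausted. */
    public static String hintFor(OutOfMemoryError oom) {
        String msg = oom.getMessage();
        if (msg == null)
            return "Out of memory; consider raising wrapper.java.maxmemory";
        if (msg.contains("Metaspace"))
            return "Metaspace exhausted; consider raising -XX:MaxMetaspaceSize";
        if (msg.contains("Direct buffer memory"))
            return "Direct buffer memory exhausted; consider raising -XX:MaxDirectMemorySize";
        if (msg.contains("unable to create") && msg.contains("thread"))
            return "Could not create a native thread; check OS thread limits and -Xss";
        // "Java heap space", "GC overhead limit exceeded", etc.
        return "Java heap exhausted; consider raising wrapper.java.maxmemory";
    }
}
```

A simple string match like this answers zzz's question in comment 7: contains("Metaspace") is enough to single out the metaspace case, with everything unrecognized falling through to the existing heap advice.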

re comment 1: Classloader peaks are caused by hundreds of java.lang.invoke.LambdaForm$MH/0x….. classes, with the number of loaded classes never returning to previous levels afterwards.

comment:7 Changed 2 years ago by zzz

I don't have any data on Java versions in our userbase (do you?), but I suspect that most of our users are on windows, and most of them are on Java 8.

"Fix your metaspace" was a placeholder, what's your suggestion for a real error message, and is contains("Metaspace") sufficient to detect when to display it?

comment:8 Changed 2 years ago by Zlatin Balevsky

Class loader peaks are caused by hundreds of java.lang.invoke.LambdaForm$MH/0x….. with number of loaded classes never returning to previous levels afterwards.

Very strange. I get a few of those in my log as well, but not hundreds, and they get unloaded at a later time. I wonder if it could be an artifact of having jconsole attached. Can you correlate the timing of their loading with any other activity on the router? Actually, what is the router doing: just routing traffic, seeding torrents, or something else?

comment:9 Changed 2 years ago by jogger

There is no correlation to other activity. It occurs during unattended operation and is also not related to any IP change.

comment:10 Changed 2 years ago by zzz

possibly related: #2471

comment:11 Changed 16 months ago by zzz

Sensitive: unset
Status: new → infoneeded_new

This whole ticket sounds very JVM-specific. Is there anything to be done here or can we close the ticket?

comment:12 Changed 16 months ago by jogger

Status: infoneeded_new → new

I agree. It is very JVM-specific; the metaspace allocated differs between JVM versions. The error only occurs when a limit on metaspace is set, and will not be noticed by most users since the leak stays in the 10-20 MB range over the long term. So probably not worth putting more effort in.

comment:13 Changed 16 months ago by zzz

Milestone: undecided → n/a
Resolution: wontfix
Status: new → closed
Note: See TracTickets for help on using tickets.