Opened 4 months ago

Closed 6 weeks ago

#2568 closed defect (fixed)

Massive amount of simultaneously open threads on FreeBSD after some time

Reported by: kotuh
Owned by:
Priority: major
Milestone: 0.9.42
Component: apps/i2ptunnel
Version: 0.9.41
Keywords: bsd
Cc:
Parent Tickets:
Sensitive: no

Description

i2p creates a lot of threads after some time (right now I have THR 7080 after 6 days of uptime, according to the top command) on FreeBSD 12.0 amd64, and it eventually throws this error, which prevents i2p from being used to browse hidden sites:

2019/07/10 17:03:20 | ERROR: Thread could not be started: StreamForwarder 1107613.toI2P
2019/07/10 17:03:20 | Check ulimit -u, /etc/security/limits.conf, or /proc/sys/kernel/threads-max
2019/07/10 17:03:20 | java.lang.OutOfMemoryError: unable to create new native thread
2019/07/10 17:03:20 | at java.lang.Thread.start0(Native Method)
2019/07/10 17:03:20 | at java.lang.Thread.start(Thread.java:717)
2019/07/10 17:03:20 | at net.i2p.util.I2PThread.start(I2PThread.java:86)
2019/07/10 17:03:20 | at net.i2p.i2ptunnel.I2PTunnelRunner.run(I2PTunnelRunner.java:310)
2019/07/10 17:03:20 | at net.i2p.i2ptunnel.I2PTunnelHTTPClient.clientConnectionRun(I2PTunnelHTTPClient.java:1350)
2019/07/10 17:03:20 | at net.i2p.i2ptunnel.I2PTunnelClientBase$BlockingRunner.run(I2PTunnelClientBase.java:830)
2019/07/10 17:03:20 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019/07/10 17:03:20 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019/07/10 17:03:20 | at java.lang.Thread.run(Thread.java:748)

This is not a ulimit problem, because I already have a really high limit (>900000), and probably not threads-max (~8000) either. This is not specific to 0.9.41; I had it before, but sometimes I could leave the router running for 3-4 weeks before I had to restart it.
Unfortunately, I have no idea which part of i2p creates so many threads.
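To catch a slow leak like this before the JVM hits the native thread limit, the thread count can be sampled from inside the JVM. The following is a minimal sketch using the standard `java.lang.management` API; the class name `ThreadCountMonitor` is hypothetical and nothing like it is claimed to exist in I2P. Note that `ThreadMXBean` counts Java threads only, so the number is comparable to, but usually somewhat lower than, the THR column in `top`, which also includes JVM-internal native threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not part of I2P): samples JVM thread counts. */
public class ThreadCountMonitor {

    /** Returns {live, peak} Java thread counts for this JVM. */
    static int[] snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int live = mx.getThreadCount();     // daemon + non-daemon Java threads
        int peak = mx.getPeakThreadCount(); // highest live count since JVM start (or reset)
        return new int[] { live, peak };
    }

    public static void main(String[] args) throws InterruptedException {
        // A real monitor would sample on a schedule for days and warn
        // when the live count keeps climbing instead of leveling off.
        for (int i = 0; i < 3; i++) {
            int[] s = snapshot();
            System.out.printf("live=%d peak=%d%n", s[0], s[1]);
            Thread.sleep(500);
        }
    }
}
```

A steadily rising `live` value over several days, with no matching rise in traffic, is the signature of the leak described here.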

Router info:
I2P version: 0.9.41-0
Java version: Oracle Corporation 1.8.0_212 (OpenJDK Runtime Environment 1.8.0_212-b04)
Wrapper version: 3.5.37
Server version: 9.2.25.v20180606
Servlet version: Jasper JSP 2.3 Engine
JSTL version: standard-taglib 1.2.0
Platform: FreeBSD amd64 12.0-RELEASE-p6
Processor: Haswell Core i3/i5/i7 model 60 (coreihwl)
JBigI status: Locally optimized library libjbigi-freebsd-coreihwl_64.so loaded from file
GMP version: 6.1.2
JBigI version: 4
JCpuId version: 3
Encoding: UTF-8
Charset: UTF-8
Built By: zzz


Change History (21)

comment:1 Changed 4 months ago by kotuh

Keywords: bsd added

comment:2 Changed 4 months ago by zzz

Priority: minor → major

Sometime after the thread leaks have started, but before you're getting errors, please take a thread dump (from /configservice in the console, or i2prouter dump from the command line). It may stretch across more than one wrapper.log.* file. Pull it out of the (multiple) files, gzip it and attach it here or email it to me zzz@…
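The console page and `i2prouter dump` are the supported ways to get the dump. When a dump has thousands of entries, it helps to group threads by name with the numeric IDs collapsed, so the growing pool stands out immediately. A minimal in-process sketch of that grouping follows, using `ThreadMXBean.dumpAllThreads`; the class name is hypothetical, and because it only sees the JVM it runs in, it would have to run inside the router process (the same grouping can be done by hand on a text dump).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of I2P): a thread-name histogram. */
public class ThreadNameHistogram {

    /** Replace space-separated numeric IDs so that, e.g., all
     *  "StreamForwarder <n>.toI2P" threads share one bucket. */
    static String bucket(String threadName) {
        return threadName.replaceAll(" \\d+", " N");
    }

    /** Count live threads per (name bucket, thread state). */
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) continue; // defensive: skip missing entries
            String key = bucket(info.getThreadName()) + " [" + info.getThreadState() + "]";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        histogram().forEach((key, n) -> System.out.printf("%6d  %s%n", n, key));
    }
}
```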

comment:3 in reply to:  2 Changed 4 months ago by kotuh

Replying to zzz:

Sometime after the thread leaks have started, but before you're getting errors, please take a thread dump (from /configservice in the console, or i2prouter dump from the command line). It may stretch across more than one wrapper.log.* file. Pull it out of the (multiple) files, gzip it and attach it here or email it to me zzz@…

Here is a log taken with THR around 7000, before any errors appeared.

comment:4 Changed 4 months ago by zzz

Got it. Deleted attachment for privacy. The threads are in i2ptunnel, forwarding from a socket to the far-end and vice-versa. You apparently have a popular site. The local sockets to the server aren't getting closed. Take a look at your web server software or how you're connecting to the local server. I don't know why things aren't timing out on the socket side. The i2p side won't timeout.

Here's one example in each direction from the thread dump:

2019/07/21 19:56:52 | "I2PTunnel Client Runner 1163" #30502 daemon prio=5 os_prio=15 tid=0x0000000970f0c000 nid=0x18963 in Object.wait() [0x00007fffc1e1d000]
2019/07/21 19:56:52 |    java.lang.Thread.State: WAITING (on object monitor)
2019/07/21 19:56:52 | 	at java.lang.Object.wait(Native Method)
2019/07/21 19:56:52 | 	at java.lang.Object.wait(Object.java:502)
2019/07/21 19:56:52 | 	at net.i2p.client.streaming.impl.MessageInputStream.read(MessageInputStream.java:432)
2019/07/21 19:56:52 | 	- locked <0x0000000814a2c020> (a java.lang.Object)
2019/07/21 19:56:52 | 	at net.i2p.client.streaming.impl.MessageInputStream.read(MessageInputStream.java:391)
2019/07/21 19:56:52 | 	at net.i2p.i2ptunnel.I2PTunnelRunner$StreamForwarder.run(I2PTunnelRunner.java:521)
2019/07/21 19:56:52 | 	at net.i2p.i2ptunnel.I2PTunnelRunner.run(I2PTunnelRunner.java:313)
2019/07/21 19:56:52 | 	at net.i2p.i2ptunnel.I2PTunnelHTTPClient.clientConnectionRun(I2PTunnelHTTPClient.java:1350)
2019/07/21 19:56:52 | 	at net.i2p.i2ptunnel.I2PTunnelClientBase$BlockingRunner.run(I2PTunnelClientBase.java:830)
2019/07/21 19:56:52 | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019/07/21 19:56:52 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019/07/21 19:56:52 | 	at java.lang.Thread.run(Thread.java:748)


2019/07/21 19:56:51 | "StreamForwarder 758021.toI2P" #2263421 daemon prio=5 os_prio=15 tid=0x0000000987472000 nid=0x18fe3 runnable [0x00007ffe5b6cc000]
2019/07/21 19:56:51 |    java.lang.Thread.State: RUNNABLE
2019/07/21 19:56:51 | 	at java.net.SocketInputStream.socketRead0(Native Method)
2019/07/21 19:56:51 | 	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
2019/07/21 19:56:51 | 	at java.net.SocketInputStream.read(SocketInputStream.java:171)
2019/07/21 19:56:51 | 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
2019/07/21 19:56:51 | 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
2019/07/21 19:56:51 | 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
2019/07/21 19:56:51 | 	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
2019/07/21 19:56:51 | 	- locked <0x00000008bd44d0d0> (a java.io.BufferedInputStream)
2019/07/21 19:56:51 | 	at java.io.FilterInputStream.read(FilterInputStream.java:107)
2019/07/21 19:56:51 | 	at net.i2p.i2ptunnel.I2PTunnelRunner$StreamForwarder.run(I2PTunnelRunner.java:521)
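The two traces show one thread per direction, each parked in a blocking `read()`: one on the I2P stream (`MessageInputStream.read`) and one on the local socket (`SocketInputStream.socketRead0`). The sketch below is a simplified model of a single forwarding direction, not I2P's `I2PTunnelRunner` code: piped streams stand in for the socket and the I2P stream, and all names are hypothetical. It shows why such a thread only exits when its source reaches EOF or throws; if the far side never closes, the thread stays alive indefinitely.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

/** Simplified model of one forwarding direction; not I2P's actual code. */
public class ForwarderSketch {

    /** Copy until EOF. read() has no deadline of its own: the thread
     *  stays here until the source signals EOF or throws. */
    static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            out.flush();
            total += n;
        }
        return total;
    }

    static Thread startForwarder(String name, InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            try {
                pump(in, out);
            } catch (IOException e) {
                // a reset or closed stream also ends the thread
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    static final class DemoResult {
        boolean aliveWhileOpen, aliveAfterClose;
        String forwarded;
    }

    static DemoResult runDemo() throws IOException, InterruptedException {
        PipedOutputStream peer = new PipedOutputStream();   // stands in for the far end
        PipedInputStream in = new PipedInputStream(peer);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Thread fwd = startForwarder("demo-forwarder.toI2P", in, sink);

        peer.write("hello".getBytes(StandardCharsets.US_ASCII));
        peer.flush();
        DemoResult r = new DemoResult();
        fwd.join(500);                  // data forwarded, but no EOF yet
        r.aliveWhileOpen = fwd.isAlive();

        peer.close();                   // EOF: read() returns -1
        fwd.join(2000);
        r.aliveAfterClose = fwd.isAlive();
        r.forwarded = sink.toString("US-ASCII");
        return r;
    }

    public static void main(String[] args) throws Exception {
        DemoResult r = runDemo();
        System.out.println("alive while peer open: " + r.aliveWhileOpen);
        System.out.println("alive after peer close: " + r.aliveAfterClose);
        System.out.println("forwarded: " + r.forwarded);
    }
}
```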

comment:5 Changed 4 months ago by kotuh

Yes, I have a hidden site (not that popular, though). BUT:

  1. I just use an ordinary HTTP hidden service with an ordinary web server on localhost (60-second timeout and the same keepalive timeout). Restarting (stop + start) the web server had no effect.
  2. I had this issue before running a hidden service, when it was just an ordinary floodfill router; back then I had to restart it every 20-30 days.

I just remembered that I use a reverse proxy to access the I2P router console; could this be a problem? It's used only by me, and not that often.

comment:6 Changed 4 months ago by zzz

Definitely not the router console. This is about threads used to pass data between the i2p network and your web server. If stopping and restarting the web server isn't killing the threads, then we probably need to look at Java. Since we don't have reports on this from others, I'm wondering if it's a FreeBSD thing. Is there a different or newer Java available for FreeBSD - perhaps Java 11 - that you can try?

comment:7 in reply to:  6 Changed 4 months ago by kotuh

Restarting the web server had no effect. And yes, I can install openjdk11 and openjdk12.
I am sure it's FreeBSD-specific, but I don't know whether it's the fault of the FreeBSD build of OpenJDK or of FreeBSD itself.
I will install openjdk11 and run the router for a while, but in case the new OpenJDK has the same issue, would it be possible to add some workaround (like finding all hung threads and killing them)?

It would be good to hear if anyone else has this issue on FreeBSD.
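On the idea of finding hung threads and killing them: in Java there is no safe way to kill a thread (`Thread.stop()` is deprecated), and a platform thread blocked in a classic `java.net.Socket` read, like the `socketRead0` frames in the dump above, generally does not react to `Thread.interrupt()`. What does unblock it is closing the socket from another thread, which makes the pending `read()` throw a `SocketException`. The loopback sketch below demonstrates that behavior; the class name is hypothetical and the timings are arbitrary, so treat it as an illustration rather than a workaround I2P implements.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: interrupt() vs. close() on a blocked socket read. */
public class InterruptVsClose {

    /** Returns {aliveAfterInterrupt, aliveAfterClose}. */
    static boolean[] run() throws IOException, InterruptedException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, server.getLocalPort());
             Socket accepted = server.accept()) {

            InputStream in = accepted.getInputStream();
            Thread reader = new Thread(() -> {
                try {
                    in.read(); // blocks: the client never sends anything
                } catch (IOException expected) {
                    // thrown once the socket is closed from another thread
                }
            }, "stuck-reader");
            reader.setDaemon(true);
            reader.start();
            Thread.sleep(200); // give the reader time to block

            reader.interrupt();      // does not unblock a java.net socket read
            reader.join(500);
            boolean aliveAfterInterrupt = reader.isAlive();

            accepted.close();        // does: read() throws SocketException
            reader.join(2000);
            boolean aliveAfterClose = reader.isAlive();
            return new boolean[] { aliveAfterInterrupt, aliveAfterClose };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("alive after interrupt: " + r[0]);
        System.out.println("alive after close: " + r[1]);
    }
}
```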

comment:8 Changed 4 months ago by zzz

Sure, workarounds are possible. The reason we don't have a read timeout is that there could be a huge POST in the other direction. So, if somebody is uploading a huge file in a POST and it takes 3 hours, there would be nothing going the other way for those 3 hours, so we couldn't have a 5 minute timeout. If it's a websocket or https connection (CONNECT), we have no idea what to do. We could set a 3 hour timeout, or maybe set it shorter unless it's a POST, or on timeout check for activity in the other direction… we could do it only for FreeBSD, but maybe this is happening on other platforms also, we just don't know it… So I don't know enough yet to know what to do about it.
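One of the options above, "on timeout check for activity the other direction", can be sketched as follows. The socket read timeout is used only as a periodic wake-up (per the `Socket.setSoTimeout` contract, the socket stays valid after a `SocketTimeoutException`), and the connection is closed only when neither direction has moved data for `maxIdleMs`. The two forwarders would share `AtomicLong` last-activity timestamps. All names and the 60-second wake-up cap are hypothetical; this is not the code that was eventually committed, and it only covers the socket side, since the I2P side does not time out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch (not I2P code) of "on timeout, check the other direction". */
public class ActivityAwareReader {

    /** Give up only when BOTH directions have been idle for maxIdleMs. */
    static boolean shouldGiveUp(long nowMs, long lastThisMs, long lastOtherMs, long maxIdleMs) {
        return nowMs - Math.max(lastThisMs, lastOtherMs) >= maxIdleMs;
    }

    /**
     * Forwards socket -> out. The socket timeout is only a wake-up interval:
     * after it fires, the socket is still usable and we decide whether to
     * keep waiting. otherDirection is updated by the opposite forwarder.
     */
    static void forward(Socket socket, OutputStream out, AtomicLong thisDirection,
                        AtomicLong otherDirection, long maxIdleMs) throws IOException {
        if (maxIdleMs <= 0) throw new IllegalArgumentException("maxIdleMs must be > 0");
        socket.setSoTimeout((int) Math.min(maxIdleMs, 60_000L)); // 0 would mean "no timeout"
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[4096];
        while (true) {
            int n;
            try {
                n = in.read(buf);
            } catch (SocketTimeoutException e) {
                if (shouldGiveUp(System.currentTimeMillis(), thisDirection.get(),
                                 otherDirection.get(), maxIdleMs)) {
                    socket.close(); // ends this thread and frees the file descriptor
                    return;
                }
                continue; // e.g. a long POST body is still flowing the other way
            }
            if (n == -1) return; // orderly EOF
            out.write(buf, 0, n);
            out.flush();
            thisDirection.set(System.currentTimeMillis());
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long fiveMin = 5 * 60_000L;
        // Upload still active 10 s ago: keep the quiet direction open.
        System.out.println(shouldGiveUp(now, now - 2 * fiveMin, now - 10_000, fiveMin));
        // Nothing in either direction for 10 minutes: close.
        System.out.println(shouldGiveUp(now, now - 2 * fiveMin, now - 2 * fiveMin, fiveMin));
    }
}
```

The trade-off is that a 3-hour upload never trips the timeout, while a connection that is silent in both directions is reclaimed after `maxIdleMs`.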

comment:9 Changed 4 months ago by kotuh

openjdk 11.0.3 died on the 3rd day with ~7800 threads. I can't even browse eepsites, and the router console shows "Accepting most tunnels".
What should I try next? Maybe I can enable additional debug info or something like that?
Or would it be possible to add an experimental 24h timeout?
By the way, the time until it dies differs from one version to another. I had a 1-month uptime with 0.9.40.

Stopped all eepsites/client tunnels and voila! Less than 200 threads again! I will try to find the culprit.

Last edited 4 months ago by kotuh

comment:10 Changed 4 months ago by zzz

Thanks for trying Java 11, it was worth a try.
You said it was an 'ordinary web server' but you didn't say which? nginx/apache/? If you don't want to say publicly you can email me.
Is it possible to play with the timeout settings on the web server? You say there's a 'keepalive' timeout of 60 seconds, what's that for, and is it required?
Can you try a different web server?
Is it possible you're being DDoSed? Check your router and web server logs… Check your settings for connection limits in the hidden services manager
Do the threads get killed if you stop and restart the hidden service only in the hidden services manager?

comment:11 Changed 4 months ago by kotuh

I think I got it. Maybe; I'm not 100% sure yet. Right now the router has been running for 20 hours with openjdk 8 and 124 threads: eepsite enabled, 2500/4600 active peers, 1200 participating tunnels, ~300KBps.
I used to connect to my i2prouter to browse eepsites AND my own website as well, but right now I have disabled the HTTP and HTTPS proxy and am accessing my eepsite through another router. I also noticed that when I used my own router, accessing my eepsite was like accessing a website on localhost: everything loaded fast and without problems.
This is just a theory, but it fits well with the idea of hung threads that connect somewhere to localhost. I mean that when I load my eepsite, the i2p router connects to the local web server and just doesn't kill the threads for some reason.
I will run my router for a week to make sure this is not a coincidence.

The web server doesn't really matter; I tried Apache, Nginx and Lighttpd, same problem.
Yes, I tried decreasing the timeout (drop a connection if it's active for too long), with no result. The timeout itself works as it should, i.e. if I try to upload a big file, the server just drops the connection, but it doesn't help with the thread problem.
Keepalive is the "idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair." It's not actually required.
No, I am not being DDoSed: the access log contains only legitimate records (real users), server load is minimal, and the connection limit is set to 150 requests per minute.
The threads were killed when I stopped both the hidden service and all client tunnels. Right now I am running the hidden service and all client tunnels except the HTTP(S) proxy.

comment:12 Changed 4 months ago by zzz

OK. I think what you are saying is that you had some sort of proxy configured, but you disabled it, and that fixed it?

I do have some workarounds coded, similar to what I described in comment 8 above. A 5 minute socket read timeout for GET/HEAD and 3 hours for POST/CONNECT. I haven't tested it yet but will soon.
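The failsafe policy described here (a 5 minute socket read timeout for GET/HEAD, 3 hours for POST/CONNECT) amounts to a small lookup keyed on the request method. The sketch below is hypothetical and may differ from the change that was actually checked in. Two choices in it are assumptions rather than part of the comment: methods not listed get the long timeout, and an `upgrade` flag forces the long timeout, because websocket handshakes arrive as GET requests with an `Upgrade` header, so the method alone cannot identify them.

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the failsafe timeout policy; not the committed code. */
public class FailsafeTimeouts {
    static final long SHORT_MS = TimeUnit.MINUTES.toMillis(5); // GET / HEAD
    static final long LONG_MS = TimeUnit.HOURS.toMillis(3);    // POST / CONNECT

    /**
     * @param method  HTTP request method
     * @param upgrade assumption: true if the request carries an Upgrade header
     *                (websockets arrive as GET, so the method alone is not enough)
     */
    static long readTimeoutMs(String method, boolean upgrade) {
        if (upgrade) return LONG_MS;
        switch (method.toUpperCase(Locale.ROOT)) {
            case "GET":
            case "HEAD":
                return SHORT_MS;
            case "POST":
            case "CONNECT":
                return LONG_MS;
            default:
                // Assumption: other methods (PUT, ...) may carry large
                // bodies, so they get the long timeout as well.
                return LONG_MS;
        }
    }

    /** Apply the failsafe to the local socket; 3 h in ms fits in an int. */
    static void apply(Socket socket, String method, boolean upgrade) throws SocketException {
        socket.setSoTimeout((int) readTimeoutMs(method, upgrade));
    }

    public static void main(String[] args) {
        for (String m : new String[] { "GET", "HEAD", "POST", "CONNECT", "PUT" }) {
            System.out.println(m + " -> " + readTimeoutMs(m, false) + " ms");
        }
    }
}
```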

comment:13 Changed 4 months ago by kotuh

That proxy was for the I2P Router Console (an ordinary reverse proxy to open the console in a browser); what I am talking about now is the I2P HTTP Proxy, which is used to visit eepsites.

So the problem appears when I use a router's I2P HTTP Proxy to visit my eepsite that is hosted on the same router. Then, somehow, the number of threads increases drastically over time.
Right now I have the I2P HTTP Proxy disabled and everything works fine: still ~120 threads after 2 days of uptime.

Let's say we have router A on some remote server, which is used as an I2P HTTP Proxy to visit eepsites. Then one day I create an eepsite X on router A and try to visit it.
What happens when I try to access eepsite X? Will it just open it as a localhost website, or will it do some i2p tunnel magic?
When the eepsite is hosted on the same router, it opens much faster than other eepsites, but still not as fast as opening it on the clearnet, and this strange thread issue appears. Any ideas?

comment:14 Changed 4 months ago by zzz

Component: unspecified → apps/i2ptunnel

OK, I get it. The problem occurs when you are accessing your site yourself, through the local proxy, via what we call "loopback" connections. So that raises two possibilities:

  • The problem isn't on the HTTP server side, as I had been assuming, but on the HTTP client (proxy) side
  • The problem is unique to "loopback" connections

Either way, I was probably looking in the wrong place for the problem. I'll keep researching. I can't tell from the logs which side is the issue. I assume you're just using a regular browser, with foxyproxy or something? Any chance that your choice of browser or proxy plugin is causing the issue?

comment:15 Changed 4 months ago by kotuh

Yes, Firefox ESR with its built-in proxy settings, no add-ons. I tried different OSes as the client and got the same result everywhere.
But why is this happening on FreeBSD only? Or maybe the same problem exists on Linux, but Linux copes with it somehow, e.g. because there is no per-process thread limit?

comment:16 Changed 4 months ago by zzz

Can you see what state the sockets are in with 'netstat' or a similar tool? Does the OS still see them as alive?

I don't have any idea what the root cause is or why it is apparently happening only on FreeBSD. Googling hasn't turned up anything relevant.

I am testing the failsafe timeouts mentioned in comment 12, and I will check those in before our next release if we don't come up with anything better.

I haven't tried to reproduce it with "loopback" connections on linux yet, but it's on my todo list.

comment:17 Changed 4 months ago by zzz

On linux, I set up a local server on "loopback", accessed it through the proxy, hit refresh 50 times, then did a thread dump, and I didn't see any threads stuck in socket read.

comment:18 Changed 4 months ago by zzz

Added failsafe timeouts in 3109b17f73b6fbbf5ae5fef4853ae2654eb31f2d 0.9.41-7

comment:19 Changed 3 months ago by zzz

Status: new → infoneeded_new

@OP please report results

comment:20 Changed 3 months ago by kotuh

Status: infoneeded_new → new

Sorry for disappearing.
I will check the sockets with netstat this week (and the failsafe timeouts too). I also got a Linux VPS to check whether it affects Linux too.

comment:21 Changed 6 weeks ago by zzz

Milestone: eventually → 0.9.42
Resolution: fixed
Status: new → closed

No response, presumed fixed
