Opened 5 years ago

Closed 4 months ago

#1506 closed defect (fixed)

Website takedown possible due to client tunnel waiting for incomplete upload completition

Reported by: DjJeshk Owned by:
Priority: major Milestone: 0.9.33
Component: apps/i2ptunnel Version: 0.9.18
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

Steps to reproduce this:

  1. Install new apache server and configure it to listen on some localhost port.
  1. Create new I2P Hidden Service that points to apache port
  1. Place one small file and one gigantic file
  1. Access apache server through i2p router and download small file.
  1. Find apache access logs and find small request to verify access log is working.
  1. Start downloading gigantic file and terminate download after you start receiving it. Repeat this some times.
  1. Try to find in apache access logs gigantic file requests. If you cant find these requests, then i2p router is still keeping socket open on apache server.
  1. Stop newly created I2P Hidden Service.
  1. Look at apache logs again. If you did not saw gigatic file requests at step 7, then you will see them.

Router info:

I2P version: 0.9.18-15-rc
Java version: Oracle Corporation 1.8.0_31 (Java™ SE Runtime Environment 1.8.0_31-b13)
Wrapper version: 3.5.25
Server version: 8.1.16.v20140903
Servlet version: Jasper JSP 2.1 Engine
Platform: Windows XP x86 5.1
Processor: Core 2 (45nm) (core2)
Jbigi: Locally optimized native BigInteger? library loaded from file
Encoding: Cp1257
Charset: windows-1257

You might see something like this in router logs:

15.3.4 13:48:14 ERROR [7.0.0.1:xxxxx] .i2ptunnel.I2PTunnelHTTPServer: Error accepting
     java.net.ConnectException: ServerSocket closed
     at net.i2p.client.streaming.impl.ConnectionHandler.accept(ConnectionHandler.java:138)
     at net.i2p.client.streaming.impl.I2PSocketManagerFull.receiveSocket(I2PSocketManagerFull.java:137)
     at net.i2p.client.streaming.impl.I2PServerSocketFull.accept(I2PServerSocketFull.java:38)
     at net.i2p.i2ptunnel.I2PTunnelServer.run(I2PTunnelServer.java:489)
     at java.lang.Thread.run(Unknown Source)
     at net.i2p.util.I2PThread.run(I2PThread.java:84)

Then you might have to reset router to let things work again.

Subtickets

Attachments (1)

LogfileFileThroughRouter.XML (963.6 KB) - added by DjJeshk 4 years ago.
Download started at 13:08:16,8562979, stopped at 13:10:04,0273461, but java keeps reading

Download all attachments as: .zip

Change History (23)

comment:1 Changed 5 years ago by DjJeshk

I forgot to mention how website takedown can be caused. Create script that downloads one byte from target server, interrupts connection, and repeats this until other status than 200 is received.

comment:2 Changed 5 years ago by DjJeshk

This causes movie streaming over HTTP to be unreliable and dangerous to router, because movie player will first request file without asking range, because it does not know file size. If user after that will request to see something at 20min:00sec, media player will terminate connection and request specific range. Server will respond correctly, but previous request is still waiting to something.
And worst scenario also happens, your router starts to eat all CPU.
Affected threads that wants eat all CPU:
2015/04/03 16:53:38 | "NTCP Pumper" #33667 daemon prio=6 os_prio=0 tid=0x4668d800 nid=0x3f0 waiting for monitor entry [0x45b0f000]
2015/04/03 16:53:38 | java.lang.Thread.State: BLOCKED (on object monitor)
2015/04/03 16:53:38 | at sun.nio.ch.WindowsSelectorImpl?$SubSelector?.processFDSet(Unknown Source)
2015/04/03 16:53:38 | - waiting to lock <0x1b34c200> (a java.lang.Object)
2015/04/03 16:53:38 | at sun.nio.ch.WindowsSelectorImpl?$SubSelector?.processSelectedKeys(Unknown Source)
2015/04/03 16:53:38 | at sun.nio.ch.WindowsSelectorImpl?$SubSelector?.access$2900(Unknown Source)
2015/04/03 16:53:38 | at sun.nio.ch.WindowsSelectorImpl?.updateSelectedKeys(Unknown Source)
2015/04/03 16:53:38 | at sun.nio.ch.WindowsSelectorImpl?.doSelect(Unknown Source)
2015/04/03 16:53:38 | at sun.nio.ch.SelectorImpl?.lockAndDoSelect(Unknown Source)
2015/04/03 16:53:38 | - locked <0x1b2efc88> (a sun.nio.ch.Util$2)
2015/04/03 16:53:38 | - locked <0x1b2efc98> (a java.util.Collections$UnmodifiableSet?)
2015/04/03 16:53:38 | - locked <0x1b2efc10> (a sun.nio.ch.WindowsSelectorImpl?)
2015/04/03 16:53:38 | at sun.nio.ch.SelectorImpl?.select(Unknown Source)
2015/04/03 16:53:38 | at net.i2p.router.transport.ntcp.EventPumper?.run(EventPumper?.java:184)
2015/04/03 16:53:38 | at java.lang.Thread.run(Unknown Source)
2015/04/03 16:53:38 | at net.i2p.util.I2PThread.run(I2PThread.java:84)

2015/04/03 16:53:38 | "I2CP Internal Reader 2467" #33709 daemon prio=5 os_prio=0 tid=0x4639dc00 nid=0xfec runnable [0x4800f000]
2015/04/03 16:53:38 | java.lang.Thread.State: WAITING (parking)
2015/04/03 16:53:38 | at sun.misc.Unsafe.park(Native Method)
2015/04/03 16:53:38 | - parking to wait for <0x1b2e9f30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer?$ConditionObject?)
2015/04/03 16:53:38 | at java.util.concurrent.locks.LockSupport?.park(Unknown Source)
2015/04/03 16:53:38 | at java.util.concurrent.locks.AbstractQueuedSynchronizer?$ConditionObject?.await(Unknown Source)
2015/04/03 16:53:38 | at java.util.concurrent.LinkedBlockingQueue?.take(Unknown Source)
2015/04/03 16:53:38 | at net.i2p.router.client.I2CPMessageQueueImpl.take(I2CPMessageQueueImpl.java:66)
2015/04/03 16:53:38 | at net.i2p.internal.QueuedI2CPMessageReader$QueuedI2CPMessageReaderRunner.run(QueuedI2CPMessageReader.java:51)
2015/04/03 16:53:38 | at java.lang.Thread.run(Unknown Source)
2015/04/03 16:53:38 | at net.i2p.util.I2PThread.run(I2PThread.java:84)

comment:3 Changed 5 years ago by DjJeshk

See full testing info at http://baltijateam.i2p/bugreport/1506/
PS. At image you can thread id's. Convert 'em to hex and search 'em in the thread dump. They will be found at nid=0xhexnumbers
PPS. Why I have lot of Snark related threads? Is large number of torrents is the cause? Two threads for each torrent?

Version 0, edited 5 years ago by DjJeshk (next)

comment:4 Changed 5 years ago by DjJeshk

Priority: majorcritical

comment:5 Changed 5 years ago by zzz

please specify "some times" (from item 6 in OP)

please confirm these issues are not due to a configured apache limit

comment:6 Changed 5 years ago by DjJeshk

Copy of http://baltijateam.i2p/bugreport/1506/apache%20server%20log%20that%20shows%2020%20minute%20lock%20due%20to%20this%20bug.txt

127.0.0.1 - - [03/Apr/2015:16:42:25 +0300] "GET /giganticfilemovie.mp4" 206 59 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:42:38 +0300] "GET /giganticfilemovie.mp4" 206 1 "-" "-"
127.0.0.1 - - [03/Apr/2015:17:04:17 +0300] "GET / HTTP/1.1" 200 933 "-" "MYOB/6.66 (AN/ON)"
127.0.0.1 - - [03/Apr/2015:17:04:18 +0300] "GET /favicon.ico HTTP/1.1" 404 209 "-" "MYOB/6.66 (AN/ON)"
127.0.0.1 - - [03/Apr/2015:16:58:47 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 51791222 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:43:03 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 116439725 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:58:39 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 73458349 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:40:40 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 200 350813661 "-" "MYOB/6.66 (AN/ON)"
127.0.0.1 - - [03/Apr/2015:16:41:16 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 200 350813661 "-" "MYOB/6.66 (AN/ON)"
127.0.0.1 - - [03/Apr/2015:16:56:33 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 217162443 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:42:21 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 350813661 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:42:39 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 348034252 "-" "-"
127.0.0.1 - - [03/Apr/2015:16:42:26 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 349424469 "-" "-"

This log shows timestamps at the moment when requests are made but sequence shows when requests were completed. So requests after

127.0.0.1 - - [03/Apr/2015:17:04:18 +0300] "GET /favicon.ico HTTP/1.1" 404 209 "-" "MYOB/6.66 (AN/ON)"

were only completed after 17:04:18

To reproduce step 6, I opened movie player with streaming support. Multiple requests were made from 16:42:25 to 16:58:47 by seeking movie, then player was closed and at 17:04:17 main page was requested.

Log entry description:

127.0.0.1 - - [03/Apr/2015:16:43:03 +0300] "GET /giganticfilemovie.mp4 HTTP/1.1" 206 116439725 "-" "-"

/giganticfilemovie.mp4 was requested at 16:43:03 as a partial request (206 status code) and 116439725 bytes was served.

I can cofirm these issues are not due to apache limits, because server responded correctly and only ~10 requests were made.

PS. at 16:40:40 and 16:41:16 there were requests made with browser as stated in original step 6. Downloads were started and interrupted after some seconds.

Last edited 5 years ago by DjJeshk (previous) (diff)

comment:7 Changed 5 years ago by DjJeshk

If we are not limiting requests that are made to server and server does not prevent creating thousands of connections at once, whole router crash is possible with all dependent services not just once website.
See issue that can be caused by this: http://trac.i2p2.i2p/ticket/1500 (Cannot start websites because of java connection limit)

comment:8 Changed 5 years ago by DjJeshk

Still an issue in 0.9.18-18-rc. I could reproduce this at your side if you create a test website that contains two small files that would be used to determine testing start and end times and one big file that would be used to generate incomplete downloads.

comment:9 Changed 5 years ago by DjJeshk

Cause are lot of DistributeLocal? Jobs in job queue after client disconnects from website. At one moment there was about 100 waiting jobs like

[waiting 63 ms]: DistributeLocal: Job 57231408: Distribute local message

comment:10 Changed 5 years ago by zzz

The "ServerSocket? closed" error in the OP is caused by you stopping the hidden service (step 8 in the OP). This is normal.

Of course you must restart the hidden service to "get things to work again".

So at least based on the OP report I don't see a bug here.

If you ever get that message without stopping the hidden service manually, then there's a bug. If so, then set logging net.i2p.client.streaming.impl.ConnectionHandler?=WARN and search for "I did it" which will identify how it was closed.

Your assertion in step 7 in the OP that if Apache didn't log anything, the socket is still open… maybe. I don't know when Apache logs and when it doesn't.

Your threads listed in comment 2 are not relevant - the first is a transport thread, the second is "parked", i.e. doing nothing.

I (briefly) tried reproducing by downloading a huge file locally (with a Jetty server) and didn't see any problem so far. Queued DistributeLocal? jobs are not necessarily an indication of a problem but worth taking a look at.

Note to devs: OP has started multiple threads discussing this ticket and related issues on zzz.i2p:
http://zzz.i2p/topics/1865
http://zzz.i2p/topics/1870
http://zzz.i2p/topics/1872

comment:11 Changed 5 years ago by DjJeshk

Priority: criticalmajor

I tested it with jetty and there are no problems. Looks like this only affects websites that are not running on same router java process. It is funny to say, but apache and other not jetty servers are too fast for router. So jetty at the moment is only safe option for hosting websites at I2P.

While I was writting this, there were DistributeLocal? jobs running after download interruption. So jetty is affected, but this does not harm to router like any external servers. Regular user visits might not harm so much as direct visits from website owner.

zzz: try out with external server (apache), jetty does not counts.

This is critical only if you perform direct connections on non jetty servers.

comment:12 Changed 5 years ago by zzz

Interesting. While Jetty is inside the JVM, i2ptunnel doesn't "know" that, it goes out to a socket and back in to get to Jetty. So any difference is Apache vs. Jetty, not external vs. internal.

comment:13 Changed 5 years ago by DjJeshk

While Jetty is on same JVM, another router threads simply blocks Jetty with CPU load. When server is in external process, router load have to block whole system to prevent server responses. So diference appears between external and internal server not because i2ptunnel knows something, but because of CPU load balancing.

Btw., DistributeLocal? jobs should appear when client is inactive?

comment:14 Changed 5 years ago by DjJeshk

Issue can be reproduced in

I2P version: 0.9.19-0
Java version: Oracle Corporation 1.8.0_40 (Java™ SE Runtime Environment 1.8.0_40-b26)
Wrapper version: 3.5.25
Server version: 8.1.16.v20140903
Servlet version: Jasper JSP 2.1 Engine
Platform: Windows 7 x86 6.1
Processor: Core 2 (45nm) (core2)
Jbigi: Locally optimized library jbigi-windows-core2.dll loaded from file
Encoding: Cp1257
Charset: windows-1257

I placed a movie in my server and played and seeked it through router. Apache server is taken down completely if movie is seeked and JVM is using 100% CPU due to DistributeLocal? jobs. Jetty starts to generate big lags if movie is seeked, but server stays alive. After client disconnection JVM is maxed at 100% CPU for some minutes again due to DistributeLocal? jobs.

comment:15 Changed 5 years ago by zzz

Milestone: undecided0.9.20
Status: newtesting

I cannot reproduce OP with Apache on Linux either. I see no hangs, I see no leftover DistributeLocal? jobs on the job queue. I never saw more than 4 DistributeLocal? jobs waiting either. Not that a job waiting for 68 ms (as shown in comment 9 above) is cause for concern. Never saw any 100% CPU load. So there's zero part of what you're reporting that shows up here.

However, it does make sense to keep a large number of these jobs off the job queue. We already had a note on the code to do this, so I implemented a change to run them inline, in b72c2fd26bdb8680d37ac36b6211c748f02dda3f 0.9.19-3.

This may or may not help you, as I don't understand anything about what you're seeing, can't reproduce it, and don't know if your theory about DistributeLocal? jobs causing the problem (and your other theories) hold any water at all.

But running the jobs inline is probably a good thing either way.

comment:16 Changed 4 years ago by zzz

HTTP decompressor fix in 270feedac80bb7c973d3ddb4148d76056e436bc7 to be 0.9.19-15 to close input stream on termination, e.g. control-C in eepget. This gets the close() back to streaming, but still doesn't send a reset to the other end, which is a problem. See I2PSocketFull and ConnectionPacketHandler?.

comment:17 Changed 4 years ago by DjJeshk

This issue now does not directly affect job lags but it does not resets connection as you said. If reset sending will be implemented over I2P, it is not guaranteed that this message will reach destination and connection will be closed. Old I2P versions will not use it at all it this will not be fixed. And in most cases client does not want to interrupt download and tunnel breaks.

You are not seeing this problem because your system is able to process more data than hard disk could provide. Single core CPU may not be so powerful to process everything at same speed as disk provides. Load that file on RAM (at your case it should be larger than 10GB) and test again.

comment:18 Changed 4 years ago by DjJeshk

Did file monitoring with https://technet.microsoft.com/en-us/sysinternals/bb896645 and can confirm that file keeps reading after download is interrupted.

Changed 4 years ago by DjJeshk

Download started at 13:08:16,8562979, stopped at 13:10:04,0273461, but java keeps reading

comment:19 Changed 4 years ago by DjJeshk

When file is downloaded directly from Jetty, connection reset is handled correctly and download speed is about 10 times bigger.

comment:20 Changed 4 years ago by DjJeshk

If you want faster I2P transfers, data needs to be routed in larger chunks as possible and 1KB limit should be removed. This applies to every command that manipulates with data. It will minimize impact of performance issues that are difficult to fix.

Last edited 4 years ago by DjJeshk (previous) (diff)

comment:21 Changed 4 years ago by zzz

Milestone: 0.9.20undecided
Status: testingneeds_work

Comments 17-20 above assert that there is more to do, so marking as needs-work

comment:22 Changed 4 months ago by zzz

Milestone: undecided0.9.33
Resolution: fixed
Sensitive: unset
Status: needs_workclosed

This is a very old ticket. I believe most of these issues were fixed in 0.9.33. See #2071

Note: See TracTickets for help on using tickets.