Opened 6 years ago

Closed 6 years ago

#1183 closed defect (fixed)

Various transport-related OOMs

Reported by: Zlatin Balevsky
Owned by: Zlatin Balevsky
Priority: minor
Milestone: 0.9.11
Component: router/transport
Version: 0.9.10
Keywords: OOM NTCP SSU
Cc: zab@…, zzz@…
Parent Tickets:
Sensitive: no

Description

My router has been dying with OOM frequently. According to the snapshot taken at the time of the OOM, 66 MB of RAM was used by OutNetMessage objects in the "_outbound" PriBlockingQueue inside NTCPConnection.

To avoid this, the capacity of that queue should be bounded. Other changes may also be necessary, as this may turn out to be a symptom of another problem.

Memory snapshot in .hprof format: http://nbafezf573rdcojay23xq67js3hpb6tz2hfh2ohcafbukq6rhvga.b32.i2p/zab.snap.hprof.bz2

Attachments (1)

reduced.png (17.8 KB) - added by dg 6 years ago.
Memory usage decreasing after upgrade to 0.9.10-3

Change History (14)

comment:1 Changed 6 years ago by Zlatin Balevsky

425 NTCPConnection objects, even though the limit is 250
5164 OutNetMessage objects
1902 NTCPConnection$PrepBuffer

I'm thinking I should move the bufferedPrepare logic into the Writer thread, so that we only ever need one PrepBuffer object per Writer thread (== 4), i.e. move the logic of NTCPConnection.bufferedPrepare into NTCPConnection.prepareNextWriteFast.
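
A rough sketch of the idea, for illustration only: each writer owns one scratch buffer and reuses it for every message it prepares, instead of allocating a buffer per queued message. The class name SketchWriter, the buffer size, and the 2-byte length prefix are all assumptions made for this example; this is not the actual NTCPConnection or PrepBuffer code, and the framing is not the NTCP wire format.

```java
// Hypothetical stand-in for a writer thread's state; not the I2P Writer class.
class SketchWriter {
    // One scratch buffer owned by this writer and reused for every message.
    private final byte[] scratch;

    SketchWriter(int bufferSize) {
        if (bufferSize < 2) {
            throw new IllegalArgumentException("buffer too small");
        }
        this.scratch = new byte[bufferSize];
    }

    /**
     * Copies the payload into the shared scratch buffer behind a 2-byte
     * big-endian length prefix (illustrative framing only).
     * Returns the number of valid bytes now in the buffer.
     */
    int prepare(byte[] payload) {
        if (payload.length > 0xFFFF || payload.length + 2 > scratch.length) {
            throw new IllegalArgumentException("payload too large");
        }
        scratch[0] = (byte) (payload.length >>> 8);
        scratch[1] = (byte) payload.length;
        System.arraycopy(payload, 0, scratch, 2, payload.length);
        return payload.length + 2;
    }

    /** Exposes the scratch buffer; contents are valid until the next prepare(). */
    byte[] buffer() {
        return scratch;
    }
}
```

With this shape the number of live buffers is tied to the number of writers rather than to the number of queued messages. The buffer must be fully written out before the same writer calls prepare() again.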

comment:2 Changed 6 years ago by Zlatin Balevsky

Implementation of the above comment: http://pastethis.i2p/show/6541/

comment:3 in reply to:  2 Changed 6 years ago by Zlatin Balevsky

Replying to zab:
Alternative implementation w/o ThreadLocal http://pastethis.i2p/show/6542/

comment:4 Changed 6 years ago by Zlatin Balevsky

The non-ThreadLocal version committed to trunk with revision f335fa4635fb17949746d6d1e27964cf91328c3d

comment:5 Changed 6 years ago by Zlatin Balevsky

Status: new → testing

comment:6 in reply to:  4 Changed 6 years ago by Zlatin Balevsky

comment:7 Changed 6 years ago by Zlatin Balevsky

Status: testing → needs_work

With the PrepBuffer fix in 0.9.10-1 I'm seeing almost 9000 OutNetMessage objects. It will be necessary to limit the capacity of the "_outbound" queues.

comment:8 Changed 6 years ago by Zlatin Balevsky

Keywords: SSU added
Owner: changed from zzz to Zlatin Balevsky
Status: needs_work → accepted
Summary: Outbound message queue should be bounded → Various transport-related OOMs

Further investigation points to the SSU code. There were 7718 OutNetMessage objects retained by 1194 PeerState objects.

The capacity of the PeerState._outboundQueue is currently not bounded.

comment:9 Changed 6 years ago by zzz

Re: OP and comment 8: PriBlockingQueue is indeed bounded, with a max size of 512 and a "backlogged" indication at 256. This is intended as a failsafe maximum. Perhaps this is too large; alternatively, SSU may need to check for a lower threshold.
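
For readers unfamiliar with the pattern, here is a minimal sketch of a priority queue with a hard cap and a separate "backlogged" threshold. It is not the PriBlockingQueue source: the class name BoundedPriorityQueue, its method names, and the choice of ">=" for both comparisons are assumptions made for illustration.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch; I2P's PriBlockingQueue may be implemented differently.
class BoundedPriorityQueue<E extends Comparable<E>> {
    // java.util.concurrent.PriorityBlockingQueue is unbounded by itself,
    // so the cap is enforced in offer() below.
    private final PriorityBlockingQueue<E> queue = new PriorityBlockingQueue<>();
    private final int maxSize;
    private final int backlogThreshold;

    BoundedPriorityQueue(int maxSize, int backlogThreshold) {
        if (maxSize <= 0 || backlogThreshold <= 0 || backlogThreshold > maxSize) {
            throw new IllegalArgumentException("invalid limits");
        }
        this.maxSize = maxSize;
        this.backlogThreshold = backlogThreshold;
    }

    /** Rejects the element (returns false) once the hard cap is reached. */
    synchronized boolean offer(E e) {
        // synchronized so that concurrent producers cannot both pass the
        // size check and push the queue past maxSize.
        if (queue.size() >= maxSize) {
            return false;
        }
        return queue.offer(e);
    }

    /** Removes and returns the smallest element, or null if empty. */
    E poll() {
        return queue.poll();
    }

    /** Early warning for callers, raised well before the hard cap. */
    boolean isBacklogged() {
        return queue.size() >= backlogThreshold;
    }

    int size() {
        return queue.size();
    }
}
```

Constructed as new BoundedPriorityQueue<>(512, 256), this mirrors the limits quoted above. A caller that wants a tighter per-peer limit can check isBacklogged() (or its own lower threshold against size()) before offering, instead of relying on the failsafe cap.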

comment:10 Changed 6 years ago by Zlatin Balevsky

Moving the serialization of I2NPMessages later in the SSU sending process: http://pastethis.i2p/show/6549/
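
To illustrate the general technique (not the linked patch itself): the queue holds a handle that can produce the serialized bytes on demand, and serialization only happens when a message is actually pulled for sending. The class LazySendQueue and its methods are hypothetical names for this sketch, and it assumes the serialized copy is what would otherwise sit in memory while the message waits in the queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

// Hypothetical sketch of delayed serialization; not the SSU code.
// Not thread-safe: a real transport would need synchronization.
class LazySendQueue {
    // Each entry knows how to serialize itself but has not done so yet.
    private final Queue<Supplier<byte[]>> pending = new ArrayDeque<>();

    /** Queues a message without serializing it. */
    void enqueue(Supplier<byte[]> message) {
        pending.add(message);
    }

    /**
     * Serializes and returns the next message, or null if nothing is queued.
     * The byte array only exists from this point on.
     */
    byte[] nextPacket() {
        Supplier<byte[]> message = pending.poll();
        return message == null ? null : message.get();
    }

    int size() {
        return pending.size();
    }
}
```

Messages that are dropped or expire while still queued never pay for serialization at all, and a long queue no longer implies a matching pile of byte arrays.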

comment:11 Changed 6 years ago by Zlatin Balevsky

Status: accepted → testing

Released delayed serialization of SSU messages in revision adc5102c93383e01c74b87f04449dc9c307f6e75

comment:12 in reply to:  6 Changed 6 years ago by dg

Replying to zab:

Make that revision 1974b6a0247c431d14c41f4f1320d4f22f52f6bc

Attaching screenshot on 0.9.10-3 (running with this fix). Upgrading now to -6, not sure if there'll be such a big difference.

Changed 6 years ago by dg

Attachment: reduced.png added

Memory usage decreasing after upgrade to 0.9.10-3

comment:13 Changed 6 years ago by Zlatin Balevsky

Resolution: fixed
Status: testing → closed