Opened 7 months ago

Last modified 6 months ago

#2381 accepted enhancement

Make connection limits soft goals

Reported by: jogger Owned by: zzz
Priority: minor Milestone: 0.9.39
Component: router/transport Version: 0.9.37
Keywords: Cc:
Parent Tickets: #2355 Sensitive: no

Description

Walking through the stacks at high load showed that a lot of time is spent in the countActivePeers function. The function loops over all connections, although the exact number of conns is unimportant for router function. Its only purpose is to meet conn limits exactly.

If conn limits were soft goals then countActivePeers could simply return a previously saved value and run the actual loop only every 10s to update.
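The caching approach described above could look roughly like the following sketch. The class name, the supplier callback, and the 10-second interval are illustrative only, not actual I2P code:

```java
import java.util.function.IntSupplier;

/**
 * Sketch of the idea in this ticket: return a cached peer count and
 * re-run the expensive loop over all connections at most once per
 * refresh interval. Names are hypothetical, not the I2P API.
 */
class CachedPeerCounter {
    private final IntSupplier expensiveCount; // e.g. the loop over all conns
    private final long refreshIntervalMs;
    private volatile int cachedCount;
    private volatile long lastRefresh;

    CachedPeerCounter(IntSupplier expensiveCount, long refreshIntervalMs) {
        this.expensiveCount = expensiveCount;
        this.refreshIntervalMs = refreshIntervalMs;
        this.cachedCount = expensiveCount.getAsInt();
        this.lastRefresh = System.currentTimeMillis();
    }

    /** Cheap in the common case; exact only to within refreshIntervalMs. */
    int countActivePeers() {
        long now = System.currentTimeMillis();
        if (now - lastRefresh >= refreshIntervalMs) {
            cachedCount = expensiveCount.getAsInt();
            lastRefresh = now;
        }
        return cachedCount;
    }
}
```

With soft limits, callers on the hot path (bid(), allowConnection()) would read the cached value, trading up to one refresh interval of staleness for O(1) cost.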

Subtickets

Change History (10)

comment:1 Changed 7 months ago by jogger

Component: unspecified → router/general
Type: defect → enhancement

comment:2 Changed 7 months ago by Zlatin Balevsky

Are you still seeing countActivePeers in the stacks even after the clock.now() removal? Can you send us an example?

comment:3 Changed 7 months ago by jogger

After the clock.now() removal, performance was up, but all CPU-related issues remained (at a slightly lower level). So I did 3 weeks of detailed CPU logging and came up with patches that look VERY promising so far, but they need one more week of monitoring before I can report success. Detailed story to follow. You might consider them for 0.9.38 though.

1.) I removed all references to allowConnection(), roughly equivalent to implementing this ticket.
2.) I changed isUnreachable() and wasUnreachable() to call clock.now() only when needed.

Remarks for countActivePeers():

  • The UDP implementation is slightly more efficient.
  • One could drop the check for message counts if the times were initialized to 0 instead of clock.now().
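Item 2 of the patch list above (calling clock.now() only when needed) can be sketched as follows. The class and field names are hypothetical, not I2P's actual code; the point is that a zero-initialized timestamp lets the common case return without touching the clock at all:

```java
import java.util.function.LongSupplier;

/**
 * Sketch of the "lazy clock.now()" change: if a peer was never marked
 * unreachable (timestamp still 0), skip the clock call entirely.
 * Names are illustrative, not I2P's actual isUnreachable() code.
 */
class UnreachableTracker {
    private long unreachableUntil; // 0 = never marked unreachable

    synchronized void markUnreachable(long now, long penaltyMs) {
        unreachableUntil = now + penaltyMs;
    }

    /** Before: clock.now() on every call. After: only when the result can matter. */
    synchronized boolean isUnreachable(LongSupplier clock) {
        if (unreachableUntil == 0)
            return false;              // common case: no clock call needed
        return clock.getAsLong() < unreachableUntil;
    }
}
```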

Trace (the same is seen frequently for the Tunnel GW Pumper as well):

app//net.i2p.router.transport.ntcp.NTCPTransport.countActivePeers(NTCPTransport.java:628)
app//net.i2p.router.transport.ntcp.NTCPTransport.allowConnection(NTCPTransport.java:533)
app//net.i2p.router.transport.ntcp.NTCPTransport.bid(NTCPTransport.java:465)
app//net.i2p.router.transport.TransportManager.getNextBid(TransportManager.java:701)
app//net.i2p.router.transport.GetBidsJob.getBids(GetBidsJob.java:66)
app//net.i2p.router.transport.CommSystemFacadeImpl.processMessage(CommSystemFacadeImpl.java:153)
app//net.i2p.router.OutNetMessagePool.add(OutNetMessagePool.java:49)
app//net.i2p.router.tunnel.OutboundMessageDistributor.distribute(OutboundMessageDistributor.java:123)
app//net.i2p.router.tunnel.OutboundMessageDistributor.distribute(OutboundMessageDistributor.java:71)
app//net.i2p.router.tunnel.OutboundTunnelEndpoint$DefragmentedHandler.receiveComplete(OutboundTunnelEndpoint.java:75)
app//net.i2p.router.tunnel.FragmentHandler.receiveComplete(FragmentHandler.java:488)
app//net.i2p.router.tunnel.FragmentHandler.receiveSubsequentFragment(FragmentHandler.java:443)
app//net.i2p.router.tunnel.FragmentHandler.receiveFragment(FragmentHandler.java:289)
app//net.i2p.router.tunnel.FragmentHandler.receiveTunnelMessage(FragmentHandler.java:149)
app//net.i2p.router.tunnel.OutboundTunnelEndpoint.dispatch(OutboundTunnelEndpoint.java:45)
app//net.i2p.router.tunnel.TunnelDispatcher.dispatch(TunnelDispatcher.java:440)
app//net.i2p.router.InNetMessagePool.doShortCircuitTunnelData(InNetMessagePool.java:317)
app//net.i2p.router.InNetMessagePool.shortCircuitTunnelData(InNetMessagePool.java:302)
app//net.i2p.router.InNetMessagePool.add(InNetMessagePool.java:175)
app//net.i2p.router.transport.TransportManager.messageReceived(TransportManager.java:759)
app//net.i2p.router.transport.TransportImpl.messageReceived(TransportImpl.java:485)
app//net.i2p.router.transport.udp.UDPTransport.messageReceived(UDPTransport.java:1416)
app//net.i2p.router.transport.udp.MessageReceiver.loop(MessageReceiver.java:152)
app//net.i2p.router.transport.udp.MessageReceiver$Runner.run(MessageReceiver.java:74)
java.base@11.0.1/java.lang.Thread.run(Thread.java:834)
app//net.i2p.util.I2PThread.run(I2PThread.java:103)

comment:4 Changed 7 months ago by Zlatin Balevsky

Well, 0.9.38 is due next week, so it's too late for that release, but I'm somewhat inclined to consider a soft goal for NTCPTransport::countActivePeers. It would be updated at least twice a second, though.

Curious to see your patches!

comment:5 Changed 7 months ago by Zlatin Balevsky

Component: router/general → router/transport
Owner: set to zzz

comment:6 Changed 7 months ago by Zlatin Balevsky

Here is a quick-and-dirty of what I have in mind: http://paste.crypthost.i2p/?8dd5d0affeabddb1#XnW2A0kAtvyF35jNMQazmdxazpS/gsnjvj1gJpxAtlA=

@jogger: please see how that affects your CPU usage and throughput.

comment:7 Changed 7 months ago by jogger

@zab: That will be a welcome improvement. Cool: it eliminates most of the clock.now() calls. I can't test soon because I only have bandwidth for one fully loaded router, which I am currently testing with patches that I hope will completely resolve #2355.

comment:8 Changed 7 months ago by zzz

Milestone: undecided → 0.9.39

Haven't looked yet, but I will put it on the list to look at for 0.9.39.

comment:9 Changed 6 months ago by zzz

Parent Tickets: 2355

comment:10 Changed 6 months ago by zzz

Status: new → accepted

Well, for starters, let's make NTCP allowConnection() the same as SSU's. Both the SSU and NTCP max idle timeouts are higher than the threshold for "active". But if we're under conn-limit pressure, the idle timeout will already have slewed far below the 5-minute threshold, so the total peer count == the active peer count.

Seems to me that countActivePeers() should only be for the UI, and that internally we should use just the raw peer number, or else just make countActivePeers() == the raw number. The "active" distinction dates from 15 years ago, when the idle timeouts were fixed at one hour. Under the current conn-limit and slewed-timeout regime, the "active" concept is perhaps pointless.
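Enforcing conn limits against the raw peer count, as suggested above, could be as simple as the following sketch. The class name and methods are hypothetical, not I2P's actual transport API; the point is that limit checks become an O(1) counter read instead of a loop:

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: enforce conn limits against a raw connection counter that is
 * maintained on connect/disconnect, reserving the expensive
 * "active within 5 minutes" loop for the UI only.
 * Names are illustrative, not I2P's actual code.
 */
class ConnLimiter {
    private final int maxConnections;
    private final AtomicInteger rawCount = new AtomicInteger();

    ConnLimiter(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    void connectionOpened() { rawCount.incrementAndGet(); }
    void connectionClosed() { rawCount.decrementAndGet(); }

    /** O(1) limit check; no iteration over all connections. */
    boolean allowConnection() {
        return rawCount.get() < maxConnections;
    }
}
```

Under conn-limit pressure the slewed idle timeout keeps raw and "active" counts equal anyway, so the cheaper counter gives the same admission decisions in exactly the case where the check is hot.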

NTCP change in 08efbe159a9e17571c307f61c350f62bf1a081a0 to be 0.9.38-7

Leaving open for testing, and investigation of other places where countActivePeers() may be called other than the UI.

I don't believe the "soft goals" approach proposed in the OP and implemented in the link in comment 6 will be necessary.

re: comment 3 item 1, we obviously need to keep allowConnection() to enforce conn limits.

re: comment 3 item 2, you entered #2382 for that.

Note: See TracTickets for help on using tickets.