Opened 6 years ago

Closed 5 years ago

#1134 closed defect (fixed)

Rate-limit outbound connections per-OBEP

Reported by: zzz Owned by: zzz
Priority: minor Milestone: 0.9.18
Component: router/general Version: 0.9.8.1
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

A badly behaved or malicious client can spray messages across many destinations and force the OBEP to make a large number of connections. This will hit the router's connection limits and effectively DoS its other tunnels. This does actually happen on today's network. I've long suspected that iMule does this (DHT on its UDP tunnels?) but haven't proven it.

Fix this in OutboundMessageDistributor. Check ctx.commSystem.isEstablished() for each message, keep an ObjectCounter or Set per hash, and keep some sort of moving average for new connections. Drop the message if the limit is exceeded. This is all per-OMD, i.e. per-OBEP. The limit must be related to the connection limits, and perhaps the current connection capacity.

There's no good way to do this for inbound conns to an IBGW.
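The per-OBEP limiting described above could be sketched roughly as follows. This is a minimal illustration, not the actual I2P implementation: the class name ConnThrottle, the constants, and the String-keyed peer hashes are all assumptions for readability.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a per-OBEP new-connection limiter.
// One instance would live per OutboundMessageDistributor (i.e. per OBEP).
class ConnThrottle {
    private static final int MAX_NEW_CONNS = 20;  // should be tied to router conn limits
    private static final long WINDOW_MS = 30_000; // measurement window

    private final Set<String> seen = new HashSet<>(); // peer hashes already counted
    private int newConns;
    private long windowStart;

    /**
     * @param isEstablished result of the comm system's isEstablished() check
     * @return true if a message to this not-yet-established peer should be dropped
     */
    synchronized boolean shouldDrop(String peerHash, boolean isEstablished, long now) {
        if (isEstablished || !seen.add(peerHash))
            return false;                  // existing connection, or already counted
        if (now - windowStart > WINDOW_MS) {
            windowStart = now;             // start a fresh window
            newConns = 0;
        }
        return ++newConns > MAX_NEW_CONNS; // over the per-window budget: drop
    }
}
```

Note that the Set grows without bound here; a real version would need pruning, which is exactly the memory concern raised in comment:2 below.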

Subtickets

Attachments (1)

activepeers.png (14.4 KB) - added by zzz 6 years ago.
graph of active peers, note 10-minute bursts


Change History (7)

Changed 6 years ago by zzz

Attachment: activepeers.png added

graph of active peers, note 10-minute bursts

comment:1 Changed 6 years ago by zzz

Probably the right place to do this is OutboundTunnelEndpoint.DefragmentedHandler, since OMD is also used for a zero-hop OBGW+EP.

Perhaps rather than an ObjectCounter we just keep a Set. When a hash is newly added, check isEstablished(); if the peer is not established, increment a counter.

But what to do with the counter? Perhaps some sort of exponential acceleration-limiting like in RouterThrottleImpl. A bucketed counter probably isn't fine-grained enough; we need true rate measurement.
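One way to get "true rate measurement" rather than a bucketed counter is an exponentially weighted moving average over new-connection events. The sketch below is illustrative only; the class name, the 30-second time constant, and the update rule are assumptions, not what RouterThrottleImpl or I2P actually does.

```java
// Illustrative EWMA estimator of the new-connection rate (events/second).
// All names and constants here are assumptions for the sketch.
class ConnRateEstimator {
    private static final double DECAY_MS = 30_000.0; // time constant of the average
    private double rate;      // smoothed new-connection rate, conns/sec
    private long lastEvent;   // timestamp of the previous event, ms

    /** Record one new-connection event; returns the smoothed rate in conns/sec. */
    synchronized double addEvent(long now) {
        if (lastEvent > 0) {
            double dt = Math.max(1, now - lastEvent);          // ms since last event
            double alpha = Math.exp(-dt / DECAY_MS);           // older data decays
            rate = alpha * rate + (1 - alpha) * (1000.0 / dt); // blend in current rate
        } else {
            rate = 0; // first event: no rate estimate yet
        }
        lastEvent = now;
        return rate;
    }
}
```

Unlike fixed buckets, this reacts smoothly to bursts and needs only two fields of state per OBEP, which also speaks to the memory question in comment:2.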

comment:2 Changed 6 years ago by zzz

Can we afford to keep a Set or Map per-OBEP? Could be thousands.

comment:3 Changed 6 years ago by zzz

Owner: set to zzz
Status: new → accepted

comment:4 Changed 6 years ago by zzz

Implemented in 0.9.11-12 (5cf3b75b96e6da67234c4ab1f6f43d0debf62ec4) with a relatively high limit (20 new connections per 30 seconds), which is still getting hit.

It appears this is netdb lookup traffic. I slowed the netdb RefreshRouters job from one lookup every two seconds to one every three seconds; this is what runs just after startup. The other problem is at 60 minutes after startup, when the router used to re-query all the expiring RIs. I think I fixed that a couple of releases ago.

As these queries go over exploratory tunnels, they are overloading lots of slow routers, so we need to fix the source, not just drop messages.

To be researched further.

comment:5 Changed 5 years ago by zzz

Milestone: 0.9.12 → 0.9.18

Closing this ticket, as the OBEP side is fixed, or at least there's code to fix it. In 0.9.13 I increased the limit from 20 to 60 per period in OutboundMessageDistributor, as it seemed the limit was still too low and was causing problems. If the traffic is 'legitimate', whatever that means, then dropping it just causes trouble.

My current theory as to the root cause is the RefreshRouters job, as described in comment 4 above. I'm not done trying to mitigate it. In 0.9.16 I increased the minimum age of routers to be refreshed, to reduce the number of refreshes. For 0.9.18 I've implemented a change that temporarily increases the number of exploratory tunnels in the first half hour of uptime, to spread the load.

After 0.9.18 is out we can look at reducing the limit in OMD again.

comment:6 Changed 5 years ago by zzz

Resolution: fixed
Status: accepted → closed