Opened 3 weeks ago

Last modified 7 days ago

#2646 new defect

UDP Transport: Drop peers faster

Reported by: jogger Owned by: zzz
Priority: minor Milestone: undecided
Component: router/transport Version: 0.9.43
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

There are frequent overruns of the 512-message _outboundQueue, with hundreds of drops in the message pool. failed(OutboundMessageState msg, boolean allowPeerFailure) drops a peer only after 50-60 sec, which is way too long and possibly causes > 1000 messages to die.

I suggest using (peer.getLastSendTime() - peer.getLastSendFullyTime() <= 10 * 1000) for the timeout, which is longer than the message timeout. It triggers when no ACK has come in for at least 10 sec, which is longer than an RPi needs to dump its heap onto an SD card.
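To make the suggested check concrete, here is a minimal stand-alone sketch. It is not the router code: the class and method names are hypothetical, and it assumes that both getters return millisecond timestamps and that the peer is kept while the expression above holds and dropped once it no longer does.

```java
// Hypothetical stand-alone sketch of the suggested check; not the actual I2P router code.
final class PeerDropSketch {
    /** Threshold from the suggestion: 10 seconds between the last send and the last fully sent message. */
    static final long MAX_UNACKED_MS = 10 * 1000L;

    /**
     * Mirrors the expression from the ticket.
     *
     * @param lastSendTime      assumed: timestamp (ms) of the most recent send to the peer
     * @param lastSendFullyTime assumed: timestamp (ms) of the most recent fully sent (ACKed) message
     * @return true while the gap is at most 10 seconds
     */
    static boolean withinAckWindow(long lastSendTime, long lastSendFullyTime) {
        return lastSendTime - lastSendFullyTime <= MAX_UNACKED_MS;
    }

    /** Assumed reading of the proposal: drop the peer once the window is exceeded. */
    static boolean shouldDropPeer(long lastSendTime, long lastSendFullyTime) {
        return !withinAckWindow(lastSendTime, lastSendFullyTime);
    }

    public static void main(String[] args) {
        long now = 100_000L;
        // Last fully sent message 5 s before the last send: keep the peer.
        System.out.println(shouldDropPeer(now, now - 5_000L));  // prints "false"
        // Last fully sent message 15 s before the last send: drop the peer.
        System.out.println(shouldDropPeer(now, now - 15_000L)); // prints "true"
    }
}
```

Under this reading, a gap of exactly 10 sec still counts as within the window; only a strictly longer gap drops the peer.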

Change History (1)

comment:1 Changed 7 days ago by zzz

Depends on why the peer stopped responding. If it's gone for good, then the messages are going to die anyway. If it's CPU locked in a heap dump, then maybe it will come back. If the SSU session is somehow hosed and starting a new one will fix it, maybe? But that's a different problem to fix. If we drop the peer, we'll just try to reconnect, which is probably more expensive.

_outboundQueue is CoDel, so we already have AQM on it.

TL;DR we can't prevent messages from "dying". And these probably are somebody else's tunnel messages anyway. They can only go one place, we can't reroute them.
