Opened 9 years ago
Closed 9 years ago
#664 closed defect (fixed)
UDPReceiver.MAX_QUEUE_PERIOD check can fail sporadically
Reported by: | Zlatin Balevsky | Owned by: | zzz |
---|---|---|---|
Priority: | minor | Milestone: | 0.9.2 |
Component: | router/transport | Version: | 0.9 |
Keywords: | SSU | Cc: | |
Parent Tickets: | Sensitive: | no |
Description
If the fix to ticket #663 is applied, the check in UDPReceiver.run() that makes sure that the oldest packet in the receive queue is not older than MAX_QUEUE_PERIOD can fail. This is the execution sequence in question:
- UDPReceiver thread : UDPPacket head = _inboundQueue.peek() ; packet is still in queue!
- UDPReceiver thread : context-switched, loses cpu for some reason
- PacketHandler? thread : receiveNext(); now the same packet is removed from queue
- PacketHandler? thread : process packet, then release()
- UDPReceiver thread : gets cpu back, proceeds to call head.getLifetime() → that calls verifyNotReleased() but packet is released!
Subtickets
Change History (5)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
Milestone: | → 0.9.2 |
---|
Seems like we could just remove verifyNotReleased() from UDPPacket.getLifetime() and that's the end of it?
comment:3 Changed 9 years ago by
Removing the check will work most of the time but still suffers from the problem where a stack-local reference to a packet that has already been processed and released is used to make a decision whether the router is overloaded or not, which in turn affects the decision whether to drop an incoming packet. My personal preference is to fix the underlying problem.
comment:4 Changed 9 years ago by
Disagree. We look to see the current queue delay, then use that to make a decision whether to drop a packet. We have to do one then the other. It doesn't have to be perfect. Sure, that could have been the only packet in the queue and there's nothing behind it, but more likely there are others with similar delay. But even that doesn't matter. Delay estimates / drop strategies are by design simple, fast, and a little sloppy.
There may or may not be an "underlying problem" with locking and this whole verifyNotReleased() ugliness. In fact I'm almost sure there is. But I don't think the discard code cares, it's not the reason or the place to fix it.
comment:5 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
verifyNotReleased() removed from getLifetime() in 0.9.1-2. Does not include any of the other concurrency suggestions above. Closing this ticket as some of the same concurrency issues are discussed in #660.
KillYourTV managed to reproduce this scenario after 28 minutes of uptime. His report is here http://pastethis.i2p/show/1467/
Unfortunately there is no easy way to solve this problem without losing some concurrency and moving away from BlockingQueues? to plain old LinkedLists? (and if the project decides to move to java 1.6, the much more efficient java.util.ArrayDeque? collection)
The suggestion below conflicts with ticket #660 so I'm only offering it as a an example. I'll post a separate suggestion on that ticket that is compatible with the other changes.
Also available with syntax highlighting at http://pastethis.i2p/show/1470/