Opened 8 weeks ago

Last modified 7 weeks ago

#2707 new enhancement

Make TCP_KAPPA configurable

Reported by: Zlatin Balevsky
Owned by:
Priority: minor
Milestone: undecided
Component: streaming
Version: 0.9.45
Keywords: testnet
Cc:
Parent Tickets:
Sensitive: no

Description

I'm seeing some amazing results tweaking TCP_KAPPA on the testnet - setting the value to 7 about triples throughput.

I only have a partial theory as to why this is happening, and it could well be specific to the testnet. But I would like to enable client apps and users to experiment with that value in the live net.
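
A rough sketch of what the client-side knob could look like, assuming a Properties-based options map like the one the streaming library already parses; the property name "i2p.streaming.rttKappa" and the 1..16 clamp are made up for illustration, not a proposed final API:

import java.util.Properties;

// Hypothetical sketch only: reading a configurable kappa from stream options.
// The property name "i2p.streaming.rttKappa" and the clamp range are illustrative.
public class KappaOption {

    // RFC 6298 specifies K == 4; keep that as the default.
    static final int DEFAULT_KAPPA = 4;

    static int getKappa(Properties opts) {
        String v = opts.getProperty("i2p.streaming.rttKappa");
        if (v == null)
            return DEFAULT_KAPPA;
        try {
            int k = Integer.parseInt(v.trim());
            // Clamp to a sane range so an experiment can't effectively disable the RTO.
            return Math.max(1, Math.min(k, 16));
        } catch (NumberFormatException nfe) {
            return DEFAULT_KAPPA;
        }
    }

    public static void main(String[] args) {
        Properties opts = new Properties();
        opts.setProperty("i2p.streaming.rttKappa", "7");
        System.out.println("kappa = " + getKappa(opts)); // prints: kappa = 7
    }
}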

Subtickets

Attachments (5)

window-vanilla.csv (21.6 KB) - added by Zlatin Balevsky 8 weeks ago.
window metrics for vanilla (kappa == 4)
window-kappa7.csv (11.2 KB) - added by Zlatin Balevsky 8 weeks ago.
window metrics for kappa == 7
window-charts-k4-k7.ods (296.7 KB) - added by Zlatin Balevsky 8 weeks ago.
charts based on the CSV files
window-kappa8.csv (5.9 KB) - added by Zlatin Balevsky 8 weeks ago.
window metrics for kappa == 8
window-kappa9.csv (6.9 KB) - added by Zlatin Balevsky 8 weeks ago.
window metrics for kappa == 9


Change History (14)

Changed 8 weeks ago by Zlatin Balevsky

Attachment: window-vanilla.csv added

window metrics for vanilla (kappa == 4)

Changed 8 weeks ago by Zlatin Balevsky

Attachment: window-kappa7.csv added

window metrics for kappa == 7

Changed 8 weeks ago by Zlatin Balevsky

Attachment: window-charts-k4-k7.ods added

charts based on the CSV files

comment:1 Changed 8 weeks ago by Zlatin Balevsky

The attached files contain log extracts from two downloads of an 8MB file between two hosts: one running vanilla (kappa == 4) and another where the only modification is kappa == 7.

The vanilla download took 6 minutes. The kappa 7 download took 141 seconds.

The charts graph window size, RTT, RTO, and un-acked packets.

Changed 8 weeks ago by Zlatin Balevsky

Attachment: window-kappa8.csv added

window metrics for kappa == 8

Changed 8 weeks ago by Zlatin Balevsky

Attachment: window-kappa9.csv added

window metrics for kappa == 9

comment:2 Changed 8 weeks ago by Zlatin Balevsky

Two more experiments, with kappa == 8 and 9.

The same 8MB file transferred in 84 seconds with kappa == 8, and in 86 seconds with kappa == 9. So far, 8 looks like the sweet spot.

comment:3 Changed 8 weeks ago by zzz

Background: Kappa is "K" in RFC 6298 where it's specified as 4.
#2445 is another ticket of yours where we were trying to ensure compliance with that RFC.
A/B/K used to be configurable; that was removed by you in July 2013, possibly at my suggestion.
ConnectionOptions.java
This change will have the effect of increasing RTO and decreasing retransmissions, thus keeping the window size higher. This is pretty much my working theory of why streaming is slow: retransmissions cut the window size in half, and at our level of retransmissions (5% or more), the window never has a chance. The Westwood paper lays it out.
Would you please describe the testnet parameters - what artificial delay/jitter/drops if any are added?
I would like to work on streaming as well, after ratchet wraps up. In particular I think some of the ideas from jogger's SSU patch (TCP Westwood) may apply well to streaming.
I'd like to research why K is 4 in 6298 and if there are any papers out there on other values.
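
For reference, a minimal sketch of the RFC 6298 estimator with K pulled out as a parameter (ALPHA/BETA are the RFC's 1/8 and 1/4; this is illustrative only, not the streaming library's actual code path, which lives around ConnectionOptions.java as noted above):

// Minimal RFC 6298 RTO estimator with K (kappa) as a parameter.
// Illustrative sketch; clock granularity G and min/max RTO bounds are omitted.
public class RfcRtoEstimator {
    private static final double ALPHA = 0.125; // 1/8 per RFC 6298
    private static final double BETA = 0.25;   // 1/4 per RFC 6298

    private final int kappa;
    private double srtt = -1;   // smoothed RTT, ms
    private double rttvar;      // RTT variance estimate, ms

    RfcRtoEstimator(int kappa) { this.kappa = kappa; }

    /** Feed one RTT sample (ms) and return the new RTO (ms). */
    double onRttSample(double rtt) {
        if (srtt < 0) {
            // First measurement: SRTT = R, RTTVAR = R/2
            srtt = rtt;
            rttvar = rtt / 2;
        } else {
            rttvar = (1 - BETA) * rttvar + BETA * Math.abs(srtt - rtt);
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt;
        }
        // RFC 6298: RTO = SRTT + max(G, K * RTTVAR); G ignored here.
        return srtt + kappa * rttvar;
    }

    public static void main(String[] args) {
        RfcRtoEstimator k4 = new RfcRtoEstimator(4);
        RfcRtoEstimator k7 = new RfcRtoEstimator(7);
        for (double rtt : new double[] { 70, 75, 72, 90, 71 }) {
            System.out.printf("rtt=%4.0f  rto(k=4)=%6.1f  rto(k=7)=%6.1f%n",
                    rtt, k4.onRttSample(rtt), k7.onRttSample(rtt));
        }
    }
}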

comment:4 Changed 8 weeks ago by Zlatin Balevsky

I didn't see any increase in throughput in the live net, which leads me to think this is an artifact of the testnet configuration.

The testnet is configured with 70ms of added latency between each pair of nodes. Jitter and distribution are whatever the defaults are; the delay is set with the following command:

/sbin/tc qdisc add dev eth0 root netem delay 70ms

My theory as to why a higher kappa affects the testnet so much is that the main reason packets get retransmitted there is late ACKs. A higher K means a bigger RTO relative to the deviation, and that RTO gives all the late ACKs a chance to arrive.

That is clearly not the case in the live net.

comment:5 Changed 8 weeks ago by zzz

'man tc-netem' is the guide here. It looks like the default is zero jitter and zero loss. A realistic testnet might add a little jitter and loss, and what the values should be might be worth researching (like we did to come up with the 70 ms delay), but I doubt it will change the results much. The end-to-end behavior is dominated by what the routers do, not the point-to-point network conditions.

To know how much the testnet is like the live net w.r.t. this experiment, we would compare the RTTDEV values. That's really the info we need. In snark debug mode it shows RTO and RTT; since RTO = RTT + K*RTTDEV, we get RTTDEV = (RTO - RTT) / K.

A typical 'fast' connection I see in snark is about 1500 RTT and 2500 RTO. So RTTDEV = 250.
Another example, 2500 RTT, 4000 RTO, RTTDEV = 375.

250-375 is probably a reasonable range for a bidirectional connection. It might be good to get a typical range for HTTP GET as well (on the server side; the client side is a don't-care).
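
The back-calculation for those two examples, as a quick sketch (numbers taken straight from this comment, K assumed to be the default 4, values in ms):

// Back out RTTDEV from snark's displayed RTT and RTO, assuming RTO = RTT + K*RTTDEV with K = 4.
public class RttDevFromSnark {
    public static void main(String[] args) {
        int k = 4;
        int[][] samples = { { 1500, 2500 }, { 2500, 4000 } }; // {rtt, rto} pairs from snark debug mode
        for (int[] s : samples) {
            int rtt = s[0], rto = s[1];
            System.out.println("rtt=" + rtt + " rto=" + rto +
                    " -> rttdev=" + (rto - rtt) / k); // prints 250 and 375
        }
    }
}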

comment:6 Changed 8 weeks ago by zzz

From the above, let's assume RTTDEV = RTT/6 for "fast" connections.

Then with K=4, RTO = 1.67 * RTT.
With K=8, RTO = 2.33 * RTT.

So increasing K from 4 to 8 would increase RTO by a factor of 2.33/1.67, i.e. about 40%.
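
The same arithmetic for the other K values tried above, under the same RTTDEV = RTT/6 assumption (a sanity check, not measured data):

// RTO as a multiple of RTT when RTTDEV = RTT/6, i.e. RTO/RTT = 1 + K/6.
public class RtoMultiplier {
    public static void main(String[] args) {
        for (int k : new int[] { 4, 7, 8, 9 }) {
            double multiplier = 1.0 + k / 6.0;
            System.out.printf("K=%d -> RTO = %.2f * RTT%n", k, multiplier);
        }
        // K=4 -> 1.67, K=7 -> 2.17, K=8 -> 2.33, K=9 -> 2.50
    }
}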

comment:7 Changed 8 weeks ago by zzz

Re comment 5: after 10 minutes of research, given our 70 ms injected delay, maybe 5 ms of jitter and a 0.1% loss probability would be a good place to start. Maybe 0.25% loss if we assume some of the routers are on wifi or cellular.
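
For example, extending the netem line from comment 4 with those values might look something like this (an untested sketch; exact option syntax per 'man tc-netem'):

/sbin/tc qdisc change dev eth0 root netem delay 70ms 5ms loss 0.1%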

comment:8 Changed 7 weeks ago by zzz

Version: 0.9.46 → 0.9.45

OP told me the other day that this ticket was now not a priority, superseded by subsequent tickets addressing the root causes?

comment:9 Changed 7 weeks ago by Zlatin Balevsky

OP told me the other day that this ticket was now not a priority, superseded by subsequent tickets addressing the root causes?

Yes, definitely lower priority than the other streaming tickets, more specifically #2715. At the moment my thinking is that the value of K is only relevant in situations with very low RTTDEV, such as a synthetic lossless testnet.
