Opened 5 months ago

Last modified 5 months ago

#2409 new enhancement

Minimize statistics overhead esp. UDP

Reported by: jogger Owned by:
Priority: minor Milestone: undecided
Component: router/general Version: 0.9.38
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

While measuring clock performance for #2384, I noticed that insertion of minimal diagnostic code into now() clearly slowed down UDP packet pusher. It became clear that this is caused by tons of (indirect) now() calls going into those addRates.

I threw out those RateStats? and addRates (except required) everywhere in transport.udp and all the code surrounding it and then passed 2 iterations through dead code analysis to find more calls to now() now useless. Huge editing task and still functions left that intermix diagnostics and operation.

Just demonstrating the big difference for router runtime to issue 100.000.000 calls to now(). See #2384 for details. Before:

2019/01/24 11:20:42 | time 528006 Jump1 517360 More ms 8627
2019/01/24 11:29:17 | time 515431 Jump1 506163 More ms 7590
2019/01/24 11:37:57 | time 520182 Jump1 508658 More ms 8157
2019/01/24 11:46:47 | time 529292 Jump1 516691 More ms 8303
2019/01/24 11:56:04 | time 557233 Jump1 539667 More ms 10635

After:

2019/01/26 13:35:50 | time 708983 Jump1 669352 More ms 18408
2019/01/26 13:47:47 | time 717449 Jump1 680139 More ms 17772
2019/01/26 14:01:01 | time 794167 Jump1 734051 More ms 26030
2019/01/26 14:13:12 | time 731197 Jump1 681337 More ms 22033
2019/01/26 14:25:02 | time 709830 Jump1 670534 More ms 18537

Down from 190 to 140 calls/ms by skipping some stats.

I guess these RateStats? are somewhere accessible from the commandline, but they are totally useless for the average users. Options are clear:

  • omit
  • conditional compile
  • surround by some log check like minimum level info
  • maybe more.

Even most of the required ones are only there for the stats web page and the graphs and rarely used.

Subtickets

Change History (6)

comment:1 Changed 5 months ago by Zlatin Balevsky

Some RateStat objects are used when calculating things like RTT and RTO, so those should not be touched, but you're right, there are many that can be omitted.

Conditional compile is a bit difficult in Java since we do not have pre-processor macros like #ifdef. What can be done with medium-sized effort is to refactor the stats into annotations; this is my preferred solution to this.

comment:2 Changed 5 months ago by zzz

All required ratestats are created with createRequiredRateStat(). Those are the ones needed for everybody. The ones created with createRateStat() are noops unless the user has configured for "full stats" on /configstats.

I do comment out stats every now and then, but it's been a while. SSU does have a lot of stats in there, added by jrandom when he was doing the development.

A possible conditional for the non-required stats could be:

    if (_context.getBooleanProperty("stat.full")) {
        ...
    }

or that boolean could even be stored final, as changing the config requires a restart anyway.

Re: some annotations, we've never added any custom annotations, I'm not opposed to it, just have to learn more about them (they're very confusing to me) and make sure they work on Android with Java 6.

comment:3 Changed 5 months ago by zzz

@OP will you be providing a patch for this or would you like us to do it? Don't want to duplicate effort.

comment:4 Changed 5 months ago by jogger

I would prefer you to do it your way as a lot of code is affected. I was aiming at the clock.now() often appearing twice in function arguments as well as at the surround diagnostic code, often multiple lines long and calling additional functions. Those are definitely not noops.

comment:5 Changed 5 months ago by zzz

OK. In the OP you identified PacketPusher? as the place with the problem. I'll take a look at that one. Are there any other files you saw that are particularly bad? Obviously we're not going to do this everywhere.

comment:6 Changed 5 months ago by jogger

The fastest and easiest way to solve more than 90% of the problem would be to find a way to avoid the clock.now() within getLifeTime(). That´s the exclusive spot where the 50.000+ calls/sec (stats related only) to System.currentTimeMillis() go through. Or implement #2384 (including the super class mod) then this ticket is done as the 50.000 calls would not cause one extra call to System.currentTimeMillis().

Last edited 5 months ago by jogger (previous) (diff)
Note: See TracTickets for help on using tickets.