Opened 8 months ago

Last modified 7 months ago

#2384 new enhancement

Make now() self-tuning

Reported by: jogger Owned by:
Priority: minor Milestone: undecided
Component: router/general Version: 0.9.37
Keywords: Cc:
Parent Tickets: Sensitive: no

Description

I am a bit reluctant about this one because it removes the pressure to use now() carefully. But the results are nothing short of impressive: 2 MBps of traffic should be possible on a Raspberry Pi together with my other recent tickets.

The basic idea is that now() is a nearly constant function, so its results are easy to predict. The source explains it better. Sorry for the Knuth-style coding I learned at university decades ago.

I am including the code snippets below in case attachments do not work again.

From Clock.java:
public class Clock implements Timestamper.UpdateListener {

    protected volatile int _iter;
    protected volatile int _frequency;
    protected volatile long _savedTime;

    public Clock(I2PAppContext context) {
        _iter = 0;
        _frequency = 0;
        _savedTime = 0;
    }

    public long now() {
        if (++_iter < _frequency)
            return _savedTime;
        _iter = 0;
        long newTime = _offset + System.currentTimeMillis();
        long delta = newTime - _savedTime;
        _savedTime = newTime;
        if (delta == 0)
            _frequency++;
        else if (delta == 2)
            _frequency--;
        else if (delta != 1)
            _frequency = 0;
        return newTime;
    }

From RouterClock.java:

    @Override
    public long now() {
        if (++_iter < _frequency)
            return _savedTime;
        _iter = 0;
        long systemNow = System.currentTimeMillis();
        // ... existing slew logic (elided in the original snippet) ...
            _lastSlewed = systemNow;
        }
        long newTime = offset + systemNow;
        long delta = newTime - _savedTime;
        _savedTime = newTime;
        if (delta == 0)
            _frequency++;
        else if (delta == 2)
            _frequency--;
        else if (delta != 1)
            _frequency = 0;
        return newTime;
    }

Subtickets

Change History (27)

comment:1 Changed 8 months ago by Zlatin Balevsky

This is rather dangerous. Also, looking only at bulk traffic can make you overlook the fact that we need millisecond precision in SSU when measuring latency; a router with such a mod may push more total traffic, but each individual link may be slower, especially if it's SSU.

Having said that, I don't mind adding a SloppyClock class that can optionally be used depending on the situation.
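
A rough sketch of how such an opt-in class could be packaged, reusing the caching fields and logic from the description above (the class shape here is an assumption for illustration, not code from the ticket):

    public class SloppyClock extends Clock {

        private int _iter;
        private int _frequency;
        private long _savedTime;

        public SloppyClock(I2PAppContext context) {
            super(context);
        }

        @Override
        public long now() {
            // same caching idea as in the description: serve a cached value
            // until the predicted next tick, then re-read the real clock
            if (++_iter < _frequency)
                return _savedTime;
            _iter = 0;
            long newTime = super.now();   // offset-corrected system time
            long delta = newTime - _savedTime;
            _savedTime = newTime;
            if (delta == 0)
                _frequency++;
            else if (delta == 2)
                _frequency--;
            else if (delta != 1)
                _frequency = 0;
            return newTime;
        }
    }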

comment:2 Changed 8 months ago by jogger

I definitely do not buy your argument. This mod has an error of 2 ms max and 0.5 ms typical. And it is not strictly an error, but a relative shift for everything. Anywhere between getting the system time and using it elsewhere, garbage collection can strike: then you are off by many milliseconds (200 by default, 20 for my setup). And it strikes more than once a second. This has not blown up anything in the past.

Thinking of GC: when I2P comes out of GC, we have a clock jump at that moment. It also runs full throttle then to clear out the network buffers, which in my proposal would call for a higher _frequency. Resetting it was a dumb idea, so I made the final checks simpler:

    if (delta == 0)
        _frequency++;
    else if (delta != 1)
        _frequency--;

comment:3 Changed 8 months ago by jogger

Final word: if you do not like it this aggressive, one could cut _frequency in half instead of lowering it by one, as sketched below.
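
A minimal sketch of that less aggressive variant, using the fields from the description (illustrative only, not a tested patch):

    public long now() {
        if (++_iter < _frequency)
            return _savedTime;
        _iter = 0;
        long newTime = _offset + System.currentTimeMillis();
        long delta = newTime - _savedTime;
        _savedTime = newTime;
        if (delta == 0)
            _frequency++;
        else if (delta != 1)
            _frequency /= 2;   // halve instead of decrementing or resetting
        return newTime;
    }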

comment:4 Changed 8 months ago by jogger

OK, let's reset this discussion. Neither of us RTFM'd:

public static long currentTimeMillis()

Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds.

See the description of the class Date for a discussion of slight discrepancies that may arise between "computer time" and coordinated universal time (UTC).

Interesting read:

http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html

While the above mod works perfectly for me, I will do the following:

  • Test the behaviour of currentTimeMillis() on three totally different architectures
  • Based on the outcome, write code to analyse the performance of clock.now() in production
  • Use this to develop a new version based on the idea above
  • Statistically prove that the new algorithm is as good as the current one, or possibly better

Possible approach: wrap currentTimeMillis() with speedup logic that runs at least as smoothly but may be time-shifted by a few ms. The _offset logic would automatically take care of a small time shift, just as it does when the system time is off by a few ms.

comment:5 Changed 8 months ago by Zlatin Balevsky

I'm curious to see how you're going to benchmark how long a call to currentTimeMillis() takes. I suggest you bracket it in System.nanoTime() calls and take at least 20000 samples, discarding the first 10000. That is because the JIT kicks in after 10000 invocations by default.
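
A minimal sketch of such a harness with the suggested sample counts (the class name and output format are made up for illustration; the sink variable only keeps the JIT from eliminating the measured call):

public class CurrentTimeMillisBench {

    public static void main(String[] args) {
        final int WARMUP = 10000;    // discarded so the JIT has compiled the loop
        final int SAMPLES = 10000;   // measured invocations
        long[] ns = new long[SAMPLES];
        long sink = 0;               // prevents dead-code elimination of the call

        for (int i = 0; i < WARMUP + SAMPLES; i++) {
            long t0 = System.nanoTime();
            sink += System.currentTimeMillis();
            long t1 = System.nanoTime();
            if (i >= WARMUP)
                ns[i - WARMUP] = t1 - t0;
        }

        long sum = 0;
        for (long v : ns)
            sum += v;
        System.out.println("avg ns per call: " + (sum / SAMPLES) + " (sink " + sink + ")");
    }
}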

comment:6 Changed 8 months ago by jogger

Hi zab, going in a totally different direction. Test results for 10-second samples using Java 11 are in:

Intel Core Duo, W10 (idle): the clock continuously jumps in 15-16 ms intervals only.
2012 Mac Mini Quad i7 server (low load): 0-1 clock jumps of 2 ms
2012 MacBook Pro Dual i7 (low load): 1-2 clock jumps of 2-3 ms
Linux Odroid HC1 (low load), kernel 4.14, all cores: 0-1 clock jumps of 2 ms
same, single fast core: 1 clock jump of 3-4 ms
same, single slow core: 1 clock jump of 6-7 ms
same, 50% load on all cores: >100 clock jumps of 4-5 ms

So I can base future analysis on the assumption that the clock is jumpy and offers no precision timing. I will port my test code to clock().now() to see real-world behaviour with 3000 tunnels and GC working. This will take some time.

comment:7 Changed 8 months ago by jogger

Keeping you updated. Before analysing, I cleaned up the clock structure. The router clock slew logic runs only 40 times a second; otherwise super.now() should be returned ASAP. By doing so we separate the faster router clock, together with its slew logic, from analysing and improving the system clock in the superclass. By implementing the following, everyone would benefit soon, no matter whether the clock function itself can be improved. Would love to see it in 0.9.39.

    @Override
    public long now() {
        long systemNow = super.now();
        long sinceLastSlewed = systemNow - _lastSlewed;
        if (sinceLastSlewed < MAX_SLEW && sinceLastSlewed >= 0)
            return systemNow; // return ASAP if nothing to do
        if (sinceLastSlewed >= MASSIVE_SHIFT_FORWARD ||
            sinceLastSlewed <= 0 - MASSIVE_SHIFT_BACKWARD) {
            _lastSlewed = systemNow;
            notifyMassive(sinceLastSlewed);
            return systemNow;
        }
        if (sinceLastSlewed >= MAX_SLEW) {
            long desiredOffset = _desiredOffset;
            long offset = _offset;
            long delta = 0;
            if (desiredOffset > offset) {
                delta = 1; // The Math.min is useless, if it is > 1 we need an updated desiredOffset first
            } else if (desiredOffset < offset) {
                delta = -1;
            }
            _offset = offset + delta;
            _lastSlewed = systemNow;
            return delta + systemNow;
        }
        return systemNow; // doing nothing if system clock went backward
        // and we will do nothing for many seconds until clock >= lastSlewed + MAX_SLEW
    }

comment:8 Changed 8 months ago by jogger

While I am still running tests for clock performance, it was immediately apparent that a few simple extra statements within the clock functions have a clear 10% negative impact on router throughput (simply comparing iptraf-ng vs. top -H before and after). The reason is unknown, but this explains why the streamlined mod to the router clock above is such a big success.

So I provide a diff and ask for inclusion in 0.9.39.

a@aHC1a:~$ diff 38t/router/java/src/net/i2p/router/RouterClock.java i2p-0.9.38/router/java/src/net/i2p/router/RouterClock.java
119c119
< 
---
>             
130c130
< 
---
>             
147c147
< 
---
>             
190c190
< 
---
>             
244c244
<      *
---
>      * 
254,260c254,257
<         long adjustedNow = super.now();
<         long sinceLastSlewed = adjustedNow - _lastSlewed;
<         // using adjustedNow means a small change
<         // effectively slewing forward every MAX_SLEW - 1
<         // slewing backward every MAX_SLEW + 1
<         if (sinceLastSlewed < MAX_SLEW && sinceLastSlewed >= 0)
<             return adjustedNow; // return ASAP if nothing to do
---
>         long systemNow = System.currentTimeMillis();
>         // copy the global, so two threads don't both increment or decrement _offset
>         long offset = _offset;
>         long sinceLastSlewed = systemNow - _lastSlewed;
263c260
<             _lastSlewed = adjustedNow;
---
>             _lastSlewed = systemNow;
265,267c262,263
<             return adjustedNow;
<         }
<         if (sinceLastSlewed >= MAX_SLEW) {
---
>         } else if (sinceLastSlewed >= MAX_SLEW) {
>             // copy the global
269,270d264
<             long offset = _offset;
<             long delta = 0;
272c266,268
<                 delta = 1; // The Math.min is useless, if it is > 1 we need an updated desiredOffset first
---
>                 // slew forward
>                 offset += Math.min(10, sinceLastSlewed / MAX_SLEW);
>                 _offset = offset;
274c270,275
<                 delta = -1;
---
>                 // slew backward, but don't let the clock go backward
>                 // this should be the first call since systemNow
>                 // was greater than lastSled + MAX_SLEW, i.e. different
>                 // from the last systemNow, thus we won't let the clock go backward,
>                 // no need to track when we were last called.
>                 _offset = --offset;
276,278c277
<             _offset = offset + delta;
<             _lastSlewed = adjustedNow;
<             return delta + adjustedNow;
---
>             _lastSlewed = systemNow;
280,281c279
<         return adjustedNow; // doing nothing if system clock went backward
<         // and we will do nothing for many seconds until clock >= lastslewed + MAX_SLEW
---
>         return offset + systemNow;

comment:9 Changed 8 months ago by Zlatin Balevsky

I'm still very wary of this change. Can you make public the code you used to benchmark the performance of System.currentTimeMillis in comment 7?

comment:10 Changed 8 months ago by jogger

Hi zab,

here is my code. From the number of loops one can easily calculate the time for one call: 1,100-1,200 ns on my ARM32 depending on the cores used, which is not too bad given that >600 ns is reported for i7 Linux x64. In my previous comment I do not propose to change the clock function itself yet, but as a first step to speed up RouterClock.java, which is overly clumsy. Test the patch above (it should be applied in reverse). I saw a real boost.

public class TimeTest {

    public static void main(String[] args) {
        int jump1 = 0;
        int jumpmore = 0;
        int loops = 0;
        long newiter;
        long iter = System.currentTimeMillis();
        long testend = iter + 10000;
        while (iter < testend) {
            loops++;
            newiter = System.currentTimeMillis();
            if (iter == newiter)
                continue;
            if (++iter == newiter)
                ++jump1;
            else {
                ++jumpmore;
                iter = newiter;
            }
        }
        System.out.println("Loops " + loops + "  Jump1ms " + jump1 + "  More ms  " + jumpmore);
    }
}

comment:11 Changed 8 months ago by Zlatin Balevsky

Ok, I increased the test time to 100s and got the following:

MacBook Pro 2017, OSX, 3.1GHz Intel Core i7:
Loops 3632690841 Jump1ms 99996 More ms 2

Ubuntu Bionic, Intel® Xeon® CPU E3-1240 v6 @ 3.70GHz:
Loops 5103574842 Jump1ms 100000 More ms 0

So this is an argument against using a self-tuning version of the clock, at least on x86_64. I have yet to analyze your patch from comment 9 thoroughly, but if it makes a noticeable difference on your ARM boards, we could possibly use it only on ARM. (We already special-case a lot of things when we detect that we're running on ARM anyway.)
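
For illustration, a minimal platform check of the kind that could gate such a change (this uses only the standard os.arch system property; I2P's own detection code is not shown here):

    // Hypothetical gate: enable the self-tuning clock only on ARM hardware.
    String arch = System.getProperty("os.arch", "").toLowerCase(java.util.Locale.US);
    boolean useSelfTuningClock = arch.startsWith("arm") || arch.startsWith("aarch64");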

comment:12 Changed 8 months ago by Zlatin Balevsky

Also the patch in comment 9 is only against RouterClock.java; I think you meant to patch the superclass as well? The easiest way to create a proper patch is to

  1. check out with git
  2. edit as many files as you like
  3. git diff

comment:13 Changed 8 months ago by Zlatin Balevsky

I take back comment 12, now I understand what you're doing in comment 9 :)

So the reason you're seeing a speedup is that you've reduced the number of branches and put the most common case first. Good work! Now I feel more confident about including the patch from comment 9 in production.

comment:14 Changed 8 months ago by jogger

OK, so your Linux box runs at about 20 ns a call. That's much faster than the >600 ns reported for Linux x86_64 all over the Web??? I am running kernel 4.14 and Java 11; could the difference lie there?

I have completed my tests, running the following test code for now(), with all my pending tickets implemented.

    public long now() {
        long millis = System.currentTimeMillis();
        long savedtime = _savedtime;
        _savedtime = millis;
        if (savedtime == millis) {
            _iter++;
            return _offset + millis;
        }
        if (savedtime + 1 == millis)
            ++_jump1;
        else
            ++_jump2;
        if (++_iter >= 100000000) {
            _iter = 0;
            long jump1 = _jump1;
            _jump1 = 0;
            long jump2 = _jump2;
            _jump2 = 0;
            long starttime = _starttime;
            _starttime = millis;
            System.out.println("time " + (millis - starttime) + "  Jump1 " + jump1 + "  More ms  " + jump2);
        }
        return _offset + millis;
    }

average output with about 2,500 tunnels and 2 MBps traffic, per 100,000,000 invocations:

2019/01/24 11:20:42 | time 528006 Jump1 517360 More ms 8627
2019/01/24 11:29:17 | time 515431 Jump1 506163 More ms 7590
2019/01/24 11:37:57 | time 520182 Jump1 508658 More ms 8157
2019/01/24 11:46:47 | time 529292 Jump1 516691 More ms 8303
2019/01/24 11:56:04 | time 557233 Jump1 539667 More ms 10635

The time is accurate of course; the other figures were hit by race conditions. In theory, the "More ms" field must be multiplied by at least two and added to Jump1 to equal the time. From jconsole I estimated the GC at 600 * 15 ms, which also does not show up.

Summarized results:

  • now() runs nearly 200 times per ms
  • When the clock moves forward, the step is 1 ms in >98% of cases
  • From the frequency of invocations it is most likely that the other clock steps seen are 2 ms (GC steps must be around 15 ms).

Estimated CPU per second: 200 calls/ms * 1,250 ns * 1000 ms = 250 ms/s = 25% of one core, ~3% of total CPU, or near 10% of total router CPU. This is twice as much as total GC, or as much as NTCP Writer.

Given the massive number of invocations, I now suggest a simplified and more conservative tuning function that produces an average negative clock shift of 0.25 ms, which means that hardly any effect will be seen.

If someone declines this mod, she should also ditch the entire Windows platform, where System.currentTimeMillis() is not only extremely jumpy, making functions like the router clock behave differently than on other platforms, but also carries an average negative clock shift of 8 ms that nobody ever complained about.
Before making a special case for ARM, someone with proper equipment should check a number of x86_64 Linux boxes to see what execution time for System.currentTimeMillis() is typical.

    public long now() {
        if (++_iter < _frequency)
            return _savedTime;
        _iter = 0;
        long newTime = _offset + System.currentTimeMillis();
        long delta = newTime - _savedTime;
        _savedTime = newTime;
        if (delta == 0)
            _frequency++;
        else
            _frequency--;
        return newTime;
    }

comment:15 Changed 8 months ago by Zlatin Balevsky

That´s much faster than the >600ns reported for Linux X86_64 all over the Web???

¯\_(ツ)_/¯ But I'm guessing it has to do with the size of the L1 cache; the 600 ns figure might be from a "cold" first invocation of the method. The MacBook and the Xeon are the only physical boxes I have, and there's no point benchmarking on virtual machines; I've asked zzz to try your test on an RPi and on physical Windows.

As for your patches in general, I'm in full support of all of them going into 0.9.39, except for the self-tuning one, which I need to think through really carefully.

comment:16 Changed 8 months ago by jogger

OK, here is the patch for the self-tuning clock.now(). It is half as aggressive as the initial proposal, as it calls System.currentTimeMillis() twice per ms on average. This results in an average negative clock shift of 0.25 ms (to be precise: ¼ of the granularity of System.currentTimeMillis()). Apart from any clock shifts being accounted for in the router clock, this is far below any precision one could see in UDP timing, let alone the 16 ms clock jumps in windoze.

0.9.38 Clock.java

30a31,34
>     protected volatile int _iter;
>     protected volatile int _frequency;
>     protected volatile long _savedTime;
> 
185c189,199
<         return _offset + System.currentTimeMillis();
---
>         if (++_iter < _frequency)
>             return _savedTime;
>         _iter = 0;
>         long newTime = _offset + System.currentTimeMillis();
>         long delta = newTime - _savedTime;
>         _savedTime = newTime;
>         if (delta == 0)
>             _frequency++;
>         else
>             _frequency--;
>         return newTime;

comment:17 Changed 7 months ago by Zlatin Balevsky

ditch the entire Windows platform

Last time I worked with this was on Windows XP, so things may have changed since then, but at the time Windows defaulted to a "low-precision" clock and a hack was necessary to make it enable the high-precision clock for the given process. The hack was to start a background thread that calls Thread.sleep(Integer.MAX_VALUE);
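
A minimal sketch of that workaround as described (illustrative only; whether it still changes clock granularity on current Windows versions is not verified here):

    // Start a daemon thread that sleeps "forever"; on some Windows JVMs this
    // reportedly switched the process to the high-precision clock.
    Thread timerHack = new Thread(new Runnable() {
        public void run() {
            try {
                Thread.sleep(Integer.MAX_VALUE);
            } catch (InterruptedException ie) {
                // exit quietly if interrupted
            }
        }
    }, "Windows timer resolution hack");
    timerHack.setDaemon(true);
    timerHack.start();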

comment:18 Changed 7 months ago by jogger

That low-res clock still exists in Windows 10, as I showed above. Your hack was reported for Win XP only. I also saw reports that fiddling with the clock resolution can make the clock run faster, which I2P definitely cannot accept. So I doubt there will be an easy solution for Windoze. Also, depending on the processor and the exact Windows version, the time slices handed out by the scheduler may be rather coarse (I read 5/10/15 ms), so that's another obstacle there.

comment:19 Changed 7 months ago by zzz

@OP In the future, please provide well-formed unified diffs using the -u option, as requested in #2382. This will make things far easier for us. Thanks.

comment:20 Changed 7 months ago by zzz

@OP Also, please put the changed file 2nd in the arguments to diff, so the patch doesn't look reversed.

comment:21 Changed 7 months ago by zzz

reversed and unified diff from comment 8:

#
# old_revision [0d44c0843ca63d3d78061f6cbf923fac72b2abc8]
#
# patch "router/java/src/net/i2p/router/RouterClock.java"
#  from [8d7e6d3a328db58a92c92f054a5ceee135b28ad8]
#    to [3b7b7774db6acf813c177757feb8fbdd274834aa]
#
============================================================
--- router/java/src/net/i2p/router/RouterClock.java	8d7e6d3a328db58a92c92f054a5ceee135b28ad8
+++ router/java/src/net/i2p/router/RouterClock.java	3b7b7774db6acf813c177757feb8fbdd274834aa
@@ -251,32 +251,34 @@ public class RouterClock extends Clock {
      */
     @Override
     public long now() {
-        long systemNow = System.currentTimeMillis();
-        // copy the global, so two threads don't both increment or decrement _offset
-        long offset = _offset;
-        long sinceLastSlewed = systemNow - _lastSlewed;
+        long adjustedNow = super.now();
+        long sinceLastSlewed = adjustedNow - _lastSlewed;
+        // using adjustedNow means a small change
+        // effectively slewing forward every MAX_SLEW - 1
+        // slewing backward every MAX_SLEW + 1
+        if (sinceLastSlewed < MAX_SLEW && sinceLastSlewed >= 0)
+            return adjustedNow; // return ASAP if nothing to do
         if (sinceLastSlewed >= MASSIVE_SHIFT_FORWARD ||
             sinceLastSlewed <= 0 - MASSIVE_SHIFT_BACKWARD) {
-            _lastSlewed = systemNow;
+            _lastSlewed = adjustedNow;
             notifyMassive(sinceLastSlewed);
-        } else if (sinceLastSlewed >= MAX_SLEW) {
-            // copy the global
+            return adjustedNow;
+        }
+        if (sinceLastSlewed >= MAX_SLEW) {
             long desiredOffset = _desiredOffset;
+            long offset = _offset;
+            long delta = 0;
             if (desiredOffset > offset) {
-                // slew forward
-                offset += Math.min(10, sinceLastSlewed / MAX_SLEW);
-                _offset = offset;
+                delta = 1; // The Math.min is useless, if it is > 1 we need an updated desiredOffset first
             } else if (desiredOffset < offset) {
-                // slew backward, but don't let the clock go backward
-                // this should be the first call since systemNow
-                // was greater than lastSled + MAX_SLEW, i.e. different
-                // from the last systemNow, thus we won't let the clock go backward,
-                // no need to track when we were last called.
-                _offset = --offset;
+                delta = -1;
             }
-            _lastSlewed = systemNow;
+            _offset = offset + delta;
+            _lastSlewed = adjustedNow;
+            return delta + adjustedNow;
         }
-        return offset + systemNow;
+        return adjustedNow; // doing nothing if system clock went backward
+        // and we will do nothing for many seconds until clock >= lastslewed + MAX_SLEW
     }
 
     /*

comment:22 Changed 7 months ago by zzz

Priority: major → minor

@OP I'm struggling to understand the goals of this ticket.

  • What do you mean by "self-tuning"?
  • Why is the current code not "self-tuning" already?
  • How do your proposed changes make it more "self-tuning"?

comment:23 Changed 7 months ago by jogger

@zzz The goals are simple: save 10% of total CPU, which is useful on slower machines.

The change for the router clock consists of code modifications for speedup, code cleanup, and proper use of super.now(), so that any further work can be done in the superclass.

The change for clock.now() then aims at avoiding 99% of the repeated, expensive calls to System.currentTimeMillis(). The proposed change determines a call frequency so that the function runs much faster but limits itself to a negative clock shift of 0.25 ms on average (with roughly two system-clock reads per millisecond, the cached value lags real time by between 0 and ~0.5 ms). That's more than good enough for timing purposes, given that we have a negative shift anyway of 0.5 ms on Linux and 8 ms on Windoze.

comment:24 Changed 7 months ago by jogger

Since I knew zab would not let this go through without the use of atomics, which are superior anyway, here is the final tested diff:

--- "Clock orig.java"	2019-02-02 10:11:02.507235251 +0100
+++ "Clock patch.java"	2019-02-08 18:53:58.572804303 +0100
@@ -3,6 +3,8 @@
 import java.util.Date;
 import java.util.Set;
 import java.util.concurrent.CopyOnWriteArraySet;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
 
 import net.i2p.I2PAppContext;
 import net.i2p.time.BuildTime;
@@ -27,6 +29,10 @@
     protected volatile long _offset;
     protected boolean _alreadyChanged;
     private final Set<ClockUpdateListener> _listeners;
+
+    private  AtomicInteger _iter = new AtomicInteger(0);
+    private  AtomicInteger _frequency = new AtomicInteger(0);
+    private  AtomicLong _savedTime = new AtomicLong(0);
     
     public Clock(I2PAppContext context) {
         _context = context;
@@ -182,7 +188,18 @@
      *
      */
     public long now() {
-        return _offset + System.currentTimeMillis();
+        // aims to check currentTimeMillis twice per ms under constant load
+        // negative clock shift avg 0.25 ms under constant load
+        // saves 99% system calls at 200 calls / sec
+        if (_iter.incrementAndGet() <= _frequency.get())
+            return _savedTime.get();
+        _iter.set(0);
+        long newTime = _offset + System.currentTimeMillis();
+        if (newTime == _savedTime.getAndSet(newTime))
+            _frequency.incrementAndGet();
+        else
+            _frequency.decrementAndGet();
+        return newTime;
     }
 
     public void addUpdateListener(ClockUpdateListener lsnr) {

comment:25 Changed 7 months ago by zzz

OK, zab originally asked me to review (only) the RouterClock diff in comment 8, which I reformatted in comment 21. Now in comment 24 you've posted a completely separate diff for Clock, but I see that's a reworking of the diff in comment 16.

Do I understand correctly from comment 23 that you are proposing that we review and test both the comment 21 and comment 24 diffs?

The comment 24 diff is interesting. It depends on now() being called at a consistent rate. If that rate is widely variable, then the clock could get "stuck" for a while. It would need extensive testing on low-traffic routers, and outside of RouterContext; that's where this could break things. There's a reason why Clock is simple and all the slewing is in RouterClock.

I'm not sure the logic in comment 24 is foolproof. There's no min or max for _frequency; it could go negative, although it probably won't in practice.
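
For illustration, bounds could be added to the comment 24 logic along these lines (MIN_FREQUENCY and MAX_FREQUENCY are assumed constants, not part of the proposed diff):

    // Clamp _frequency: never serve the cached value more than MAX_FREQUENCY
    // times in a row, and never let the counter drift below MIN_FREQUENCY.
    if (newTime == _savedTime.getAndSet(newTime)) {
        if (_frequency.incrementAndGet() > MAX_FREQUENCY)
            _frequency.set(MAX_FREQUENCY);
    } else {
        if (_frequency.decrementAndGet() < MIN_FREQUENCY)
            _frequency.set(MIN_FREQUENCY);
    }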

Three atomics seem a little much; I was trying to think of a way to get it down to two, but I can't.

I don't like adding complexity to Clock and not always calling currentTimeMillis() there. Outside of RouterContext, this could be called once every few seconds, or in bursts, and that could really break things. Perhaps you had not considered this use case.

Thanks for the explanation in comment 23. Between that, and looking at the diff for Clock in comment 24, I understand much better what you're trying to do.

However, I'm still pretty skeptical that we can make this work across all workloads. I think all the Clock changes have to be moved to RouterClock for the reasons stated above.

Now, to review the comment 8/21 diff as zab requested:

There are a lot of changes rolled in here. Doing nothing when the clock goes backwards is dangerous; it's not just seconds, it could be an hour, and that's actually normal when the user fixes the timezone, I think on Windows?

I'm not sure I understand the "Math.min is useless" comment. I think what you've done in both the forward and backward slewing cases is reduce the slew amount, in the forward case from up to 10 down to 1? We slew quickly for a reason.

Not that I've done it recently, but have you reviewed the mtn or git history, and past tickets, for RouterClock to see why things are the way they are? We have a long history of issues in the clock, and the proposed changes seem to be solving a particular issue without a full awareness of this history, the use cases, and the ways things could break.

I don't think that applying the comment 8/21 diff alone makes sense at this time, and it probably isn't likely for 0.9.39. But let's keep the discussion going and see if there's a way we can reduce the system calls while minimizing the risk of breaking something.

comment:26 Changed 7 months ago by jogger

Just to set things apart:

The diff in comment 8 does nothing other than code cleanup for speed and using super.now(). I did not intend to change the logic, and I think I did not. When I find time I will write detailed comments.

The comment 24 diff relies on the typical call rate of 200/ms. For the call rate to go down 50%, the traffic must really break down completely, in which case we have problems different from clock precision. But even a sudden 50% reduction would mean an average clock error of 0.5 ms, which is the best any clock at 1 ms resolution can achieve. If _frequency goes below zero, the system call is executed every time until the system clock stabilizes; it's in the source.

If you do not believe me, measure the function and pull the plug. I only post diffs after I have measured several alternatives, in this case choosing the most conservative one. So any measurement that supports your view would be welcome.

comment:27 Changed 7 months ago by zzz

I believe you. I believe your test results. That doesn't mean it will work for everybody, especially in app context, where the call rate may be close to zero. We will be doing measurements and testing, but it may take a while. I'll take another look at the comment 8 diff; perhaps I didn't understand it correctly.
