[SERVER-4740] Use monotonic clock sources for Timer Created: 22/Jan/12  Updated: 06/Dec/22  Resolved: 18/Dec/19

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ben Becker Assignee: Backlog - Service Architecture
Resolution: Done Votes: 8
Labels: PM-733, performance, profiling, stats
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File YCSB-Load.png    
Issue Links:
Duplicate
is duplicated by SERVER-6444 when time (ticks) go backwards Timer ... Closed
is duplicated by SERVER-6550 db.currentOp() incorrectly reports in... Closed
is duplicated by SERVER-2886 Many commands being executed for 1271... Closed
is duplicated by SERVER-10603 Absurd query times in log Closed
is duplicated by SERVER-4924 Use CLOCK_MONOTONIC for timing rather... Closed
Related
related to SERVER-6452 Replace time(0) calls Backlog
related to SERVER-16763 mongod terminate due to mongo::DBTryL... Closed
related to SERVER-18562 YCSB load phase (insert only) push 1... Closed
related to SERVER-4709 Strange timing of operations reported... Closed
is related to SERVER-15122 Result of gettimeofday() is not checked Closed
Assigned Teams:
Service Arch
Participants:

 Description   

Mongo currently uses gettimeofday() for all interval metrics on POSIX platforms. Because this function is not monotonic, we may see anomalies under certain conditions, e.g. VMM live migration, some SMP process migrations, system sleep/suspend/hibernate, NTP or system date changes, TSC-related kernel or VMM bugs, etc.

The FineClock class (src/mongo/db/stats/fine_clock.h) implements a monotonic clock source; however, the code is currently Linux-specific and unused. The goal here would be to make this class fully functional on all platforms and ensure its performance is at least as good as that of gettimeofday(). It may also be worth migrating the win32 code from boost::xtime to QueryPerformanceCounter() or similar, as boost::xtime has been deprecated.
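
For illustration only, a minimal sketch of what a portable monotonic interval timer could look like, assuming a C++11 toolchain where std::chrono::steady_clock is available. MonotonicTimer is a hypothetical name; this is not the FineClock or Timer class from the MongoDB tree.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a monotonic interval timer (illustration only).
class MonotonicTimer {
    using Clock = std::chrono::steady_clock;  // guaranteed never to go backwards

public:
    MonotonicTimer() : _start(Clock::now()) {}

    // Restart the interval from the current instant.
    void reset() { _start = Clock::now(); }

    // Microseconds elapsed since construction or the last reset(), as a signed value.
    std::int64_t micros() const {
        const auto delta = Clock::now() - _start;
        const std::int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(delta).count();
        assert(us >= 0);  // invariant of a steady clock; cannot hold for gettimeofday()
        return us;
    }

    // Milliseconds elapsed, truncated.
    std::int64_t millis() const { return micros() / 1000; }

private:
    Clock::time_point _start;
};
```

A caller would construct a MonotonicTimer before the operation being measured and read micros() or millis() afterwards. Whether steady_clock is as cheap as gettimeofday() on a given platform would still need measuring.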



 Comments   
Comment by Andy Schwerin [ 18/Dec/19 ]

The types and implementations referred to in this ticket are now gone.

Comment by Almansour Belleh Blanco [X] [ 02/Sep/16 ]

What is the status of this issue?

Comment by Eitan Klein [ 19/May/15 ]

OK, I will create a new ticket for the Windows issue.

Comment by Eitan Klein [ 19/May/15 ]

The YCSB load phase pushes all 16 cores to 100% CPU because of the high-resolution timer on Windows. See the profiler output in the attached image (YCSB-Load.png).

Comment by Githook User [ 27/May/14 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-4740 Don't use unsigned for time deltas

Leads to weird bugs when time goes backwards.

This commit doesn't resolve SERVER-4740 (Use monotonic clock sources for
Timer) but lessens the impact of using a non-monotonic clock.
Branch: master
https://github.com/mongodb/mongo/commit/00b9a481e421ee720a6b4274012dc14e244aa5e2
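
To illustrate the failure mode the commit message describes, here is a small sketch under stated assumptions; it is not the code from the commit. The function names are hypothetical. With unsigned arithmetic, a clock that steps back by 100 microseconds produces a delta near 2^64, whereas a signed delta is merely a small negative number that can be detected or clamped.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; these names are not from the MongoDB source.

// Unsigned subtraction wraps modulo 2^64 when endMicros < startMicros.
constexpr std::uint64_t elapsedUnsigned(std::uint64_t startMicros, std::uint64_t endMicros) {
    return endMicros - startMicros;
}

// Signed subtraction yields a small negative value when the clock went backwards.
constexpr std::int64_t elapsedSigned(std::int64_t startMicros, std::int64_t endMicros) {
    return endMicros - startMicros;
}

// One possible policy: report a backwards step as zero elapsed time.
constexpr std::int64_t elapsedClamped(std::int64_t startMicros, std::int64_t endMicros) {
    return endMicros - startMicros < 0 ? 0 : endMicros - startMicros;
}

// The clock stepped back from 1000 to 900 between the two readings.
static_assert(elapsedUnsigned(1000, 900) == 18446744073709551516ULL, "wraps to 2^64 - 100");
static_assert(elapsedSigned(1000, 900) == -100, "small negative delta");
static_assert(elapsedClamped(1000, 900) == 0, "clamped to zero");
```

The wrapped unsigned value is the kind of number that would plausibly show up as an absurd query time in a log. Clamping hides the anomaly rather than fixing its cause, which is consistent with the commit only lessening the impact.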

Comment by Ben Becker [ 24/Jan/12 ]

A few general comments:

The core issue with functions like gettimeofday() and boost::xtime() is that timing information can be obtained from multiple sources (generally HPET, TSC, PIT, RTC or ACPI/PM), and these sources are not standardized.

PM Timer is likely the most accurate source for measuring time intervals. Most hypervisors guarantee monotonically increasing values from the PM timer within a guest's vcpu.

TSC is a monotonically increasing counter tied to a single physical core. While generally fast and reliable on uniprocessor systems, it is not synchronized across cores and is especially problematic with multiprocessor speed-scaling chips. These issues can easily be seen by running a thread that repeatedly prints the results of the RDTSC instruction and watching the results as the thread is migrated across cores (or simply print the results from multiple threads). Details of TSC-based profiling are available in the 'Intel 64 and IA-32 Architectures Software Developer's Manual', Volume 3, Chapter 17.12.

HPET is a newer hardware timer that replaced the PIT around 2005. It has a very slow setup time and thus is generally only used for firing short-interval interrupts (e.g. multimedia timers).

RTC (the real-time clock) is used for wall-clock measurements and is subject to drift and skew (e.g. NTP sync, VMM sync, etc.).

To elaborate more on some platform specific issues:

VMMs:
Virtual Machine Managers (hypervisors) like Xen, KVM, Hyper-V, VirtualBox or EC2 generally provide multiple time sources, but the implementations vary. For example, Hyper-V implements 'partition reference time enlightenment', which emulates the TSC and thus provides reliable TSC counts even when the underlying hardware may not. Xen provides a similar mechanism. KVM's kvmclock() is the main source for time data; however, there have been many bugs over the past few years, primarily related to TSC values. There have also been several timing-related bugs in Linux guest VM kernels running under EC2 (and Xen).

Win32:
Older versions of Windows (<= win2k3/winxp) may require the /USEPMTIMER boot.ini parameter to ensure QueryPerformanceCounter() returns reliable results (see http://support.microsoft.com/kb/895980). A comprehensive (but older) article on implementing an accurate high-precision timer on win32 is here: http://msdn.microsoft.com/en-us/magazine/cc163996.aspx.

Linux:
Most Linux distributions come with librt, the POSIX advanced real time library. This lib implements clock_gettime(), which is likely to be the most accurate method of collecting monotonic timing data. It uses vsyscalls, which are faster than normal syscalls on native (and HVM?) systems; however, Xen's pvclock does not appear to support this feature as a vsyscall (and may actually incur a VMEXIT). See http://xen.1045712.n5.nabble.com/pvclock-PV-and-HVM-and-vsyscall-td3213970.html and http://lwn.net/Articles/388188/ for more details.
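
A minimal sketch of reading the monotonic clock through clock_gettime(), assuming a POSIX system that defines CLOCK_MONOTONIC. The helper names monotonicMicros() and microsSince() are hypothetical and only for illustration. The return value is checked, since the call can fail.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>  // clock_gettime(), CLOCK_MONOTONIC (POSIX)

// Hypothetical helper: microseconds on the monotonic clock, measured from an
// unspecified starting point. Returns -1 if clock_gettime() fails (errno is set).
inline std::int64_t monotonicMicros() {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}

// Hypothetical helper: microseconds elapsed since an earlier successful
// monotonicMicros() reading, or -1 if the clock cannot be read now.
inline std::int64_t microsSince(std::int64_t startMicros) {
    assert(startMicros >= 0);  // a failed reading must not be used as a start point
    const std::int64_t now = monotonicMicros();
    return now < 0 ? -1 : now - startMicros;
}
```

On older glibc versions this needs to be linked with -lrt; newer versions provide clock_gettime() in libc itself. The values are only meaningful as differences between two readings on the same machine, not as wall-clock time.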

BSD: Most BSD platforms also seem to support librt. Needs testing.

Solaris: Recent releases (>=2.6) appear to support librt, however clock_gettime() was originally in a library called libposix4. Needs further investigation and testing.

OS X/Darwin: Mac OS X does not support librt, however there are some time-related functions in LibSystem that may work. Needs further investigation and testing.

Generated at Thu Feb 08 03:06:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.