Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Performance
Fast Counters
Scope
Provide very fast counters and timers, making it possible to increase the volume and granularity of the stats accumulated within the server.
Not in scope
DTrace integration is not part of this project. Non-x86 processors will not be supported.
Design
Each counter consists of a vector of sub-counters, one per core (or virtual core, in the case of hyper-threading). Counter increments occur on each core independently and without locking. The counter value is obtained by summing the per-core sub-counters. A count can be lost when a thread migrates to another core between reading its core label and completing the increment; experiments confirm that such core migration errors occur in about 1 in 10M counts, an acceptable error rate.
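The design above can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: the class name `FastCounter`, the 64-byte cache-line padding, and the use of relaxed atomic load/store (to get an unlocked increment without a C++ data race) are all assumptions, and the core label comes from the Linux `sched_getcpu()` call listed under System Calls. It assumes C++17 so that the over-aligned slots are allocated correctly.

```cpp
#include <sched.h>    // sched_getcpu (Linux, glibc 2.6+)
#include <unistd.h>   // sysconf
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-core counter: one padded slot per online core.
class FastCounter {
public:
    FastCounter() : _slots(coreCount()) {}

    void increment() {
        int cpu = sched_getcpu();
        if (cpu < 0) cpu = 0;  // fall back to slot 0 if the call fails
        // Modulo guards against core ids larger than the online count.
        Slot& s = _slots[static_cast<std::size_t>(cpu) % _slots.size()];
        // Separate load and store: no locked read-modify-write instruction.
        // A thread migrating between the two steps can lose a count.
        s.value.store(s.value.load(std::memory_order_relaxed) + 1,
                      std::memory_order_relaxed);
    }

    // Aggregate by summing the per-core slots.
    std::uint64_t value() const {
        std::uint64_t sum = 0;
        for (const Slot& s : _slots)
            sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    // Assumed 64-byte cache line; padding avoids false sharing between cores.
    struct alignas(64) Slot {
        std::atomic<std::uint64_t> value{0};
    };

    static std::size_t coreCount() {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        return n > 0 ? static_cast<std::size_t>(n) : 1;
    }

    std::vector<Slot> _slots;
};
```

Reading the total is slower than incrementing, since it touches every slot; that trade-off suits stats that are updated far more often than they are reported.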
Implementation
The key to the fast implementation is the x86 rdtscp instruction. It loads the 64-bit timestamp counter into EDX:EAX and the core / hyper-core (i.e. “node”) label into ECX. We need to implement a module that determines the node count in order to correctly allocate the fast counter / timer arrays.
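As an illustration of the instruction described above, the following sketch reads the timestamp and the ECX label through the `__rdtscp` compiler intrinsic (GCC/Clang `<x86intrin.h>`; MSVC exposes the same intrinsic in `<intrin.h>`). The struct and function names are hypothetical. ECX receives whatever the OS stored in the IA32_TSC_AUX register; the decoding shown (low 12 bits for the CPU number, remaining bits for the NUMA node) is the Linux x86-64 convention and is an assumption that may not hold on other operating systems.

```cpp
#include <x86intrin.h>  // __rdtscp (GCC/Clang, x86 only)
#include <cstdint>

// Hypothetical result type for one rdtscp read.
struct TscSample {
    std::uint64_t tsc;  // 64-bit timestamp counter (EDX:EAX)
    std::uint32_t aux;  // raw label from ECX (IA32_TSC_AUX)
};

inline TscSample readTscp() {
    unsigned int aux = 0;
    std::uint64_t tsc = __rdtscp(&aux);  // one instruction: timestamp plus label
    return TscSample{tsc, aux};
}

// Assumed Linux encoding of the label: cpu in bits 0-11, node above that.
inline std::uint32_t cpuFromAux(std::uint32_t aux) { return aux & 0xfff; }
inline std::uint32_t nodeFromAux(std::uint32_t aux) { return aux >> 12; }
```

Because the timestamp and the label come from the same instruction, a timer can record both without a second call to find the core.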
Portability issues
Within the class of x86 processors the instruction set varies. For older Intel processors, OS support for finding the core label is needed (mainly access to model-specific registers, as opposed to the rdtscp instruction). Linux, Windows, FreeBSD, Solaris, and OS X each have different system calls for accessing MSRs. The initial implementation should use an OS call to obtain the processor label, since this greatly simplifies the code.
Testing Results
Testing shows that per-core counters provide large speed improvements vs. a single atomic integer with fetch-and-add (15 vs. 230 ns per increment). Using non-locking vs. locking increment instructions (with per-node counters) provides about a 2X speedup (15 vs. 30 ns per increment). Using Linux sched_getcpu() vs. inlined rdtscp instructions provides a small additional speedup. It seems very likely that Linux in fact uses the rdtscp instruction via vsyscall (see: http://lxr.linux.no/#linux+v3.8.7/arch/x86/include/asm/vsyscall.h in the Linux cross-reference). Other OSes may not be as smart.
System Calls
Windows
(1) find current core id
#include <windows.h>
DWORD cpuid = GetCurrentProcessorNumber();
Requirements
Minimum supported client: Windows Vista
Minimum supported server: Windows Server 2003
Header: WinBase.h on Windows Server 2003, Windows Vista, Windows 7, Windows Server 2008, Windows Server 2008 R2 (include Windows.h); Processthreadsapi.h on Windows 8, Windows Server 2012
Library: Kernel32.lib
DLL: Kernel32.dll
(2) Find core count
#include <windows.h>
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );
DWORD numCPU = sysinfo.dwNumberOfProcessors;
Requirements
Minimum supported client: Windows 2000 Professional
Minimum supported server: Windows 2000 Server
Header: WinBase.h (include Windows.h)
Linux
(1) find current core id
#include <sched.h>
int cpu = sched_getcpu();
Requirements
glibc 2.6
(2) Find core count
#include <unistd.h>
long numCPU = sysconf( _SC_NPROCESSORS_ONLN );
Requirements
sysconf() is a POSIX standard function, although _SC_NPROCESSORS_ONLN is non-standard.
FreeBSD, Mac OS X
(1) find current core id:
tbd
(2) Find core count:
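#include <sys/types.h>
#include <sys/sysctl.h>
int mib[2];
uint32_t numCPU = 0;
size_t len = sizeof(numCPU);
mib[0] = CTL_HW;
mib[1] = HW_AVAILCPU;  /* CPUs currently available */
sysctl(mib, 2, &numCPU, &len, NULL, 0);
if (numCPU < 1) {
    /* fall back to the configured CPU count */
    mib[1] = HW_NCPU;
    sysctl(mib, 2, &numCPU, &len, NULL, 0);
    if (numCPU < 1)
        numCPU = 1;
}
Requirements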
OS X versions >= 10.2.
Solaris
(1) find current core id:
#include <sys/processor.h>
processorid_t cpu = getcpuid();
(2) Find core count:
#include <unistd.h>
long numCPU = sysconf( _SC_NPROCESSORS_ONLN );
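The per-OS calls listed above could be gathered behind two small helpers, roughly as sketched below. The function names `coreCount` and `currentCore` are hypothetical, and the sketch makes two simplifying assumptions: the BSD / OS X branch queries only HW_NCPU, and platforms where the current-core call is still "tbd" fall back to core 0 (which reduces the counter to a single shared slot there).

```cpp
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#include <unistd.h>
#elif defined(__APPLE__) || defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__sun)
#include <sys/processor.h>
#include <unistd.h>
#endif
#include <cstddef>

// Hypothetical helper: number of cores to size the counter arrays with.
inline unsigned coreCount() {
#if defined(_WIN32)
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    return sysinfo.dwNumberOfProcessors > 0 ? sysinfo.dwNumberOfProcessors : 1;
#elif defined(__linux__) || defined(__sun)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? static_cast<unsigned>(n) : 1;
#elif defined(__APPLE__) || defined(__FreeBSD__)
    int mib[2] = {CTL_HW, HW_NCPU};
    int n = 0;
    std::size_t len = sizeof(n);
    if (sysctl(mib, 2, &n, &len, NULL, 0) != 0 || n < 1) n = 1;
    return static_cast<unsigned>(n);
#else
    return 1;  // unknown platform: assume a single core
#endif
}

// Hypothetical helper: label of the core the calling thread is running on.
inline unsigned currentCore() {
#if defined(_WIN32)
    return GetCurrentProcessorNumber();
#elif defined(__linux__)
    int cpu = sched_getcpu();
    return cpu >= 0 ? static_cast<unsigned>(cpu) : 0;
#elif defined(__sun)
    return static_cast<unsigned>(getcpuid());
#else
    return 0;  // FreeBSD / OS X: current-core lookup is still "tbd"
#endif
}
```

Keeping the platform checks in one place means the counter code itself needs no conditional compilation.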
is duplicated by: SERVER-9297 Btree index counters are not thread-safe (Closed)