  1. Core Server
  2. SERVER-9393

Very, very fast counters

      Fast Counters

      Scope

      Provide very fast counters and timers, making it possible to increase the volume and granularity of the stats accumulated within the server.

      Not in scope

      DTrace integration is not part of this project. Non-x86 processors will not be supported.

      Design

      Each counter consists of a vector of counters, one per core (or per virtual core, in the case of hyper-threading). Counter increments occur independently on each core, without locking. The counter value is read by aggregating across the per-core counters. A thread that migrates between looking up its core and performing the increment can race with another thread on the same slot and lose a count; experiments confirm that such core migration errors occur in about 1 in 10M counts, an acceptable error rate.
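
      The design above can be sketched as follows. This is a minimal illustration, not the server's implementation: the FastCounter and Slot names are hypothetical, the 64-byte cache-line padding is an assumption, and the code relies on Linux sched_getcpu() and a C++17 compiler. The unlocked increment is written as a relaxed load followed by a relaxed store, which avoids a lock-prefixed instruction while keeping the rare migration race well-defined in C++.

```cpp
#include <sched.h>    // sched_getcpu() (Linux, glibc >= 2.6)
#include <unistd.h>   // sysconf()
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// One slot per core, padded to an assumed 64-byte cache line so that
// increments on different cores do not share a line.
struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
};

// Hypothetical per-core counter.
class FastCounter {
public:
    FastCounter() : slots_(onlineCores()) {}

    // Unlocked increment of the slot belonging to the current core.
    // A migration between the lookup and the store can lose a count.
    void increment() {
        int cpu = sched_getcpu();
        std::size_t i =
            cpu < 0 ? 0 : static_cast<std::size_t>(cpu) % slots_.size();
        std::atomic<std::uint64_t>& v = slots_[i].value;
        v.store(v.load(std::memory_order_relaxed) + 1,
                std::memory_order_relaxed);
    }

    // Aggregate the per-core slots into the counter value.
    std::uint64_t read() const {
        std::uint64_t sum = 0;
        for (const Slot& s : slots_)
            sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    static std::size_t onlineCores() {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        return n < 1 ? 1 : static_cast<std::size_t>(n);
    }

    std::vector<Slot> slots_;
};
```

      A caller would keep one FastCounter per statistic, call increment() on the hot path, and call read() only when the stats are reported. The modulo on the core id is a defensive choice for systems whose core ids are not dense.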

      Implementation

      The key to the fast implementation is the x86 rdtscp instruction. It loads the 64-bit timestamp counter into EDX:EAX and the core / hyper-core (i.e., “node”) label into ECX. (ECX receives the contents of the IA32_TSC_AUX register, which the OS is expected to initialize with that label.) We need to implement a module that determines the node count in order to correctly allocate the fast counter / timer arrays.
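
      As an illustration of the instruction described above, the sketch below reads the timestamp and the core label in one call through the GCC / Clang __rdtscp intrinsic. The TscSample and readTscAndCore names are hypothetical. Decoding ECX as (NUMA node << 12) | cpu is the Linux convention and is an assumption here; other operating systems may store something else in that register. The non-x86 branch exists only so the sketch compiles elsewhere on Linux.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#  include <x86intrin.h>   // __rdtscp (GCC / Clang)
#else
#  include <sched.h>       // sched_getcpu() fallback (Linux)
#endif

// Hypothetical result type: timestamp plus the label of the core it was read on.
struct TscSample {
    std::uint64_t tsc;   // 64-bit timestamp counter (0 if rdtscp is unavailable)
    unsigned cpu;        // core label
    unsigned numaNode;   // NUMA node (Linux encoding, assumed)
};

inline TscSample readTscAndCore() {
    TscSample s{0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned aux = 0;
    s.tsc = __rdtscp(&aux);   // EDX:EAX -> return value, ECX -> aux
    s.cpu = aux & 0xfffu;     // assumed Linux layout: low 12 bits = cpu
    s.numaNode = aux >> 12;   // assumed Linux layout: upper bits = node
#else
    int cpu = sched_getcpu(); // no rdtscp here: ask the OS, no timestamp
    s.cpu = cpu < 0 ? 0u : static_cast<unsigned>(cpu);
#endif
    return s;
}
```

      Because the timestamp and the core label come from the same instruction, a timer built on this can tell whether its start and stop samples were taken on the same core.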

      Portability issues

      Within the class of x86 processors the instruction set varies. Older Intel processors lack the rdtscp instruction, so finding the core label requires OS support (mainly access to model-specific registers). Linux, Windows, FreeBSD, Solaris, and OS X each have different system calls for accessing MSRs. The initial implementation should use an OS call to obtain the processor label, which greatly simplifies the code.
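
      One way to hide these differences is a small wrapper selected at compile time, using the per-OS calls listed under System Calls. The sketch below is an assumption about how such a module might look; the currentCoreId and coreCount names are hypothetical. Platforms whose core id call is still to be determined degrade to a single slot.

```cpp
#if defined(_WIN32)
#  include <windows.h>
#elif defined(__linux__)
#  include <sched.h>           // sched_getcpu()
#  include <unistd.h>          // sysconf()
#elif defined(__sun)
#  include <sys/processor.h>   // getcpuid()
#  include <unistd.h>          // sysconf()
#endif

// Hypothetical helper: label of the core the caller is running on.
inline unsigned currentCoreId() {
#if defined(_WIN32)
    return static_cast<unsigned>(GetCurrentProcessorNumber());
#elif defined(__linux__)
    int cpu = sched_getcpu();
    return cpu < 0 ? 0u : static_cast<unsigned>(cpu);
#elif defined(__sun)
    return static_cast<unsigned>(getcpuid());
#else
    return 0u;   // FreeBSD / OS X: core id call not yet chosen; use one slot
#endif
}

// Hypothetical helper: number of slots to allocate per counter.
inline unsigned coreCount() {
#if defined(_WIN32)
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    return static_cast<unsigned>(sysinfo.dwNumberOfProcessors);
#elif defined(__linux__) || defined(__sun)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1u : static_cast<unsigned>(n);
#else
    return 1u;   // matches the single-slot fallback in currentCoreId()
#endif
}
```

      Keeping both functions behind one header means the counter code itself contains no platform conditionals, and an inlined rdtscp path can later replace currentCoreId() without touching callers.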

      Testing Results

      Testing shows that per-core counters provide large speed improvements vs. a single atomic integer with fetch-and-add (15 vs. 230 ns per increment). Using non-locking vs. locking increment instructions (with per-node counters) provides about a 2X speedup (15 vs. 30 ns per increment). Using Linux sched_getcpu() vs. inlined rdtscp instructions provides a small additional speedup. It seems very likely that Linux in fact uses the rdtscp instruction via vsyscall (see: http://lxr.linux.no/#linux+v3.8.7/arch/x86/include/asm/vsyscall.h in the Linux cross-reference). Other OSes may not be as smart.
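
      A comparison of locking and non-locking increments could be measured with a harness along the following lines. This is only a sketch: the function names are hypothetical, it runs single-threaded and uncontended, and it times the increment alone (no core lookup), so it is not expected to reproduce the figures above.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Average nanoseconds per increment using an atomic fetch-and-add
// (a lock-prefixed instruction on x86). n must be greater than zero.
inline double nsPerLockedIncr(std::atomic<std::uint64_t>& c, std::uint64_t n) {
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i)
        c.fetch_add(1, std::memory_order_relaxed);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(n);
}

// Average nanoseconds per increment using a separate relaxed load and
// store, i.e. no atomic read-modify-write. n must be greater than zero.
inline double nsPerPlainIncr(std::atomic<std::uint64_t>& c, std::uint64_t n) {
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i)
        c.store(c.load(std::memory_order_relaxed) + 1,
                std::memory_order_relaxed);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(n);
}
```

      Reproducing the contended case (one shared atomic vs. per-core slots) would additionally require running the loops from several threads pinned to different cores.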

      System Calls

      Windows

      (1) Find current core id

      DWORD cpuid;

      cpuid = GetCurrentProcessorNumber();

      Requirements

      Minimum supported client: Windows Vista
      Minimum supported server: Windows Server 2003
      Header: WinBase.h on Windows Server 2003, Windows Vista, Windows 7, Windows Server 2008, and Windows Server 2008 R2 (include Windows.h); Processthreadsapi.h on Windows 8 and Windows Server 2012
      Library: Kernel32.lib
      DLL: Kernel32.dll

      (2) Find core count

      SYSTEM_INFO sysinfo;
      GetSystemInfo( &sysinfo );

      DWORD numCPU = sysinfo.dwNumberOfProcessors;

      Requirements

      Minimum supported client: Windows 2000 Professional
      Minimum supported server: Windows 2000 Server
      Header: WinBase.h (include Windows.h)

      Linux

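      (1) Find current core id

      #define _GNU_SOURCE   /* must precede the includes to expose sched_getcpu() */
      #include <sched.h>
      int cpu = sched_getcpu();   /* returns -1 on error */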

      Requirements

      glibc 2.6

      (2) Find core count

      #include <unistd.h>

      long numCPU = sysconf( _SC_NPROCESSORS_ONLN );

      Requirements

      sysconf() is a POSIX standard function, although the _SC_NPROCESSORS_ONLN parameter is non-standard.

      FreeBSD, Mac OS X

      (1) Find current core id:

      TBD

      (2) Find core count:

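      #include <stdint.h>
      #include <sys/types.h>
      #include <sys/sysctl.h>

      int mib[2];
      uint32_t numCPU = 0;
      size_t len = sizeof(numCPU);

      mib[0] = CTL_HW;
      mib[1] = HW_AVAILCPU;   /* CPUs currently available */
      sysctl( mib, 2, &numCPU, &len, NULL, 0 );

      if (numCPU < 1) {
          mib[1] = HW_NCPU;   /* fall back to the configured CPU count */
          sysctl( mib, 2, &numCPU, &len, NULL, 0 );
          if (numCPU < 1)
              numCPU = 1;
      }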

      Requirements

      OS X versions >= 10.2.

      Solaris

      (1) Find current core id:

      #include <sys/processor.h>
      processorid_t cpu = getcpuid();

      (2) Find core count:

      #include <unistd.h>

      long numCPU = sysconf( _SC_NPROCESSORS_ONLN );

            Assignee:
            backlog-server-platform DO NOT USE - Backlog - Platform Team
            Reporter:
            paul.pedersen Paul Pedersen