[SERVER-18562] YCSB load phase (insert only) pushes 16-core machine to 100% CPU due to high-resolution timers Created: 19/May/15  Updated: 03/Jun/19  Resolved: 03/Jun/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.1.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Eitan Klein Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Done Votes: 1
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File YCSB-Load.png    
Issue Links:
Depends
depends on SERVER-18922 Choose time source for measuring shor... Closed
Related
related to SERVER-18613 We can double the WiredTiger YCSB loa... Closed
is related to SERVER-17630 Insert only workload under stress - h... Closed
is related to SERVER-4740 Use monotonic clock sources for Timer Closed
Operating System: Windows
Sprint: Platform 6 07/17/15, Platform 7 08/10/15
Participants:

 Description   

MongoDB shell version: 3.1.3-pre-

Environment:

• Single mongod with WiredTiger as the storage engine
• Windows 2012
• WiredTiger default configuration
• EC2 machine c3.4xlarge

Workload:

• YCSB load phase (insert only)

Issue:

During the insert-only workload, the high-resolution counter responsible for flagging operations that take longer than a threshold (100 ms by default) appears to consume 60% of the CPU.

The impact is large enough to mask the performance difference between SSDs and spinning disks on Windows.
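To detect a slow operation, the server must read the clock at least twice per operation (at start and at completion), so the cost of a single clock read is paid on every insert, fast or slow. A minimal Python sketch of that pattern follows; `run_and_flag_slow` and `SLOW_OP_THRESHOLD_MS` are hypothetical names for illustration, not MongoDB's C++ implementation.

```python
import time

# Hypothetical default mirroring the 100 ms slow-operation threshold above.
SLOW_OP_THRESHOLD_MS = 100


def run_and_flag_slow(op, clock_ns=time.perf_counter_ns,
                      threshold_ms=SLOW_OP_THRESHOLD_MS):
    """Run op() and report whether it exceeded the slow-op threshold.

    clock_ns is any monotonic clock returning integer nanoseconds. It is read
    twice per call (start and end), so its cost is paid on every operation.
    """
    start = clock_ns()
    result = op()
    elapsed_ms = (clock_ns() - start) / 1_000_000
    return result, elapsed_ms, elapsed_ms > threshold_ms


if __name__ == "__main__":
    # A fake clock makes the example deterministic: the op "takes" 150 ms.
    ticks = iter([0, 150_000_000])
    _, elapsed, slow = run_and_flag_slow(lambda: None, clock_ns=lambda: next(ticks))
    print(elapsed, slow)  # prints: 150.0 True
```

Injecting the clock keeps the threshold logic independent of which time source is used, which is exactly the choice debated in the comments below.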



 Comments   
Comment by Mark Benvenuto [ 30/Jun/15 ]

I ran the following tests to compare the impact of various time APIs on different platforms. I was not interested in comparing which platform is faster overall; I wanted to understand the relative performance of the various timing APIs on each platform.

Using https://github.com/DigitalInBlue/Celero, a micro benchmark framework, I evaluated the following time sources on three platforms (physical hardware, EC2, and Azure, the last on two Windows versions). In each case I ran 10 samples of 1,000,000 calls each.

Time Source                   Accuracy
curTimeMicros64               100 ns
QueryPerformanceCounter       100 ns
GetCurrentTime                1-12 ms
GetTickCount64                1-12 ms
QueryUnbiasedInterruptTime    1-12 ms

Note: curTimeMicros64 comes from MongoDB's time_support.cpp
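The methodology above (10 samples, each timing a batch of back-to-back calls to one time source) can be sketched portably in Python. This is not the Celero harness used for the measurements: interpreter loop overhead dominates a Python loop, so absolute numbers are not comparable to the results table, and `bench_clock` is a hypothetical helper. `time.get_clock_info` reports which OS clock backs each Python clock, and its resolution, on the current platform.

```python
import time


def bench_clock(clock, samples=10, calls=1_000_000):
    """Time `samples` batches of `calls` back-to-back reads of `clock`.

    Returns one elapsed time in seconds per sample. The batches themselves
    are timed with time.perf_counter.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(calls):
            clock()
        results.append(time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for name in ("perf_counter", "monotonic", "time"):
        info = time.get_clock_info(name)
        best = min(bench_clock(getattr(time, name), calls=100_000))
        print(f"{name:>12} ({info.implementation}, resolution {info.resolution:g} s):"
              f" best {best * 1e3:.2f} ms per 100,000 reads")
```

Reporting the best of several samples reduces noise from scheduling and other processes, which matters most on virtual machines.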

Test Platforms

  • Thinkmate, Windows 8.1, 6 Core, Intel Core i7 4930K
  • EC2 c3.2xlarge, Windows 2008 R2, 8 vcpu, 15 GB RAM
  • Azure A6, Windows 2008 R2, 4 vcpu, 28 GB RAM
  • Azure A6, Windows 2012 R2, 4 vcpu, 28 GB RAM

Results

Lower values are faster.

Platform     curTimeMicros64  QueryPerformanceCounter  GetCurrentTime  GetTickCount64  QueryUnbiasedInterruptTime
Physical     16.273037        15.642062                15.580168       15.486621       14.946076
EC2          49.318997        48.494332                14.757635       14.754738       14.777106
Azure-2008   52.732029        50.263894                45.385883       44.678298       44.314290
Azure-2012   39.428120        37.207951                30.978468       32.044333       32.818515

Summary
It is no surprise that virtual machines are slower than physical machines. On the physical machine, I expected QPC to cost more than the coarse time APIs, and it does, just not by much.

On the Azure platforms, all counters are slower, but the relative performance of the time sources is as expected. Overall, 2012 R2 is also faster than 2008 R2 in this micro benchmark.

The surprising result is that on EC2, QPC is significantly slower than the coarse time sources: roughly 3.3x (48.5 versus 14.8).
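The relative cost can be read straight off the results table by dividing each platform's QueryPerformanceCounter figure by its GetTickCount64 figure. A short Python sketch of that arithmetic, with the values copied from the table; `RESULTS` and `qpc_slowdown` are illustrative names only.

```python
# Figures copied from the results table above; lower is faster.
RESULTS = {
    "Physical":   {"QueryPerformanceCounter": 15.642062, "GetTickCount64": 15.486621},
    "EC2":        {"QueryPerformanceCounter": 48.494332, "GetTickCount64": 14.754738},
    "Azure-2008": {"QueryPerformanceCounter": 50.263894, "GetTickCount64": 44.678298},
    "Azure-2012": {"QueryPerformanceCounter": 37.207951, "GetTickCount64": 32.044333},
}


def qpc_slowdown(platform):
    """How many times slower QPC was than GetTickCount64 on `platform`."""
    row = RESULTS[platform]
    return row["QueryPerformanceCounter"] / row["GetTickCount64"]


if __name__ == "__main__":
    for platform in RESULTS:
        print(f"{platform:<11} {qpc_slowdown(platform):.2f}x")
```

Run as a script, it prints 1.01x, 3.29x, 1.13x and 1.16x for Physical, EC2, Azure-2008 and Azure-2012: QPC stays within about 16% of GetTickCount64 everywhere except EC2.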

Comment by Andy Schwerin [ 24/Jun/15 ]

eitan.klein, why is this not a dupe of SERVER-18613, at least in terms of how it might be fixed?

Comment by Andy Schwerin [ 24/Jun/15 ]

We should use QueryPerformanceCounter on systems where rdtsc is fast, as GetTickCount64 has 10ms resolution (actually, probably 1ms).
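One way to act on this suggestion is to measure the cost of the precise clock once at startup and fall back to the coarse clock only when a read is too expensive. A minimal Python sketch of that selection rule; `per_call_ns`, `choose_clock` and the 1,000 ns budget are hypothetical, and this is not necessarily the approach taken in SERVER-18922.

```python
import time


def per_call_ns(clock, calls=100_000):
    """Average cost in nanoseconds of one read of `clock`, loop overhead included."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        clock()
    return (time.perf_counter_ns() - start) / calls


def choose_clock(precise, coarse, budget_ns, cost=per_call_ns):
    """Return `precise` if a read of it costs at most `budget_ns`, else `coarse`.

    `cost` is injectable so the selection rule can be tested without timing.
    """
    return precise if cost(precise) <= budget_ns else coarse


if __name__ == "__main__":
    clock = choose_clock(time.perf_counter, time.monotonic, budget_ns=1_000)
    print("selected:", clock.__name__)
```

Deciding once at startup keeps the per-operation path free of any branching on clock choice; the trade-off is that the measurement reflects conditions only at that moment.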

Comment by Eitan Klein [ 22/Jun/15 ]

acm

Per our discussion, I think we should use the GetTickCount64() API for the tracing methods: it should be good enough for our monitoring system, and I believe it is significantly faster.
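The cost of a coarse clock is precision near the threshold: whether an operation close to 100 ms gets reported depends on where it starts relative to a clock tick. A small Python sketch, assuming a 12 ms tick (the coarse end of the 1-12 ms accuracy range listed earlier) and integer millisecond timestamps; `coarse_elapsed_ms` is a hypothetical helper that models the rounding, not a Windows API.

```python
def coarse_elapsed_ms(start_ms, end_ms, tick_ms):
    """Elapsed time as seen through a clock that only advances every tick_ms.

    Each reading is rounded down to a multiple of tick_ms, so the measured
    duration differs from the true one by less than one tick either way.
    """
    return (end_ms // tick_ms) * tick_ms - (start_ms // tick_ms) * tick_ms


THRESHOLD_MS = 100  # hypothetical slow-operation threshold from the description
TICK_MS = 12        # assumed tick, the coarse end of the 1-12 ms range

if __name__ == "__main__":
    # The same 105 ms operation, started at two different points in a tick.
    for start in (0, 11):
        measured = coarse_elapsed_ms(start, start + 105, TICK_MS)
        print(start, measured, measured > THRESHOLD_MS)
```

Run as a script, it prints `0 96 False` and `11 108 True`: the same 105 ms operation is missed in one case and reported in the other. Whether an error of under one tick matters at a 100 ms threshold is the judgment call between GetTickCount64 and QueryPerformanceCounter in this thread.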

Generated at Thu Feb 08 03:48:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.