[SERVER-26674] Over 25% regression on mongodb using YCSB workload Created: 18/Oct/16  Updated: 19/Nov/16  Resolved: 08/Nov/16

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 3.3.10, 3.3.12, 3.3.15, 3.4.0-rc0
Fix Version/s: 3.4.0-rc3

Type: Bug Priority: Major - P3
Reporter: Lilian Romero Assignee: Samantha Ritter (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-26944 Make transport::Session decorable Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

use YCSB 0.6.0 workload with 12 threads

Sprint: Platforms 2016-10-31, Platforms 2016-11-21
Participants:

 Description   

Over 25% regression on MongoDB after release 3.3.9.
Workload used: YCSB 0.6.0.
MongoDB was downloaded from the MongoDB website.
perf shows that over 22.98% of CPU time is spent in _raw_spin_lock.
The stack trace below is from 3.4.0-rc0:

#0  0x00003fffa6e11b18 in __lll_lock_wait () from /lib64/power8/libpthread.so.0
#1  0x00003fffa6e0b004 in pthread_mutex_lock () from /lib64/power8/libpthread.so.0
#2  0x000000002305e3c8 in void mongo::transport::TransportLayerManager::_foreach<mongo::transport::TransportLayerManager::sessionStats()::{lambda(mongo::transport::TransportLayer*)#1}>(mongo::transport::TransportLayerManager::sessionStats()::{lambda(mongo::transport::TransportLayer*)#1}&&) [clone .constprop.22] ()
#3  0x000000002305e5f4 in mongo::transport::TransportLayerManager::sessionStats() ()
#4  0x000000002311552c in mongo::(anonymous namespace)::threadStateChange() ()
#5  0x00000000230ccff8 in mongo::markThreadIdle() ()
#6  0x000000002203e140 in mongo::ServiceEntryPointMongod::_sessionLoop(mongo::transport::Session*) ()
#7  0x000000002203ea30 in std::_Function_handler<void (mongo::transport::Session*), mongo::ServiceEntryPointMongod::startSession(mongo::transport::Session&&)::{lambda(mongo::transport::Session*)#1}>::_M_invoke(std::_Any_data const&, mongo::transport::Session*&&) ()
#8  0x0000000023057a30 in mongo::(anonymous namespace)::runFunc(void*) ()
#9  0x00003fffa6e08728 in start_thread () from /lib64/power8/libpthread.so.0
#10 0x00003fffa6d39ec0 in clone () from /lib64/power8/libc.so.6
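The hot path in this trace is markThreadIdle() -> sessionStats() taking a mutex on every thread state change. A minimal C++ sketch of that contention pattern (hypothetical names, not the actual MongoDB code): every worker thread's state change walks a shared session list under one global mutex, so all request threads serialize on that lock.

```cpp
#include <cstddef>
#include <list>
#include <mutex>

// Illustrative sketch of the contention pattern in the trace above: every
// thread state change calls a stats function that walks a shared session
// list under one global mutex, serializing all worker threads on that lock.
struct LegacySession {
    long opsHandled = 0;
};

class SessionRegistry {
public:
    LegacySession* create() {
        std::lock_guard<std::mutex> lk(_mutex);
        _sessions.emplace_back();
        return &_sessions.back();  // std::list pointers stay valid on insert
    }

    // Analogue of TransportLayerManager::sessionStats(): an O(n) walk under
    // the same global lock, executed on every markThreadIdle() call.
    std::size_t sessionStats() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _sessions.size();
    }

private:
    std::mutex _mutex;  // the hot lock showing up as _raw_spin_lock time
    std::list<LegacySession> _sessions;
};
```

With 12 YCSB threads each hitting this mutex once per operation, lock acquisitions scale with throughput, which is consistent with the _raw_spin_lock/futex time in the profile.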

Here are the results:

Mongo version   Throughput
3.3.8           294363.31 ops/sec
3.3.8           294640.07 ops/sec
3.3.9           281662.31 ops/sec
3.3.9           281137.29 ops/sec
3.3.10          222114.44 ops/sec
3.3.12          200225.45 ops/sec
3.3.15          197709.52 ops/sec
3.4             199829.95 ops/sec


 Comments   
Comment by Samantha Ritter (Inactive) [ 08/Nov/16 ]

Hi lilianr@us.ibm.com,

Thank you for reporting this performance regression. I've done some refactoring in our TransportLayer code so we can avoid time spent under locks. Also, I have reduced the number of lookups we need to do to access our connection objects.

Running a similar YCSB workload to yours by hand on my machine, I am seeing significantly higher throughput with this fix. If you get the chance to run your particular workload on the same server that you were using before, I'd be very interested to see what results you get. For now, I am going to close this ticket, and the patch will be going into 3.4.0-rc3. If you observe other regressions or if your workload does not perform better with this fix, please do open another ticket.

Thanks,
Samantha

Comment by Githook User [ 08/Nov/16 ]

Author:

samantharitter <samantha.ritter@10gen.com>

Message: SERVER-26674 Remove uses of locks in TransportLayerLegacy
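One common way to remove a lock from a stats path like this is to keep the counters as atomics so readers never take the connection mutex. A hedged sketch of that idea under assumed names (this is illustrative, not the actual code from the commit):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of the lock-removal idea named in the commit message:
// per-layer counters kept as atomics, so a stats reader never has to take
// the mutex protecting the connection list.
class ConnectionCounters {
public:
    void onSessionStart() {
        _created.fetch_add(1, std::memory_order_relaxed);
        _current.fetch_add(1, std::memory_order_relaxed);
    }
    void onSessionEnd() {
        _current.fetch_sub(1, std::memory_order_relaxed);
    }

    // Lock-free snapshot; may be slightly stale, which is fine for metrics.
    std::uint64_t created() const { return _created.load(std::memory_order_relaxed); }
    std::uint64_t current() const { return _current.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> _created{0};
    std::atomic<std::uint64_t> _current{0};
};
```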

Comment by Githook User [ 08/Nov/16 ]

Author:

samantharitter <samantha.ritter@10gen.com>

Message: SERVER-26674 Add checked_pointer_cast
Branch: master
https://github.com/mongodb/mongo/commit/158db044bb21a2ff39c2984a35850a28c9572c8d
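A "checked pointer cast" typically means a downcast that is verified in debug builds but costs only a static cast in release builds. This is a hypothetical reimplementation of that idea, not the code from the commit:

```cpp
#include <cassert>
#include <memory>

// Illustrative checked_pointer_cast: verify the downcast with
// dynamic_pointer_cast when assertions are enabled, but pay only for a
// static cast when NDEBUG is defined.
template <typename To, typename From>
std::shared_ptr<To> checked_pointer_cast(const std::shared_ptr<From>& from) {
#if defined(NDEBUG)
    return std::static_pointer_cast<To>(from);
#else
    auto casted = std::dynamic_pointer_cast<To>(from);
    assert(!from || casted);  // fail fast on a bad downcast in debug builds
    return casted;
#endif
}
```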

Comment by Githook User [ 06/Nov/16 ]

Author:

samantharitter <samantha.ritter@10gen.com>

Message: SERVER-26674 transport::Session objects should be shared_ptr managed
Branch: master
https://github.com/mongodb/mongo/commit/0ac04999faae1d2fc0e10972aaf21082a2e48c8f
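The ownership change the commit title describes can be sketched as follows: when sessions are shared_ptr-managed, callers hold a handle directly instead of looking the session up in a mutex-protected table on each access. All names here are hypothetical, assumed for illustration:

```cpp
#include <memory>
#include <string>
#include <utility>

// Sketch of shared_ptr-managed sessions: any component that needs the
// session retains its own reference, avoiding a registry lookup (and the
// lock that guards it) on every access.
class Session : public std::enable_shared_from_this<Session> {
public:
    using SessionHandle = std::shared_ptr<Session>;

    static SessionHandle create(std::string remote) {
        // make_shared cannot reach the private constructor, so use new here.
        return SessionHandle(new Session(std::move(remote)));
    }

    // Hand out additional owning references without any table lookup.
    SessionHandle handle() { return shared_from_this(); }

    const std::string& remote() const { return _remote; }

private:
    explicit Session(std::string remote) : _remote(std::move(remote)) {}
    std::string _remote;
};
```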

Comment by Githook User [ 06/Nov/16 ]

Author:

samantharitter <samantha.ritter@10gen.com>

Message: SERVER-26674 Add missing header includes
Branch: master
https://github.com/mongodb/mongo/commit/2d1dd9e07a40f314853e29bffb56b45bf21df940

Comment by Daniel Pasette (Inactive) [ 25/Oct/16 ]

We're not seeing this on other platforms, but it looks likely to be a result of the transport layer changes made during that time frame. Thanks for the report.

Comment by Jim Van Fleet [ 24/Oct/16 ]

From perf_tree.prof (perf record -g ; perf report -vg) there is a pretty clear trail from _raw_spin_lock back to mongo::transport::TransportLayerLegacy::_runTicket.

I've seen the sharding and cursor manager locks be hot in the past; I think this new one is overshadowing them. Note in the data below that the cursorManager and sharding code do call pthread_mutex_lock, but their percentages are pretty low:

 
    23.52%    23.52%  mongod           [kernel.kallsyms]             [k]  _raw_spin_lock
                        |--53.59%-- futex_wait_setup
                        |--45.64%-- futex_wake
                         --0.77%-- [...]
 
    13.95%     1.21%  mongod           [kernel.kallsyms]             [k]  futex_wait_setup
                        |--61.64%-- _raw_spin_lock
                        |--28.90%-- __lll_lock_wait
                        |--8.45%-- futex_wait_setup
                        |--0.74%-- futex_wait
                         --0.28%-- [...]
 
     5.73%     0.30%  mongod           libpthread-2.17.so            [.]  __lll_lock_wait
                        |--94.81%-- pthread_mutex_lock
                        |--3.61%-- mongo::transport::TransportLayerLegacy::_runTicket
                         --1.59%-- [...]
 
 
    16.91%     0.61%  mongod           libpthread-2.17.so            [.]  pthread_mutex_lock
                        |--90.13%-- mongo::transport::TransportLayerLegacy::_runTicket
                        |--2.08%-- mongo::ShardingState::getNS
                        |--1.36%-- mongo::transport::TransportLayerLegacy::sessionStats
                        |--0.83%-- mongo::CursorManager::deregisterExecutor
                        |--0.78%-- mongo::CursorManager::registerExecutor
                        |--0.66%-- mongo::Top::incrementGlobalLatencyStats
                        |--0.64%-- mongo::Top::record
                         --3.52%-- [...]

Comment by Chung-yen Chang [ 19/Oct/16 ]

lilianr@us.ibm.com, thanks for the information. We are looking into this and will get back to you with our findings.

Comment by Lilian Romero [ 19/Oct/16 ]

Here are the options used:
recordcount=1000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian

Comment by Chung-yen Chang [ 19/Oct/16 ]

lilianr@us.ibm.com, thanks for sharing the data with us. Would you please tell me the read/update mix of the YCSB workload that you used? YCSB has multiple sub-workloads with different read/update mixes, and I am trying to see if this could be related to a particular op type. Thanks.

Comment by Lilian Romero [ 18/Oct/16 ]

The workload used is YCSB 0.6.0. Number of threads used: 12.
This was run on a Power server with 20 cores, standalone.

Comment by Ramon Fernandez Marina [ 18/Oct/16 ]

Thanks for the report Lilian, we'll take a look.

Can you please provide

  • additional information on the exact YCSB workload you used and how you are running it
  • the specs of the hardware you're using
  • information about the deployment (stand-alone, replica set, etc.)

?

Thanks,
Ramón.

Comment by Lilian Romero [ 18/Oct/16 ]

stack trace:

#0  0x00003fffa6e0c5b8 in pthread_mutex_unlock () from /lib64/power8/libpthread.so.0
#1  0x000000002305caf8 in mongo::transport::TransportLayerLegacy::_runTicket(mongo::transport::Ticket) ()
#2  0x000000002305cef0 in mongo::transport::TransportLayerLegacy::wait(mongo::transport::Ticket&&) ()
#3  0x0000000023058ea0 in _ZNO5mongo9transport6Ticket4waitEv ()
#4  0x000000002203df00 in mongo::ServiceEntryPointMongod::_sessionLoop(mongo::transport::Session*) ()
#5  0x000000002203ea30 in std::_Function_handler<void (mongo::transport::Session*), mongo::ServiceEntryPointMongod::startSession(mongo::transport::Session&&)::{lambda(mongo::transport::Session*)#1}>::_M_invoke(std::_Any_data const&, mongo::transport::Session*&&) ()
#6  0x0000000023057a30 in mongo::(anonymous namespace)::runFunc(void*) ()
#7  0x00003fffa6e08728 in start_thread () from /lib64/power8/libpthread.so.0
#8  0x00003fffa6d39ec0 in clone () from /lib64/power8/libc.so.6

Generated at Thu Feb 08 04:12:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.