[SERVER-19233] Performance regression for wiredTiger from build 91c9eaeae7 Created: 30/Jun/15  Updated: 14/Apr/16  Resolved: 09/Jul/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chung-yen Chang Assignee: Geert Bosch
Resolution: Duplicate Votes: 0
Labels: 32qa, mpreg
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot.png     HTML File server-19233.html    
Issue Links:
Duplicate
duplicates SERVER-19189 Improve performance under high number... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run the following command in the shell; the "insert" numbers reported are the ones to look at. Change the value of parallel to vary the number of connections, and the value of seconds to adjust the run time.

benchRun( { ops: [ { op: "insert", ns: "test.repro19233", doc: { _id: { "#OID": 1 } }, writeCmd: true } ], parallel: 32, seconds: 20 } )

Sprint: Quint Iteration 6
Participants:

 Description   

A performance regression was observed for some of the mongo-perf insert and update micro-benchmarks starting with git hash 91c9eaeae7.

Insert.justID throughput dropped from 64,463 to 42,290. Need to investigate and fix.



 Comments   
Comment by Martin Bligh [ 06/Jul/15 ]

Noted David's results, which seem to show that it's all fine until we drop a table and roll the epoch, at which point we seem to stop using the cache and create new cursors (and presumably sessions) all the time.

Seems to be some problem with the counters, which I was planning on killing anyway. David confirmed this fixes it:
https://mongodbcr.appspot.com/670001/
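
To make the suspected failure mode concrete, here is a minimal hypothetical sketch (not the actual MongoDB code; all names and the structure are illustrative) of an epoch-tagged session cache where a missing counter/epoch refresh after a table drop makes every cached session look stale, so each operation ends up opening fresh cursors and sessions:

#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

struct Session {
    uint64_t epoch;  // epoch current when this session was last released
    // ... would hold the WT_SESSION and its cached cursors ...
};

class SessionCache {
    std::atomic<uint64_t> _epoch{0};
    std::mutex _mutex;
    std::vector<std::unique_ptr<Session>> _cache;

public:
    void onTableDrop() { _epoch.fetch_add(1); }  // "roll the epoch"

    std::unique_ptr<Session> getSession() {
        std::lock_guard<std::mutex> lk(_mutex);
        while (!_cache.empty()) {
            std::unique_ptr<Session> s = std::move(_cache.back());
            _cache.pop_back();
            if (s->epoch == _epoch.load())
                return s;  // cache hit: reuse the session and its cursors
            // stale epoch: discard and keep scanning
        }
        return std::unique_ptr<Session>(new Session{_epoch.load()});
    }

    void releaseSession(std::unique_ptr<Session> s) {
        // The bug shape: if this refresh is missing (e.g. a counter that is
        // never updated after the epoch rolls), every released session fails
        // the check in getSession() forever, the cache stays effectively
        // empty, and each operation opens new cursors.
        s->epoch = _epoch.load();
        std::lock_guard<std::mutex> lk(_mutex);
        _cache.push_back(std::move(s));
    }
};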

Comment by David Daly [ 06/Jul/15 ]

Attached full timeseries data.

Comment by David Daly [ 06/Jul/15 ]

I did a few experiments this morning, collecting server status information, and doing some profiling. The load actually goes up slightly in the bad configuration.

Here's some interesting parts from the server status information, as run through the timeseries tool:

Time period A-B is the good test period, and C-E is the bad test period. I dropped the collection in the period B-C. Notice also period D-E; I was not expecting to see that when I ran the test.
In period C-D there appears to be much more cursor and session activity, although A-B and C-E are running exactly the same test. Is there a reason we should be seeing so many cursor create calls in C-D?

While profiling the data, I see the following in the bad case, but not in the good case:

    9.83%  mongod-performa  mongod-performance_linux_wt_repl_91c9eaeae7dbe3ff5e5e5b369871562beda908a8_15_06_29_01_52_50  [.] __config_next                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
            |
            --- __config_next
               |          
               |--9.26%-- __config_getraw.isra.0
               |          |          
               |          |--4.88%-- __wt_config_getones
               |          |          |          
               |          |          |--3.23%-- __wt_schema_open_table
               |          |          |          __wt_schema_get_table
               |          |          |          __wt_curtable_open
               |          |          |          __wt_open_cursor
               |          |          |          __session_open_cursor
               |          |          |          mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |          |          |          
               |          |           --1.27%-- __wt_schema_open_colgroups
               |          |                     __wt_schema_open_table
               |          |                     __wt_schema_get_table
               |          |                     __wt_curtable_open
               |          |                     __wt_open_cursor
               |          |                     __session_open_cursor
               |          |                     mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |          |          
               |           --4.14%-- __wt_config_get
               |                     __wt_config_gets
               |                     |          
               |                     |--2.50%-- __wt_cursor_init
               |                     |          __wt_curfile_create
               |                     |          __wt_curfile_open
               |                     |          __wt_open_cursor
               |                     |          __wt_curtable_open
               |                     |          __wt_open_cursor
               |                     |          __session_open_cursor
               |                     |          mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |                     |          
               |                     |--0.56%-- __wt_curfile_open
               |                     |          __wt_open_cursor
               |                     |          __wt_curtable_open
               |                     |          __wt_open_cursor
               |                     |          __session_open_cursor
               |                     |          mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |                     |          
               |                     |--0.53%-- __wt_session_get_btree_ckpt
               |                     |          __wt_curfile_open
               |                     |          __wt_open_cursor
               |                     |          __wt_curtable_open
               |                     |          __wt_open_cursor
               |                     |          __session_open_cursor
               |                     |          mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |                     |          
               |                      --0.50%-- __wt_curfile_create
               |                                __wt_curfile_open
               |                                __wt_open_cursor
               |                                __wt_curtable_open
               |                                __wt_open_cursor
               |                                __session_open_cursor
               |                                mongo::WiredTigerSession::getCursor(std::string const&, unsigned long, bool)
               |          
                --0.57%-- __wt_config_next
                          |          
                           --0.56%-- config_check

I found that code in the WiredTiger code base, but don't know anything more beyond that.
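
For context on why those frames matter: they show WiredTiger re-reading and re-parsing the table's configuration metadata (__config_next, __wt_schema_open_table) inside every __session_open_cursor call. A minimal sketch using WiredTiger's public C API (open_cursor, reset, and close are real API calls; the table URI, function name, and elided key/value setup are illustrative) shows why a working cursor cache avoids this cost:

#include <wiredtiger.h>

// Illustrative only: error handling and key/value setup elided.
void insertMany(WT_SESSION* session, int n) {
    // Slow pattern (what period C-D looks like): a fresh open_cursor per
    // operation, so WiredTiger re-parses the table config every time.
    for (int i = 0; i < n; i++) {
        WT_CURSOR* cursor;
        session->open_cursor(session, "table:repro19233", nullptr, nullptr, &cursor);
        // ... cursor->set_key(...), cursor->set_value(...), cursor->insert(cursor) ...
        cursor->close(cursor);
    }

    // Fast pattern (what the session/cursor cache should give us): open
    // once, reset between operations, reuse the already-parsed handle.
    WT_CURSOR* cursor;
    session->open_cursor(session, "table:repro19233", nullptr, nullptr, &cursor);
    for (int i = 0; i < n; i++) {
        // ... cursor->set_key(...), cursor->set_value(...), cursor->insert(cursor) ...
        cursor->reset(cursor);  // cheap: keeps the schema/btree handle, releases position
    }
    cursor->close(cursor);
}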

I did a couple more experiments also:

  • Removing the syncDelay=0 option
  • Removing the syncDelay=0 and noJournal option

The issue reproduced in both cases.
Does this behavior make sense given the change that went in with commit 91c9e?

Comment by David Daly [ 03/Jul/15 ]

Not silly at all, daveh86.
The box is Ubuntu, 6 cores (hyperthreading turned off).
Linux ubuntu 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
32 GB of DRAM and 2 SSDs.

I downloaded the mongod and mongos from evergreen.

I was going to try some profiling, etc., on it to see what is different in the first case compared to the second.

Comment by David Hows [ 03/Jul/15 ]

Might be a silly question, but david.daly, can you give a bit more background on the OS, hardware, GCC version, GLIBC version, etc.?

If OS particulars are a factor, the more we know the better.

I've tested locally on my Arch box (GCC 5.1 and Glibc 2.21) w/ 4 cores + 4 HT, 16GB RAM and SSD storage and was unable to reproduce the delay.

Comment by David Daly [ 02/Jul/15 ]

chung-yen.chang
I believe (I'd like to double-check) that I did that exact test, and if I drop the collection on the empty database, I do not get the regression on the first try. I definitely get it on the second run.

Comment by Chung-yen Chang [ 02/Jul/15 ]

david.daly, if you do a drop collection right after opening up the shell, does the first benchRun invocation also report the regression? I wonder if we might be seeing some of the work done by the collection drop in the regressed case but not the other.

Comment by David Daly [ 02/Jul/15 ]

The following reproduces reliably on my linux box:

Git hash 91c9eaeae7dbe3ff5e5e5b369871562beda908a8 from evergreen:

Git hash 38f937036b5033bd50a9fd740e897415bd9f21db from evergreen:

I'm still trying to understand better what's going on. If I don't drop the collection in between tests, the performance doesn't drop in the bad case.

Comment by David Daly [ 02/Jul/15 ]

This reproduces on my linux box, but I failed to reproduce it on a RHEL spawnhost. Trying to figure out where the difference is here.

Comment by Alexander Gorrod [ 02/Jul/15 ]

It would be interesting to know if the regression reproduces on a machine launched with spawn hosts.

Comment by Martin Bligh [ 02/Jul/15 ]

Hmmm ... didn't actually try it yesterday, we were just looking at the scalability. I can't repro either ... 24 CPU box.

nojournal, 32 threads: 354866 before, 351293 after.
nojournal, 128 threads: 354488 before, 353049 after.
journal, 128 threads: 209860 before, 186434 after.

david.daly pointed out that he fixed a typo in the reproduce script at the top yesterday (the location of the writeCmd arg), so if you had it cached somewhere (like I did ...) you need to refresh. Still doesn't repro for me, though.

Comment by Alexander Gorrod [ 02/Jul/15 ]

Hi Martin,

No oplog or journal. I'd expect my machine to be slower than yours. The numbers I'm seeing are still higher than those on evergreen.

I was surprised that I didn't see the difference between the old and new versions.

Alex

Comment by Martin Bligh [ 02/Jul/15 ]

Alex, are you running with or without journal and oplog? Seems to make a huge difference in the numbers - I think both off will stress it the most.
Also, your baseline numbers seem slow - possibly I/O bound? I'm getting > 2x what you are with 2 threads.

Comment by Alexander Gorrod [ 02/Jul/15 ]

I attempted to reproduce this on a local c3.4xlarge machine today, and couldn't. I'm seeing:

Thread count   808996 (after degradation)   8910277205b (before degradation)
2              18073                        18423
8              54816                        55337
16             94115                        92309
32             99561                        101195

I see very similar performance numbers when running a local build against master.

Comment by David Daly [ 01/Jul/15 ]

Updated repro to fix a typo.

Comment by Martin Bligh [ 01/Jul/15 ]

Geert is removing the shutdownLock, but I did that by hand (just deleted it) and it doesn't fix the perf.
Presumably the cache spinlock is the issue.

Comment by Geert Bosch [ 01/Jul/15 ]

I'll wipe out the shared_mutex here and condense all sessions-out / shutdown state into a single atomic variable. That should take care of this. If the atomic variable is still a bottleneck we could partition it, but I doubt we'll need that.
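
A minimal sketch of the idea (not the actual patch; class and method names are hypothetical): pack the sessions-out count and a shutdown flag into one std::atomic<uint64_t>, so the hot path does a single lock-free read-modify-write instead of the shared_mutex lock_shared/unlock_shared pair visible in the profile above.

#include <atomic>
#include <cstdint>
#include <stdexcept>
#include <thread>

class SessionsOutAndShutdown {
    static constexpr uint64_t kShuttingDown = uint64_t(1) << 63;  // high bit = shutdown flag
    std::atomic<uint64_t> _state{0};  // low 63 bits = sessions currently out

public:
    void onSessionOut() {
        // One atomic RMW replaces shared_mutex::lock_shared() on the hot path.
        uint64_t prev = _state.fetch_add(1, std::memory_order_acquire);
        if (prev & kShuttingDown) {
            _state.fetch_sub(1, std::memory_order_release);
            throw std::runtime_error("storage engine shutting down");
        }
    }

    void onSessionReturned() {
        _state.fetch_sub(1, std::memory_order_release);
    }

    void shutdown() {
        // Set the flag, then wait for outstanding sessions to drain to zero.
        _state.fetch_or(kShuttingDown, std::memory_order_acq_rel);
        while ((_state.load(std::memory_order_acquire) & ~kShuttingDown) != 0)
            std::this_thread::yield();  // a real implementation would back off
    }
};

If that single atomic ever became a contention point, it could be partitioned per core or per thread group, as suggested above, at the cost of a more expensive drain on shutdown.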

Comment by Martin Bligh [ 01/Jul/15 ]

Oops, somehow grabbed the wrong binary - rerun numbers edited above. Sorry.

Comment by Geert Bosch [ 01/Jul/15 ]

recordGlobalTime was deleted on May 29th.

Comment by Martin Bligh [ 01/Jul/15 ]

-   2.19%           mongod  mongod                       [.] __wt_row_search                                                                                       
   - __wt_row_search   
      - 97.69% __wt_btcur_insert      
         - __curfile_insert                    
            - 83.05% mongo::WiredTigerRecordStore::insertRecord(mongo::OperationContext*, char const*, int, bool)                                                  
               + 73.42% mongo::WiredTigerRecordStore::insertRecord(mongo::OperationContext*, mongo::DocWriter const*, bool)                                        
               + 26.58% mongo::Collection::_insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool)                                                  
            + 16.95% mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::RecordId const&, bool)                                      
      - 2.25% __curfile_insert                                                                                                                                     
         - 80.98% mongo::WiredTigerRecordStore::insertRecord(mongo::OperationContext*, char const*, int, bool)                                                     
            + 52.01% mongo::Collection::_insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool)                                                     
            + 47.99% mongo::WiredTigerRecordStore::insertRecord(mongo::OperationContext*, mongo::DocWriter const*, bool)                                           
         + 19.02% mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::RecordId const&, bool)                                         
-   2.12%           mongod  libpthread-2.19.so           [.] pthread_mutex_lock                                                                                    
   - pthread_mutex_lock                                                                                                                                            
      - 17.09% boost::unique_lock<boost::mutex>::lock()                                                                                                            
         - 54.37% boost::shared_mutex::lock_shared()                                                                                                               
            - boost::shared_lock<boost::shared_mutex>::lock()                                                                                                      
               + 52.05% mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*)                                                                   
               + 47.95% mongo::WiredTigerSessionCache::getSession()                                                                                                
         - 27.97% mongo::repl::BackgroundSync::get()                                                                                                               
              mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                                
              mongo::OpObserver::onInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObj, bool)                                           
              mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool)                                                       
              mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                              
              mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                       
              mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                                    
              mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                     
              mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                   
              start_thread                                                                                                                                         
         - 12.90% boost::shared_mutex::unlock_shared()                                                                                                             
            + 62.69% mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*)                                                                      
            + 37.31% mongo::WiredTigerSessionCache::getSession()                                                                                                   
         - 4.76% mongo::repl::ReplicationCoordinatorImpl::setMyLastOptime(mongo::repl::OpTime const&)                                                              
              mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                                
              mongo::OpObserver::onInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObj, bool)                                           
              mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool)                                                       
              mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                              
              mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                       
              mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                                    
              mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                     
              mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                   
              start_thread                                                                                                                                         
 
      - 15.77% __wt_btcur_insert                                                                                                                                   
         - __curfile_insert                                                                                                                                        
            + 69.46% mongo::WiredTigerRecordStore::insertRecord(mongo::OperationContext*, char const*, int, bool)                                                  
            + 30.54% mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::RecordId const&, bool)                                      
      - 9.02% mongo::ensureShardVersionOKOrThrow(mongo::Client*, std::string const&)                                                                               
         + 51.47% mongo::MigrateFromStatus::logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                
         + 48.53% mongo::OldClientContext::_finishInit()                                                                                                           
      - 7.95% mongo::LockerImpl<false>::_unlockImpl(mongo::FastMapNoAlloc<mongo::ResourceId, mongo::LockRequest, 16>::IteratorImpl<mongo::FastMapNoAlloc<mongo::Res
         + mongo::LockerImpl<false>::unlock(mongo::ResourceId)                                                                                                     
      - 7.12% mongo::LockerImpl<false>::lockBegin(mongo::ResourceId, mongo::LockMode)                                                                              
         + 94.02% mongo::LockerImpl<false>::lock(mongo::ResourceId, mongo::LockMode, unsigned int, bool)                                                           
         + 5.98% mongo::LockerImpl<false>::lockGlobalBegin(mongo::LockMode)                                                                                        
      - 5.76% mongo::OldClientContext::~OldClientContext()                                                                                                         
           mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                          
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                      
           start_thread                                                                                                                                            
      - 5.54% mongo::getNextGlobalTimestamp()                                                                                                                      
           mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                                   
           mongo::OpObserver::onInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObj, bool)                                              
           mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool)                                                          
           mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                                 
           mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                          
           mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                                       
           mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                        
           mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                      
           start_thread
      + 5.37% mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)
      + 5.37% mongo::WiredTigerRecordStore::dealtWithCappedLoc(mongo::RecordId const&)                                                                             
      + 4.81% mongo::WiredTigerRecordStore::oplogDiskLocRegister(mongo::OperationContext*, mongo::Timestamp const&)                                                
      + 4.30% mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                       
      + 4.09% mongo::repl::BackgroundSync::getLastAppliedHash() const                                                                                              
      + 3.41% mongo::repl::ReplicationCoordinatorImpl::getConfig() const                                                                                           
      + 1.56% mongo::repl::ReplicationCoordinatorImpl::getMyId() const                                                                                             
      + 1.21% __wt_txn_commit                                                                                                                                      
      + 0.94% mongo::OldClientContext::_finishInit()                                                                                                               
      + 0.54% mongo::repl::BackgroundSync::setLastAppliedHash(long long)              

Comment by Martin Bligh [ 01/Jul/15 ]

Some quick numbers (just a desktop, single node), threads: inserts
(rerun with correct binary)

1:24437
2:40396
4:71928
8:100433
16:85881
32:84853
64:82032
128:79542

Comment by Martin Bligh [ 01/Jul/15 ]

geert.bosch also had a patch outstanding for this. We might want to see if that fixes the issue.

Comment by Michael Cahill (Inactive) [ 30/Jun/15 ]

daveh86, can you please reproduce this slowdown (just the 8-thread case should be sufficient), then profile with perf and/or pmp to check what is causing it? If it is mutex contention, let's talk about how to address that without going back to caching N times more sessions than are needed.
