[SERVER-19189] Improve performance under high number of threads with WT Created: 29/Jun/15  Updated: 19/Sep/15  Resolved: 20/Jul/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.5, 3.1.6

Type: Task Priority: Major - P3
Reporter: Martin Bligh Assignee: Martin Bligh
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File scalability.png    
Issue Links:
Depends
depends on WT-1989 Improvements to log slot freeing to i... Closed
Duplicate
is duplicated by SERVER-19233 Performance regression for wiredTiger... Closed
Related
is related to SERVER-17194 Low Throughput for YCSB 50-50 workloa... Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Sprint: Quint Iteration 6
Participants:

 Comments   
Comment by Githook User [ 10/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Remove counters from WT session cache

(cherry picked from commit 47b6b1a8fd60b0365f72c0550269c0dc0322acfd)
Branch: v3.0
https://github.com/mongodb/mongo/commit/604d172a093c041a53ceedbefb346f300b7b24c5

Comment by Githook User [ 10/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Checking epoch outside of the lock is racy

can race with epoch++ and end up with us putting an old
epoch session back in the cache.

(cherry picked from commit 2256dfa40676c48788265d241f7fa18ac0d0b322)
Branch: v3.0
https://github.com/mongodb/mongo/commit/9c4fe15f5580c8768e593fff5de8de9dda8fb7d0

Comment by Githook User [ 10/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: dont check sessions is empty outside of the lock

Technically it's illegal to check this outside the lock, though it's heuristic
However, it's the unlikely case and we end up doing the work twice, so
it seems cleaner to remove it and just do it inside the lock. Checked
perf, no measurable difference

(cherry picked from commit 90f72ea0b819335e55a6ddd147d5c8d4afa3c988)
Branch: v3.0
https://github.com/mongodb/mongo/commit/60802d74b1b8df7e194ecc028c8d73e134de791b

Comment by Michael Cahill (Inactive) [ 10/Jul/15 ]

btw martin.bligh, the WT_SESSION::reset call is in MongoDB master, we can try having releaseSession call that on the cached session in the Client in your patch?

Comment by Michael Cahill (Inactive) [ 10/Jul/15 ]

Testing the 3.0 backport here: https://evergreen.mongodb.com/version/559f4aa23ff122536500011c_0

Comment by Githook User [ 10/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Rename CursorId -> TableId
Branch: master
https://github.com/mongodb/mongo/commit/4f358b55eed54d212aa9d7e52d241999d8ed7293

Comment by Michael Cahill (Inactive) [ 08/Jul/15 ]

When these changes are settled, they should be backported to 3.0. I'm happy to take that on.

Comment by Alexander Gorrod [ 08/Jul/15 ]

martin.bligh I suppose the consequence of removing the counters is that there is no bound on how many sessions are cached. It seems to me as though having a cap is a good idea, maybe we could eventually replace it with a utility thread that maintains the session cache (i.e: removes and closes sessions from the cache if they aren't being regularly used).

Also I've got most of a change around to switch from using a std::vector to a simple linked list for the session cache. I found that it reduces the amount of time the lock is held. Are you still seeing contention on the _cacheLock with the latest version of the code? If so is it worth me updating that patch and sharing it?

Comment by Githook User [ 07/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Remove counters from WT session cache
Branch: master
https://github.com/mongodb/mongo/commit/47b6b1a8fd60b0365f72c0550269c0dc0322acfd

Comment by Chad Kreimendahl [ 06/Jul/15 ]

I'm not sure on https://github.com/mongodb/mongo/commit/90f72ea0b819335e55a6ddd147d5c8d4afa3c988 why it's "illegal" to check outside of a lock. If it's thread safe (seeing if it's set), then that's actually a good practice to entirely avoid a lock if, in most scenarios, that's going to be set.

Comment by Githook User [ 06/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: WT session cache: use a single compareAndSwap rather than load / store
Branch: master
https://github.com/mongodb/mongo/commit/e39c4e2b6b94783b06f1fcfcaa56246fcd2f4ad1

Comment by Githook User [ 06/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Checking epoch outside of the lock is racy

can race with epoch++ and end up with us putting an old
epoch session back in the cache.
Branch: master
https://github.com/mongodb/mongo/commit/2256dfa40676c48788265d241f7fa18ac0d0b322

Comment by Githook User [ 06/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: dont check sessions is empty outside of the lock

Technically it's illegal to check this outside the lock, though it's heuristic
However, it's the unlikely case and we end up doing the work twice, so
it seems cleaner to remove it and just do it inside the lock. Checked
perf, no measurable difference
Branch: master
https://github.com/mongodb/mongo/commit/90f72ea0b819335e55a6ddd147d5c8d4afa3c988

Comment by Bruce Lucas (Inactive) [ 03/Jul/15 ]

Further potential improvement in WT journaling here. Performance with journaling enabled at high thread count improved 3x over base 3.0.4, peak performance improved 15%. Still however pending review and testing for functional correctness.

Comment by Githook User [ 02/Jul/15 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-19189: Use vector instead of list for WT session cache
Branch: master
https://github.com/mongodb/mongo/commit/215dbca62127119de85903df91fb6b106eb5f24a

Comment by Bruce Lucas (Inactive) [ 02/Jul/15 ]

On WT-1989 I describe an experimental patch to the WT journal that seems to help significantly with this, doubling performance at high thread counts.

Comment by Martin Bligh [ 02/Jul/15 ]

No oplogStones patch - this was virgin top of tree.
Agreed the caching is top of the list, this is just poking at scalability in general to look at fixing ... seems to pretty much confirm that the current issue is the top offender.

Comment by Michael Cahill (Inactive) [ 02/Jul/15 ]

martin.bligh, I'm still absorbing this – one question, are the oplog numbers quoted here with the oplogStones changes from SERVER-18875 applied?

Happy to drill into bottlenecks in logging, but it seems like we should get the session caching layer right first.

Comment by Martin Bligh [ 01/Jul/15 ]

We were discussing converting the spinlock to a stdx::mutex, but that's actually slower (a little surprising since it seems heavily contended, but if hold time < 2 context switches, probably makes sense).

Comment by Martin Bligh [ 01/Jul/15 ]

Tried the column store thing for amusement, but it actually seems slower on this:

1:54370
2:89508
4:147221
8:223299
16:281690
32:309800
64:306034
128:275986

Comment by Martin Bligh [ 01/Jul/15 ]

geert.bosch michael.cahill
Big impactors seem to be:

1. Stats in ~OldClientContext
2. Shutdown lock in WiredTigerSessionCache
3. cache_lock in WiredTigerSessionCache

I've tried killing 1 & 2, we just hit the next bottlenecks, but will keep applying hatchet.

Comment by Martin Bligh [ 01/Jul/15 ]

Overall perf numbers with 128 threads. With oplog => single node replica set

No oplog, no journal: 312943
No oplog, with journal: 177324
Oplog, no journal: 80718
Oplog, with journal: 79224

Note not much difference from journal vs not with oplog => oplog is the bigger bottleneck

Scaling, no oplog, no journal

1:58340
2:79700
4:142619
8:251049
16:316905
32:353179
64:343299
128:314519

Scaling, no oplog, journal

1:36069
2:45462
4:99941
8:175312
16:202750
32:151328
64:161382
128:167500

Comment by Martin Bligh [ 01/Jul/15 ]

w/o journal, confirming the sched_yield disappears.

- 100.00%  mongod  [kernel.kallsyms]  [k] __schedule                                                                                                               
   - __schedule                                                                                                                                                    
      - 98.22% schedule                                                                                                                                            
         - 97.74% futex_wait_queue_me                                                                                                                              
              futex_wait                                                                                                                                           
              do_futex                                                                                                                                             
              sys_futex                                                                                                                                            
            - system_call_fastpath                                                                                                                                 
               - 99.81% __lll_lock_wait                                                                                                                            
                  - 92.65% mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                   
                       mongo::OpObserver::onInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObj, bool)                                  
                       mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool)                                              
                       mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                     
                       mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                              
                       mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                           
                       mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                            
                       mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                          
                       start_thread                                                                                                                                
                  - 4.56% mongo::WiredTigerRecordStore::dealtWithCappedLoc(mongo::RecordId const&)                                                                 
                       mongo::WiredTigerRecoveryUnit::_commit()                                                                                                    
                       mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                     
                       mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                              
                       mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                           
                       mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                            
                       mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                          
                       start_thread                                                                                                                                
                  - 1.57% mongo::OldClientContext::~OldClientContext()                                                                                             
                       mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                              
                       mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                           
                       mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                            
                       mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                          
                       start_thread                                                                

And without oplog / rs:

 
- 100.00%  mongod  [kernel.kallsyms]  [k] __schedule                                                                                                               
   - __schedule                                                                                                                                                    
      - 98.80% sys_sched_yield                                                                                                                                     
           system_call_fastpath                                                                                                                                    
         - __sched_yield                                                                                                                                           
            - 96.98% __wt_log_write                                                                                                                                
                 __wt_txn_commit                                                                                                                                   
               - __session_commit_transaction                                                                                                                      
                  - 100.00% mongo::WiredTigerRecoveryUnit::_txnClose(bool)                                                                                         
                       mongo::WiredTigerRecoveryUnit::_commit()                                                                                                    
                       mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                     
                       mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                              
                     - mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                           
                        - 100.00% mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                 
                             mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                    
                             start_thread                                                                                                                          
              1.68% start_thread                                                                                                                                   
            - 0.88% mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*)                                                                       
                 mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit()                                                                                          
                 mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit()                                                                                          
                 mongo::OperationContextImpl::~OperationContextImpl()                                                                                              
                 mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                  
                 mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                
                 start_thread                                                                                                                                      
      - 1.14% schedule                                                                                                                                             
         + 68.07% futex_wait_queue_me                                                                                                                              
         + 29.57% schedule_hrtimeout_range_clock                                                                                                                   
         + 2.30% schedule_timeout               

Comment by Martin Bligh [ 01/Jul/15 ]

michael.cahill looks like we still have a lot of sched_yield that's causing far more context switches than the cache mutex
This with:

benchRun( { ops: [{op: "insert", ns: "test.repro", doc: {_id: {"#OID":1} } } ], parallel: 128, seconds: '$SECONDS', writeCmd: true } ).insert

Context switch profile (ie perf -e context-switches)

- 100.00%  mongod  [kernel.kallsyms]  [k] __schedule                                                                                                               
   - __schedule                                                                                                                                                    
      - 85.26% sys_sched_yield                                                                                                                                     
           system_call_fastpath                                                                                                                                    
         - __sched_yield                                                                                                                                           
            - 96.64% __wt_log_write                                                                                                                                
                 __wt_txn_commit                                                                                                                                   
                 __session_commit_transaction                                                                                                                      
                 mongo::WiredTigerRecoveryUnit::_txnClose(bool)                                                                                                    
                 mongo::WiredTigerRecoveryUnit::_commit()                                                                                                          
                 mongo::checkAndInsert(mongo::OperationContext*, mongo::OldClientContext&, char const*, mongo::BSONObj&)                                           
                 mongo::receivedInsert(mongo::OperationContext*, mongo::NamespaceString const&, mongo::Message&, mongo::CurOp&)                                    
                 mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)                                 
                 mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                  
                 mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                
                 start_thread                                                                                                                                      
            - 2.38% mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*)                                                                       
                 mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit()                                                                                          
                 mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit()                                                                                          
                 mongo::OperationContextImpl::~OperationContextImpl()                                                                                              
                 mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)                                                                  
                 mongo::PortMessageServer::handleIncomingMsg(void*)                                                                                                
                 start_thread                                                                                                                                      
              0.56% start_thread                                                                                                                                   
      - 14.31% schedule                                                                                                                                            
         - 87.67% futex_wait_queue_me                                                                                                                              
              futex_wait                                                                                                                                           
              do_futex                                                                                                                                             
              sys_futex                                                                                                                                            
            - system_call_fastpath                                                                                                                                 
               - 99.82% __lll_lock_wait                                                                                                                            
                  + 86.34% mongo::repl::_logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool)                   
                  + 6.35% mongo::WiredTigerRecordStore::dealtWithCappedLoc(mongo::RecordId const&)                                                                 
                  + 4.23% mongo::OldClientContext::~OldClientContext()                                                                                             
                  + 0.54% __wt_btcur_insert                                                                                                                        
         - 11.51% schedule_hrtimeout_range_clock                                                                                                                   
              schedule_hrtimeout_range                                                                                                                             
              poll_schedule_timeout                                                                                                                                
              do_select                                                                                                                                            
              core_sys_select                                                                                                                                      
              sys_select                                                                                                                                           
              system_call_fastpath                                                                                                                                 
            - __select                                                                                                                                             
               + 91.66% __wt_log_slot_wait                                                                                                                         
               + 7.29% __wt_page_in_func                                                                                                                           
                 1.05% mongo::_initAndListen(int)                                                                                                                  
         + 0.76% schedule_timeout                              

Comment by David Daly [ 30/Jun/15 ]

This ticket should help SERVER-17194.

Generated at Thu Feb 08 03:50:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.