[SERVER-20174] WT snapshot threads significantly impact performance Created: 28/Aug/15  Updated: 06/Dec/22  Resolved: 28/Sep/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Martin Bligh Assignee: Backlog - Storage Execution Team
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File benchRun     HTML File benchSet     HTML File md_rs2     HTML File md_stop     HTML File pymongo_init_rs    
Issue Links:
Duplicate
is duplicated by SERVER-20201 Performance drop over time when runni... Closed
Related
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:

 Description   

Under heavy insert load on a 2-node replica set, WT eviction appears to hang on the secondary.
Situation was both nodes running on a 24-cpu box, 32GB RAM, pushing 16 threads with vectored writes of 100 tiny documents at a time

Per Michael, this seems to be related to a snapshot pinning down the memory

-  24.71%  mongod  mongod               [.] __rec_txn_read                                        
   - __rec_txn_read                                                                                
      - 99.65% __rec_row_leaf_insert                                                              
           __wt_reconcile                                                                          
           __wt_evict                                                                              
           __wt_evict_page                                                                        
           __wt_page_in_func                                                                      
           __wt_row_search                                                                        
           __wt_btcur_insert                                                                      
           __curfile_insert                                                                        
           mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::Record
           mongo::WiredTigerIndex::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::R
           mongo::IndexAccessMethod::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo:
           mongo::IndexCatalog::_indexFilteredRecords(mongo::OperationContext*, mongo::IndexCatalog
           mongo::IndexCatalog::_indexRecords(mongo::OperationContext*, mongo::IndexCatalogEntry*,
           mongo::IndexCatalog::indexRecords(mongo::OperationContext*, std::vector<mongo::BsonRecor
           mongo::Collection::_insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterat
           mongo::Collection::insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterato
           mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BS
           std::_Function_handler<mongo::Status (mongo::OperationContext*, mongo::Database*, mongo:
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool)
           mongo::repl::multiSyncApply(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >
           mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*)                            
           mongo::ThreadPool::_consumeTasks()                                                      
           mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&)            
           0x7f957762ea40                                                                          
-  22.88%  mongod  mongod               [.] __wt_row_modify                                        
   - __wt_row_modify                                                                              
      - 99.42% __split_multi_inmem                                                                
           __wt_split_rewrite                                                                      
           __wt_evict                                                                              
           __wt_evict_page                                                                        
           __wt_page_in_func                                                                      
           __wt_row_search                                                                        
           __wt_btcur_insert                                                                      
           __curfile_insert                                                                        
           mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::Record
           mongo::WiredTigerIndex::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::R
           mongo::IndexAccessMethod::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo:
           mongo::IndexCatalog::_indexFilteredRecords(mongo::OperationContext*, mongo::IndexCatalog
           mongo::IndexCatalog::_indexRecords(mongo::OperationContext*, mongo::IndexCatalogEntry*,
           mongo::IndexCatalog::indexRecords(mongo::OperationContext*, std::vector<mongo::BsonRecor
           mongo::Collection::_insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterat
           mongo::Collection::insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterato
           mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BS
           std::_Function_handler<mongo::Status (mongo::OperationContext*, mongo::Database*, mongo:
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool)
           mongo::repl::multiSyncApply(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >
           mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*)                            
           mongo::ThreadPool::_consumeTasks()                                                      
           mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&)            
           0x7f957762ea40                                                                          
      + 0.58% __wt_split_rewrite                                                                  
-  21.43%  mongod  mongod               [.] __wt_row_search                                        
   - __wt_row_search                                                                              
      - 99.71% __split_multi_inmem                                                                
           __wt_split_rewrite                                                                      
           __wt_evict                                                                              
           __wt_evict_page                                                                        
           __wt_page_in_func                                                                      
           __wt_row_search                                                                        
           __wt_btcur_insert                                                                      
           __curfile_insert                                                                        
           mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::Record
           mongo::WiredTigerIndex::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::R
           mongo::IndexAccessMethod::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo:
           mongo::IndexCatalog::_indexFilteredRecords(mongo::OperationContext*, mongo::IndexCatalog
           mongo::IndexCatalog::_indexRecords(mongo::OperationContext*, mongo::IndexCatalogEntry*,
           mongo::IndexCatalog::indexRecords(mongo::OperationContext*, std::vector<mongo::BsonRecor
           mongo::Collection::_insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterat
           mongo::Collection::insertDocuments(mongo::OperationContext*, __gnu_cxx::__normal_iterato
           mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BS
           std::_Function_handler<mongo::Status (mongo::OperationContext*, mongo::Database*, mongo:
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool,
           mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool)
           mongo::repl::multiSyncApply(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >
           mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*)                            
           mongo::ThreadPool::_consumeTasks()                                                      
           mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&)    



 Comments   
Comment by Eric Milkie [ 28/Sep/17 ]

Snapshot threads are being removed by SERVER-30638; follow that ticket for updates.

Comment by Martin Bligh [ 26/Oct/15 ]

I'll attach a compounded script "benchSet" - if you run it for longer than about 10s without disabling snapshots, it hangs.

Comment by Michael Cahill (Inactive) [ 15/Oct/15 ]

keith.bostic, it would be good to understand what is causing those slow operations on the primary as well. I'm hoping that this is no longer reproducing because snapshots are no longer keeping transaction IDs pinned...

Comment by Keith Bostic (Inactive) [ 14/Oct/15 ]

martin.bligh, I got back to this one today; I did my testing with an AWS c3.4xlarge system, so 16 vCPU.

I continue to see dropouts on the primary (about 200 out of a total of 6000 reporting periods), I'm only seeing two dropouts on the secondary in that same period.

As I understand it, the concern with this ticket was the secondary dropouts, which I'm no longer seeing. Can you verify if you're still seeing dropouts on the secondary?

michael.cahill, should we be looking at dropouts on the primary, too, or are they currently expected in this test?

Comment by Keith Bostic (Inactive) [ 17/Sep/15 ]

martin.bligh, just to let you know, using the wiredtiger develop branch, I ran this job to completion – lots of stalls and periods of low inserts, but it never stopped.

I'm out of pocket for the next couple of days, but I'll dig in deeper and understand the load in more detail soon.

Comment by Michael Cahill (Inactive) [ 01/Sep/15 ]

BTW, keith.bostic, this is a case where the lookaside changes will be exercised, so it would be good to see what happens with MongoDB master + WT develop, once develop is stable.

Comment by Martin Bligh [ 31/Aug/15 ]

Yup, I think that workaround avoids the issue.

Comment by Michael Cahill (Inactive) [ 31/Aug/15 ]

keith.bostic, I have scripts from martin.bligh to run this workload – I'll attach them here. Can you please take a look?

martin.bligh, did you try dan@10gen.com's suggestion of running mongod --setParameter=enableReplSnapshotThread=false? It would be good to know whether that worked around the issue.

In one window:

md_stop && md_rs2 && benchRun 16 false true 3000

And in another:

mongostat --discover

Generated at Thu Feb 08 03:53:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.