[SERVER-13382] shared_ptr uses a lot of CPU time during external sort (foreground index build) Created: 27/Mar/14  Updated: 14/Aug/19  Resolved: 14/Aug/19

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Performance
Affects Version/s: 2.6.0-rc2
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Mark Callaghan Assignee: Louis Williams
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Sprint: Execution Team 2019-08-26
Participants:

 Description   

The database is larger than RAM and I am creating an index in the foreground. The shared_ptr destructor was frequently active when I used call stack sampling. The shared_count destructor used by shared_ptr was the #3 CPU consumer per Linux perf.

I don't have much experience with modern C++. My experience with shared_ptr is mostly limited to finding that it uses too much CPU. Is the destructor getting called frequently because the copy constructor was invoked frequently?

    13.46%   mongod  mongod               [.] mongo::BSONObjIterator::next()                                                                                                                                                  
    11.56%   mongod  mongod               [.] mongo::BSONElement::size() const                                                                                                                                                            
     9.65%   mongod  mongod               [.] boost::detail::shared_count::~shared_count()                                                                                                                                              
     9.10%   mongod  mongod               [.] mongo::compareElementValues(mongo::BSONElement const&, mongo::BSONElement const&)                                                                                      
     6.40%   mongod  mongod               [.] mongo::BSONObj::woCompare(mongo::BSONObj const&, mongo::Ordering const&, bool) const                                                                      
     5.50%   mongod  libc-2.13.so         [.] __strlen_sse42    

An example is:

boost::detail::shared_count::~shared_count,~shared_ptr,~ComparatorWithInterruptCheck,~STLComparator,std::pop_heap<__gnu_cxx::__normal_iterator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj,,mongo::sorter::MergeIterator<mongo::BSONObj,,mongo::BtreeBulk::commit<mongo::BtreeData_V1>,mongo::BtreeBasedAccessMethod::commitBulk,mongo::buildAnIndex,mongo::IndexCatalog::createIndex,mongo::CmdCreateIndex::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::newRunQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone

And an example full stack trace is:

#0  boost::detail::shared_count::~shared_count (this=0x7f634a32e150, __in_chrg=<optimized out>) at src/third_party/boost/boost/smart_ptr/detail/shared_count.hpp:305
#1  0x00000000009d5c49 in ~shared_ptr (this=0x7f634a32e148, __in_chrg=<optimized out>) at src/third_party/boost/boost/smart_ptr/shared_ptr.hpp:164
#2  ~ComparatorWithInterruptCheck (this=0x7f634a32e140, __in_chrg=<optimized out>) at src/mongo/db/extsort.cpp:41
#3  ~STLComparator (this=0x7f634a32e140, __in_chrg=<optimized out>) at src/mongo/db/sorter/sorter.cpp:359
#4  std::pop_heap<__gnu_cxx::__normal_iterator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>*, std::vector<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>, std::allocator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream> > > >, mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::STLComparator>(__gnu_cxx::__normal_iterator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>*, std::vector<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>, std::allocator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream> > > >, __gnu_cxx::__normal_iterator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>*, std::vector<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream>, std::allocator<boost::shared_ptr<mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::Stream> > > >, mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::STLComparator) (__first=..., __last=..., __comp=...) at /mnt/gvfs/third-party/d28c90311ca14f9f0b2bb720f4e34b285513d4f4/gcc-4.6.2-glibc-2.13/libgcc/libgcc-4.6.2/b4e37bb/include/c++/4.6.2/bits/stl_heap.h:362#5  0x00000000009d6ade in mongo::sorter::MergeIterator<mongo::BSONObj, mongo::DiskLoc, mongo::{anonymous}::ComparatorWithInterruptCheck>::next(void) (this=0x36d8b00) at src/mongo/db/sorter/sorter.cpp:325#6  0x0000000000a334c3 in mongo::BtreeBulk::commit<mongo::BtreeData_V1> (this=0x1b23540, dupsToDrop=0x7f634a32e660, op=0x36ed000, mayInterrupt=true) at src/mongo/db/index/btree_based_access_method.cpp:406#7  0x0000000000a31cef in mongo::BtreeBasedAccessMethod::commitBulk (this=0x370e4e0, bulkRaw=0x1b23540, mayInterrupt=true, dupsToDrop=0x7f634a32e660) at src/mongo/db/index/btree_based_access_method.cpp:555#8  0x00000000008a31f4 in mongo::buildAnIndex (collection=0x3704680, btreeState=0x36f1700, mayInterrupt=true) at src/mongo/db/catalog/index_create.cpp:266#9  0x000000000089efef in mongo::IndexCatalog::createIndex (this=0x3704700, spec=..., mayInterrupt=true, shutdownBehavior=mongo::IndexCatalog::SHUTDOWN_CLEANUP) at src/mongo/db/catalog/index_catalog.cpp:334#10 0x00000000008cfa0f in mongo::CmdCreateIndex::run (this=<optimized out>, dbname=..., cmdObj=..., options=<optimized out>, errmsg=..., result=..., fromRepl=false) at src/mongo/db/commands/create_indexes.cpp:176#11 0x000000000095d7bf in mongo::_execCommand (c=<optimized out>, dbname=..., cmdObj=..., queryOptions=<optimized out>, errmsg=..., result=..., fromRepl=false) at src/mongo/db/dbcommands.cpp:1357#12 0x000000000095e2af in mongo::Command::execCommand (c=0x193fe60 <mongo::cmdCreateIndex>, client=..., queryOptions=0, cmdns=0x3189414 "iibench.$cmd", cmdObj=..., result=..., fromRepl=false) at src/mongo/db/dbcommands.cpp:1544#13 0x000000000095eded in mongo::_runCommands (ns=0x3189414 "iibench.$cmd", _cmdobj=..., b=..., anObjBuilder=..., fromRepl=false, queryOptions=0) at src/mongo/db/dbcommands.cpp:1648#14 0x0000000000b7e0cc in runCommands (queryOptions=<optimized out>, anObjBuilder=..., b=..., curop=..., jsobj=..., ns=0x3189414 "iibench.$cmd", fromRepl=<optimized out>) at src/mongo/db/query/new_find.cpp:107#15 mongo::newRunQuery (m=..., q=..., curop=..., result=...) at src/mongo/db/query/new_find.cpp:384#16 0x0000000000a57860 in receivedQuery (m=..., dbresponse=..., c=...) at src/mongo/db/instance.cpp:269#17 mongo::assembleResponse (m=..., dbresponse=..., remote=...) at src/mongo/db/instance.cpp:433#18 0x00000000007a7d02 in mongo::MyMessageHandler::process (this=<optimized out>, m=..., port=0x1b22f50, le=0x1b23c70) at src/mongo/db/db.cpp:204#19 0x0000000000e40dd7 in mongo::PortMessageServer::handleIncomingMsg (arg=0x36c4d20) at src/mongo/util/net/message_server_port.cpp:209#20 0x00007f6349f03f81 in start_thread (arg=0x7f634a331700) at pthread_create.c:301#21 0x00007f634928185d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115



 Comments   
Comment by Louis Williams [ 14/Aug/19 ]

This issue has gone away. As seen in a local perf run:

+   19.75%  IndexBu.ongod-0  mongod              [.] mongo::BSONObj::woCompare                                                                                                                                                                                                                                               
+    7.08%  IndexBu.ongod-0  mongod              [.] mongo::BSONElement::BSONElement                                                                                                                                                                                                                                         
+    6.12%  IndexBu.ongod-0  mongod              [.] mongo::BSONElement::computeSize                                                                                                                                                                                                                                         
+    5.70%  IndexBu.ongod-0  mongod              [.] mongo::BSONElement::compareElements                                                                                                                                                                                                                                     
+    5.59%  IndexBu.ongod-0  libc-2.29.so        [.] __strlen_avx2                                                                                                                                                                                                                                                           
+    3.63%  IndexBu.ongod-0  mongod              [.] mongo::BSONElement::woCompare                                                                                                                                                                                                                                           

Here's at least one reason why:

The external sorter today depends on copy-assignment of the Value type (i.e. BSONObj), which can be expensive if the copy assignment is expensive. BSONObj is backed by a ref-countable SharedBuffer, which wraps boost::intrusive_ptr instead of shared_ptr. This is less expensive than copying an entire buffer, and also faster than shared_ptr. Because intrusive_ptr uses atomic instructions, a standard copy assignment will perform one atomic increment to make a copy of the buffer, then an atomic decrement when the original object is destructed.

The new assignment operator for BSONObj quite cleverly avoids these atomic instructions by using a swap operation when assigning from a non-const BSONObj.

Comment by Andy Schwerin [ 02/Apr/14 ]

This is happening because the "next stream to merge" is selected by popping out of a heap of shared pointers to streams. This operation occurs with high frequency, and there's no need for the heap to store shared pointers to the streams. The ownership could instead be maintained elsewhere.

Generated at Thu Feb 08 03:31:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.