[SERVER-16065] Long flush pauses in MMAPv1 Created: 11/Nov/14  Updated: 23/Jan/15  Resolved: 21/Jan/15

Status: Closed
Project: Core Server
Component/s: Concurrency, Storage
Affects Version/s: None
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Davide Italiano Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File flushlongpause.png    
Issue Links:
Duplicate
is duplicated by SERVER-16482 Decreased throughput due to changes i... Closed
Related
is related to SERVER-16067 Contention on lockmgr buckets in MMAPv1 Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Reducing contention on GroupCommit mutex (SERVER-15950) exposed a problem in flushing code.
The attached VTune lock profiler analysis shows that most of the actual lock wait contention (time-wise) is because of long flush times, as visible from the following callstacks:

Waiting Call Stack
1 of 7: 92.8% (1258.626s of 1355.788s)
 
mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,
1000ll> >+0x296 - condition_variable_fwd.hpp:72
mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
mongod!mongo::LockerImpl<(bool)1>::endWriteUnitOfWork+0xfc - lock_state.cpp:407
mongod!mongo::WriteUnitOfWork::commit+0x3b - operation_context.h:176
mongod!singleInsert+0x1bf - batch_executor.cpp:1110
mongod!insertOne+0x1a0 - batch_executor.cpp:1038
mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 -
message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
2 of 7: 7.1% (95.964s of 1355.788s)
 
mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,
1000ll> >+0x296 - condition_variable_fwd.hpp:72
mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
mongod!mongo::LockerImpl<(bool)1>::lockGlobal+0x1ad - lock_state.cpp:338
mongod!mongo::Lock::DBLock::lockDB+0x2c - d_concurrency.cpp:227
mongod!mongo::Lock::DBLock::DBLock+0xcd - d_concurrency.cpp:217
mongod!mongo::WriteBatchExecutor::ExecInsertsState::_lockAndCheckImpl+0x16d
- batch_executor.cpp:954
mongod!lockAndCheck+0x101 - batch_executor.cpp:1009
mongod!insertOne+0x8f - batch_executor.cpp:1037
mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 -
message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

This is not an actual lock contention, but just long flush times. kaloian.manassiev suggests this could be made better by flushing more often and some other small optimizations.



 Comments   
Comment by Githook User [ 21/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065, SERVER-5080 Write MMAP V1 journal in a separate thread

Offloads the writing of the journal to disk to a separate thread, so that
when the journal lock is dropped we can immediately loop back and pick up
the next batch of changes.

This allows the journal write to be parallelized with concurrent workload
once the intents list has been materialized.
Branch: master
https://github.com/mongodb/mongo/commit/f346c4ad0c2d892014c79d7adbbca61031f50194

Comment by Githook User [ 16/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Fix indentation in CommitJob

No functional changes.
Branch: master
https://github.com/mongodb/mongo/commit/1ff892ae9306bec0a6dbbee1eac7edd3791f1eec

Comment by Githook User [ 16/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Cleanup CommitJob and RecoveryJob so there is encapsulation

No functional changes.
Branch: master
https://github.com/mongodb/mongo/commit/ec08d1f900830ecf629dc34a2381ed1c599d02db

Comment by Githook User [ 12/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Move AlignedBuilder under MMAP V1
Branch: master
https://github.com/mongodb/mongo/commit/77065ddd70015350e20a876f6174ebd45148cce6

Comment by Githook User [ 12/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Move stuff out of CommitJob and into dur.cpp

No need to duplicate this functionality.
Branch: master
https://github.com/mongodb/mongo/commit/9566e29b1dd85ada54d5a5924d8f3f38c517a74f

Comment by Githook User [ 07/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 MMAP V1 remap always

We cannot remap selectively, because we now do not have a way to apply
back-pressure on the inserts coming in while journal I/O is happening
outside of the flush lock. That way if the journal I/O (WRITETOJOURNAL)
takes long time, we end up accumulating too much data in memory and
triggering rate limiting wassert.
Branch: master
https://github.com/mongodb/mongo/commit/2215b74a686290b97c90b27831f4d306b50f97da

Comment by Kaloian Manassiev [ 07/Jan/15 ]

Need to reopen and pull out of RC5, because we now do not have a way to apply back-pressure on the inserts coming in while journal I/O is happening outside of a lock. That way if the journal I/O (WRITETOJOURNAL) takes long time, we end up accumulating too much data in memory.

2015-01-07T21:18:37.901 0000 I -        [journal] warning assertion failure a <= 256*1024*1024 src/mongo/util/alignedbuilder.cpp 109
2015-01-07T21:18:37.907 0000 I CONTROL  [journal] 
----- BEGIN BACKTRACE -----
 dbtest(mongo::printStackTrace(std::ostream&) 0x29) [0x141a4e9]
 dbtest(mongo::logContext(char const*) 0xE1) [0x13bd3a1]
 dbtest(mongo::wasserted(char const*, char const*, unsigned int) 0x199) [0x13a31d9]
 dbtest(mongo::AlignedBuilder::growReallocate(unsigned int) 0x4D) [0x13a27ad]
 dbtest( 0xDCBC21) [0x11cbc21]
 dbtest(mongo::dur::PREPLOGBUFFER(mongo::dur::JSectHeader&, mongo::AlignedBuilder&) 0x204) [0x11cc094]
 dbtest( 0xDC3070) [0x11c3070]
 dbtest( 0x1066104) [0x1466104]
 libpthread.so.0( 0x8182) [0x7f0c61afe182]
 libc.so.6(clone 0x6D) [0x7f0c605c6efd]
-----  END BACKTRACE  -----
2015-01-07T21:18:38.019 0000 I -        [journal] rate limiting wassert

Comment by Kaloian Manassiev [ 06/Jan/15 ]

Marking this as needed CAP verification, because we need to verify that the flush pauses are going down now.

Comment by Githook User [ 06/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Drop flush lock early if not doing remap
Branch: master
https://github.com/mongodb/mongo/commit/143765cfe3fa006c7109e409abd4a14eb6b36c2e

Comment by Githook User [ 06/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Some formatting changes and comments
Branch: master
https://github.com/mongodb/mongo/commit/50c0c4491c856cf4217fd36057066d09281a96d8

Comment by Githook User [ 05/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Check for private/shared view match before remap
Branch: master
https://github.com/mongodb/mongo/commit/9bfa0cc0f41dc8d425263cbf841f62b3e9d484da

Comment by Githook User [ 05/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Move code around and add comments in dur.cpp

There are no functional changes, just adding comments and moving code
around to ease the change to early release the flush lock if we will
not be doing a remap.
Branch: master
https://github.com/mongodb/mongo/commit/f07504b53a284c7d4fa380c08b104c99a660a6bd

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Get rid of the groupCommit method
Branch: master
https://github.com/mongodb/mongo/commit/3bc56205c727f7a85da6a3c0321abaa782318a38

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16065 Cleanup the MMAP V1 durability code

No functional changes, just organizing declarations and cleaning up
includes.
Branch: master
https://github.com/mongodb/mongo/commit/0d93899c9031cfb776bad7e1b0f6ab9703fb2aea

Generated at Thu Feb 08 03:39:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.