[SERVER-38447] create index blocked by checkpoint mutex Created: 07/Dec/18  Updated: 07/Apr/23  Resolved: 01/Nov/21

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: 4.0.3
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: chensi Assignee: Backlog - Triage Team
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-37484 Only alter table logging settings on ... Closed
Assigned Teams:
Server Triage
Operating System: ALL
Participants:

 Description   

when create index with background on one database, the database x lock held for a long time, with the help of gdb, i found it was blocked by the wiredtiger's checkpoint mutex, the gdb stack is:

Thread 145 (Thread 0x7fab66ff4700 (LWP 1426590)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fab8e9cc479 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fab8e9cc2a0 in _GI__pthread_mutex_lock (mutex=0x55620d9a7538) at ../nptl/pthread_mutex_lock.c:79
#3 0x0000556208d4180b in __session_alter ()
#4 0x0000556208cdb5c5 in mongo::WiredTigerUtil::setTableLogging(_wt_session*, std::_cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bo
ol) ()
#5 0x0000556208cdbb58 in mongo::WiredTigerUtil::setTableLogging(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
const&, bool) ()
#6 0x0000556208ca6ea4 in mongo::WiredTigerIndex::WiredTigerIndex(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> const&, mongo::IndexDescriptor const*, mongo::KVPrefix, bool) ()
#7 0x0000556208ca7125 in mongo::WiredTigerIndexStandard::WiredTigerIndexStandard(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::
allocator<char> > const&, mongo::IndexDescriptor const*, mongo::KVPrefix, bool) ()
#8 0x0000556208cae9a7 in mongo::WiredTigerKVEngine::getGroupedSortedDataInterface(mongo::OperationContext*, mongo::StringData, mongo::IndexDescriptor const*, mongo::KVPr
efix) ()
#9 0x0000556208e2ab4d in mongo::KVDatabaseCatalogEntry::getIndex(mongo::OperationContext*, mongo::CollectionCatalogEntry const*, mongo::IndexCatalogEntry*) ()
#10 0x000055620936b550 in mongo::IndexCatalogImpl::_setupInMemoryStructures(mongo::OperationContext*, std::unique_ptr<mongo::IndexDescriptor, std::default_delete<mongo::I
ndexDescriptor> >, bool) ()
#11 0x0000556209d21a9e in mongo::IndexCatalog::_setupInMemoryStructures(mongo::OperationContext*, std::unique_ptr<mongo::IndexDescriptor, std::default_delete<mongo::Index
Descriptor> >, bool) ()
#12 0x0000556209364029 in mongo::IndexCatalogImpl::IndexBuildBlock::init() ()
#13 0x0000556209374b2e in mongo::MultiIndexBlockImpl::init(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&) ()
#14 0x000055620925cb0f in auto mongo::writeConflictRetry<mongo::CmdCreateIndex::errmsgRun(mongo::OperationContext*, std::_cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&)::{lambda()#6}>(mongo::OperationContext*, mongo::StringData, mongo::CmdCreateIndex::errmsgRun(mongo::OperationContext*, std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&)::{lambda()#6}, mongo::CmdCreateIndex::errmsgRun(mongo::OperationContext*, std::cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, std::_cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&)::{lambda()#6}&&) ()



 Comments   
Comment by Kelsey Schubert [ 01/Nov/21 ]

Hi chensi_04@126.com,

I believe this issue has been resolved by SERVER-37484. Would you please consider upgrading and let us know if the issue is resolved?

Thank you,
Kelsey

Comment by deyukong [ 27/Dec/18 ]

@daniel.hatcher 

After a short-time conversation with Jack, we believe the problem has already been described with the stack in @chensi_04@126.com's second post. Mean no offence but no more logs are needed for this problem.

We can discuss about it with the latest mongo code base.

  • git rev-parse HEAD
    9ff8d7452ddf7370fc9047ebe9517fd7558914e6

1)in commands/create_indexes.cpp, line 311, target db's X lock is held.

2)And in commmands/create_indexes.cpp, line 383, indexer.init is called. If we track the call path, we can find  WiredTigerUtil::setTableLogging is called,

track deeper, we can find it gets into wiredtiger, session_api.c, line 670, the checkpoint mutex is taken.

So, in conclusion, the db's X lock may block for quite a long time if a checkpoint happens to be doing. It may block for several seconds or tens of seconds if

the mongod instance is suffering a heavy workload.

It may or may not be a bug, but it is caused by the checkpoint mutex.

And, what's important is that it's unacceptable for a background index building to block the whole db for quite a long time, because, it is a newly introduced

problem. for the compatibilty with old versions, it should be treated as a bug.

 

Comment by Danny Hatcher (Inactive) [ 10/Dec/18 ]

Hello Jack,

Unfortunately, it will be very difficult for us to validate this issue unless we have the logs. I have generated a Secure Upload Portal that you can use to upload files. Only MongoDB engineers can access the portal and files within are deleted after 180 days. Would this satisfy your security concerns?

Thank you,

Danny

Comment by chensi [ 08/Dec/18 ]

1. I send the create index command, but
“build index on xxx.yyy” log was printed after twenty or thirty seconds. Before it printed, all requests on this database were blocked, because I saw many slow logs immediately after the log, all the slow logs indicate they are waiting the database lock.
Also I use poor man’s profiler found the gdb stack as above.

2. On the primary
3. I mentioned in 1, but logs have some security, sorry about that
4. Later I will upload
发自我的 iPhone

Comment by Danny Hatcher (Inactive) [ 07/Dec/18 ]

Hello Jack,

I cannot validate what will happen if you manually change any code and we cannot troubleshoot any such changes.

Can you please provide more context on the issue?
1. How did you determine that an inappropriate lock was held for a large amount of time?
2. Did this happen on the Primary or on a Secondary?
3. Please provide the mongod log from the server.
4. Please provide the "diagnostic.data" folder located under the $dbpath.

Thank you,

Danny

Comment by chensi [ 07/Dec/18 ]

also,  we need to fix this problem as soon as possible, is it ok if i just remove the setTableLogging from the following code:
WiredTigerIndex::WiredTigerIndex(OperationContext* ctx,
const std::string& uri,
const IndexDescriptor* desc,
KVPrefix prefix,
bool isReadOnly)
: _ordering(Ordering::make(desc->keyPattern())),
_uri(uri),
_tableId(WiredTigerSession::genTableId()),
_collectionNamespace(desc->parentNS()),
_indexName(desc->indexName()),
_prefix(prefix),
_isIdIndex(desc->isIdIndex()) {
auto version =WiredTigerUtil::checkApplicationMetadataFormatVersion(
ctx, uri, kMinimumIndexVersion, kMaximumIndexVersion);
if (!version.isOK()) {
Status versionStatus = version.getStatus();
Status indexVersionStatus(
ErrorCodes::UnsupportedFormat,
str::stream() << versionStatus.reason() <<" Index:

{name: "<< desc->indexName() <<", ns: " << desc->parentNS() <<"}

- version too new for this mongod."
<<" See http://dochub.mongodb.org/core/3.4-index-downgrade for detailed"
<<" instructions on how to handle this error.");
fassertFailedWithStatusNoTrace(28579, indexVersionStatus);
}
_dataFormatVersion = version.getValue();

// Index data format 6 and 9 correspond to KeyString version V0 and data format 8 and 10
// correspond to KeyString version V1
_keyStringVersion = (_dataFormatVersion == kDataFormatV2KeyStringV1IndexVersionV2 ||
_dataFormatVersion == kDataFormatV4KeyStringV1UniqueIndexVersionV2)
? KeyString::Version::V1
: KeyString::Version::V0;

if (!isReadOnly) {
bool replicatedWrites =getGlobalReplSettings().usingReplSets() ||
repl::ReplSettings::shouldRecoverFromOplogAsStandalone();
uassertStatusOK(WiredTigerUtil::setTableLogging(
ctx,
uri,
WiredTigerUtil::useTableLogging(NamespaceString(desc->parentNS()), replicatedWrites)));
}
}
 
thx

Comment by chensi [ 07/Dec/18 ]

and, i see this: https://jira.mongodb.org/browse/SERVER-35625

setTableLogging will be removed, but please tell the roadmap, thx.

Generated at Thu Feb 08 04:48:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.