[SERVER-16502] CollectionInfoCache being written without exclusive lock Created: 11/Dec/14  Updated: 15/Jan/15  Resolved: 02/Jan/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.8.0-rc2
Fix Version/s: 2.8.0-rc5

Type: Bug Priority: Major - P3
Reporter: Cailin Nelson Assignee: Scott Hernandez (Inactive)
Resolution: Done Votes: 0
Labels: wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-16711 Do not create empty database while ch... Closed
related to SERVER-16848 Shell's db.collection.totalSize() ret... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Happy to provide the full log, but doubt it is helpful in this case.

2014-12-11T09:59:48.422+0000 I QUERY    [conn2464] command local.$cmd command: collStats { collstats: "oplog.rs" } keyUpdates:0  reslen:3264 1600ms
2014-12-11T09:59:49.054+0000 I WRITE    [conn2355] update mmsdbhours.data.rrdHours query: { _id: "06a98548784bf2292cbbc9168c237c34-56836570e762631af71d0013f17db191-20141211-getmore-opcounters" } update: { $inc: { hy.9.n: 1, hy.9.t: 2315888 } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0  352ms
2014-12-11T10:00:04.418+0000 F -        [conn2353] Invalid access at address: 0
2014-12-11T10:00:04.454+0000 F -        [conn2353] Got signal: 11 (Segmentation fault).
 
 0xf0cfa9 0xf0c912 0xf0cc3e 0x7fe3b11cd340 0x7fe3b0b21e20 0x810584 0xd4701d 0x8f2825 0xb02739 0xb01254 0xb01ee0 0xa8b4d9 0xa8f39e 0x7e02d0 0xecb271 0x7fe3b11c5182 0x7fe3b02c5efd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B0CFA9"},{"b":"400000","o":"B0C912"},{"b":"400000","o":"B0CC3E"},{"b":"7FE3B11BD000","o":"10340"},{"b":"7FE3B0AAD000","o":"74E20"},{"b":"400000","o":"410584"},{"b":"400000","o":"94701D"},{"b":"400000","o":"4F2825"},{"b":"400000","o":"702739"},{"b":"400000","o":"701254"},{"b":"400000","o":"701EE0"},{"b":"400000","o":"68B4D9"},{"b":"400000","o":"68F39E"},{"b":"400000","o":"3E02D0"},{"b":"400000","o":"ACB271"},{"b":"7FE3B11BD000","o":"8182"},{"b":"7FE3B01CB000","o":"FAEFD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc2", "gitVersion" : "f77869e01a6ed235695a46fad441c0dff15202f6", "uname" : { "sysname" : "Linux", "release" : "3.13.0-36-generic", "version" : "#63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF682FE000", "elfType" : 3 }, { "b" : "7FE3B11BD000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7FE3B0FB5000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7FE3B0DB1000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7FE3B0AAD000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7FE3B07A7000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7FE3B0591000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7FE3B01CB000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7FE3B13DB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf0cfa9]
 mongod(+0xB0C912) [0xf0c912]
 mongod(+0xB0CC3E) [0xf0cc3e]
 libpthread.so.0(+0x10340) [0x7fe3b11cd340]
 libstdc++.so.6(_ZSt29_Rb_tree_insert_and_rebalancebPSt18_Rb_tree_node_baseS0_RS_+0xA0) [0x7fe3b0b21e20]
 mongod(_ZNSt8_Rb_treeISsSsSt9_IdentityISsESt4lessISsESaISsEE16_M_insert_uniqueISsEESt4pairISt17_Rb_tree_iteratorISsEbEOT_+0x1A4) [0x810584]
 mongod(_ZN5mongo15UpdateIndexData7addPathERKNS_10StringDataE+0xFD) [0xd4701d]
 mongod(_ZN5mongo19CollectionInfoCache16computeIndexKeysEPNS_16OperationContextE+0x115) [0x8f2825]
 mongod(_ZNK5mongo19UpdateLifecycleImpl12getIndexKeysEPNS_16OperationContextE+0x29) [0xb02739]
 mongod(_ZN5mongo14UpdateExecutor13prepareInLockEPNS_8DatabaseE+0x174) [0xb01254]
 mongod(_ZN5mongo14UpdateExecutor7executeEPNS_8DatabaseE+0x100) [0xb01ee0]
 mongod(_ZN5mongo14receivedUpdateEPNS_16OperationContextERNS_7MessageERNS_5CurOpE+0x589) [0xa8b4d9]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0x16EE) [0xa8f39e]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e02d0]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x411) [0xecb271]
 libpthread.so.0(+0x8182) [0x7fe3b11c5182]
 libc.so.6(clone+0x6D) [0x7fe3b02c5efd]
-----  END BACKTRACE  -----



 Comments   
Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: better logging for kv_catalog collection metadata
Branch: master
https://github.com/mongodb/mongo/commit/bd22baf9adae1bda00e6b18c012c5db763aaee72

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: make getCollection const and not require an OpContext
Branch: master
https://github.com/mongodb/mongo/commit/8384c878b542352ba88ecade1e675b8412f2655f

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: open database after repair is done
Branch: master
https://github.com/mongodb/mongo/commit/8b37507dd51cdf058377a24ca0171e7fae6f2c6b

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: create CollectionInfoCache on Collection construction
Branch: master
https://github.com/mongodb/mongo/commit/430bafbd8643bd1d30513ad231850c6927e8553d

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: do not lazy init mmapv1 db catalog
Branch: master
https://github.com/mongodb/mongo/commit/8bedd077eed61a6f39a943a23df1db490974a359

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: do not create db on drop
Branch: master
https://github.com/mongodb/mongo/commit/33baf3ab68eff7679ec39a75cfcccf9e8d882ca5

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: don't init Collections lazily
Branch: master
https://github.com/mongodb/mongo/commit/f3557df19cde6d14b5af74621dc6d949825c695a

Comment by Githook User [ 02/Jan/15 ]

Author:

{u'username': u'scotthernandez', u'name': u'Scott Hernandez', u'email': u'scotthernandez@gmail.com'}

Message: SERVER-16502: only check locking if ns is full
Branch: master
https://github.com/mongodb/mongo/commit/7791afd96781d9ebac28969e582e64699e66c147

Comment by J Rassi [ 11/Dec/14 ]

Just spoke to milkie about this.

My theory is that CollectionInfoCache::_indexedPaths is protected by the collection exclusive lock in MMAPv1, but it is being concurrently modified under a collection intent exclusive lock in WiredTiger. This member variable exists in order to help the update subsystem decide whether a given update is a "bit-blip update", in which case index key generation is skipped.

I believe the core of the issue is that CollectionInfoCache::_indexedPaths is lazily initialized, and its accessor CollectionInfoCache::indexKeys() (which is called from the update subsystem) can write to this member variable under an intent lock if initialization hasn't happened yet. Assuming this is the case, a sufficient fix for this issue would be to remove the lazy initialization and make CollectionInfoCache::indexKeys() a const method. This would involve getting rid of CollectionInfoCache::_keysComputed and verifying that CollectionInfoCache::CollectionInfoCache() and CollectionInfoCache::addedIndex() are both always called under an exclusive lock. If all goes well, then CollectionInfoCache::_indexKeys would be written to only when the collection is opened or an index build completes (both of which should hold an exclusive lock), and would only be read from during an update (which should hold an intent lock).

Assigning to Eric.

Comment by Matt Kangas [ 11/Dec/14 ]

addr2line output on the debug artifacts

std::pair<std::_Rb_tree_iterator<std::string>, bool> std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_insert_unique<std::string>(std::string&&) at ??:0
~basic_string at /opt/mongodbtoolchain/include/c++/4.8.2/bits/basic_string.h:293
 (inlined by) mongo::UpdateIndexData::addPath(mongo::StringData const&) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/update_index_data.cpp:45
mongo::CollectionInfoCache::computeIndexKeys(mongo::OperationContext*) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/catalog/collection_info_cache.cpp:74
mongo::CollectionInfoCache::indexKeys(mongo::OperationContext*) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/catalog/collection_info_cache.h:78
 (inlined by) mongo::UpdateLifecycleImpl::getIndexKeys(mongo::OperationContext*) const at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/ops/update_lifecycle_impl.cpp:66
mongo::UpdateExecutor::prepareInLock(mongo::Database*) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/ops/update_executor.cpp:173
mongo::UpdateExecutor::execute(mongo::Database*) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/base/status-inl.h:71
intrusive_ptr at /srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:89
 (inlined by) SharedBuffer at /srv/10gen/mci-exec/mci/shell/src/src/mongo/util/shared_buffer.h:37
 (inlined by) BSONObj at /srv/10gen/mci-exec/mci/shell/src/src/mongo/bson/bsonobj.h:129
 (inlined by) mongo::receivedUpdate(mongo::OperationContext*, mongo::Message&, mongo::CurOp&) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/instance.cpp:591
mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/instance.cpp:451
~basic_string at /opt/mongodbtoolchain/include/c++/4.8.2/bits/basic_string.h:539
 (inlined by) ~HostAndPort at /srv/10gen/mci-exec/mci/shell/src/src/mongo/util/net/hostandport.h:48
 (inlined by) mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) at /srv/10gen/mci-exec/mci/shell/src/src/mongo/db/db.cpp:194
mongo::PortMessageServer::handleIncomingMsg(void*) at ??:0

Generated at Thu Feb 08 03:41:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.