[SERVER-26018] Inconsistency between the LockManager grantedList and grantedCount Created: 08/Sep/16  Updated: 10/May/23  Resolved: 11/Oct/16

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 3.0.12, 3.2.10, 3.4.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: xiaost Assignee: Kaloian Manassiev
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File LockMgrInvariants.diff     File currentOp.out     File gdb.withsymbols.out     Text File server_status.txt     Text File stacks.txt    
Issue Links:
Related
is related to SERVER-26578 Add startup warning for Intel CPUs wh... Closed
Operating System: ALL
Steps To Reproduce:

Unpredictable; seems to reproduce on secondaries
(not certain: observed 3 times on secondary replicas over two weeks)

Sprint: Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31
Participants:
Linked BF Score: 5

 Description   

All threads hang waiting on locks. The state of the LockHead indicates an inconsistency between the grantedList and grantedCounts: there are no granted requests on the list, but the granted counts and modes are non-zero:

(gdb) p $18
$38 = {
  resourceId = {
    _fullHash = 2305843009213693953
  },
  grantedList = {
    _front = 0x0,
    _back = 0x0
  },
  grantedCounts = {0, 1, 0, 0, 0},
  grantedModes = 2,
  conflictList = {
    _front = 0x7f08028,
    _back = 0x5902cf628
  },
  conflictCounts = {0, 1490, 0, 0, 1},
  conflictModes = 18,
  partitions = {
    <std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >> = {
      _M_impl = {
        <std::allocator<mongo::LockManager::Partition*>> = {
          <__gnu_cxx::new_allocator<mongo::LockManager::Partition*>> = {<No data fields>}, <No data fields>},
        members of std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >::_Vector_impl:
        _M_start = 0x7470640,
        _M_finish = 0x7470640,
        _M_end_of_storage = 0x7470680
      }
    }, <No data fields>},
  conversionsCount = 0,
  compatibleFirstCount = 0
}
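For orientation, here is a minimal sketch (simplified stand-ins, not MongoDB's actual lock_manager.cpp) of the consistency rule the dump above violates: the granted-modes bitmask and the per-mode granted counts must agree with the granted list, so either both are empty or both are non-empty. In the dump, grantedModes is 2 and grantedCounts[1] is 1, yet grantedList is empty, so a check like the one below would fire.

#include <cassert>
#include <cstdint>

struct LockRequestLite { LockRequestLite* next = nullptr; };

struct LockHeadLite {
    LockRequestLite* grantedFront = nullptr;  // head of the intrusive granted list
    uint32_t grantedCounts[5] = {};           // granted requests per lock mode
    uint32_t grantedModes = 0;                // bitmask: a mode's bit is set iff its count > 0

    void checkInvariant() const {
        // Exactly one of "no granted modes" / "non-empty granted list" must hold;
        // this mirrors the (grantedModes == 0) ^ (grantedList._front != NULL) check
        // that was later promoted to an invariant in this ticket.
        assert((grantedModes == 0) ^ (grantedFront != nullptr));
    }
};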

The attachments show the stacks of all threads and the output of db.currentOp().

Here is our cluster info:

  • 3 shards × 3 replicas each
  • using both range- and hash-based sharding
  • collection sizes from 50GB to 500GB


 Comments   
Comment by xiaost [ 24/Jan/17 ]

FYI:

here is the changelog entry for libc in Debian 8.7, released on 2017-01-14:

disable lock elision (aka Intel TSX)
on x86 architectures. This causes programs (wrongly) unlocking an already
unlocked mutex to abort. More importantly most of the other distributions
decided to disable it, so we don't want to be the only distribution left
testing this code path.
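For illustration, a deliberately buggy toy program (my example of the bug class the changelog describes, not MongoDB code): unlocking an already-unlocked mutex is undefined behavior. Without lock elision glibc typically tolerates it silently; with elision (Intel TSX) enabled, the process aborts.

#include <pthread.h>

int main() {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);  // bug: second unlock of an already-unlocked mutex
    return 0;
}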

Comment by xiaost [ 12/Oct/16 ]

Hi geert.bosch, it is identical except for the microcode version. Here is the cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb00001d
cpu MHz         : 1745.734
cache size      : 25600 KB
physical id     : 0
siblings        : 20
core id         : 0
cpu cores       : 10
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips        : 4400.01
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Comment by Geert Bosch [ 12/Oct/16 ]

Hi xiaost, I've tried to see what the specific CPU/microcode versions are that might expose the issue. In your comment on Sep 22 2016, you listed the /proc/cpuinfo output for the old and the new processor versions. Could you give us the /proc/cpuinfo for the new processor after you applied the microcode update? I'm particularly curious if the "rtm" flag is still set, and what the new version of the new microcode is.

Thanks again!

Comment by Kaloian Manassiev [ 11/Oct/16 ]

Thank you for all the help, xiaost. We have filed SERVER-26578 as a result of your investigation and will try to discover this problem proactively and issue a startup warning.

Comment by xiaost [ 11/Oct/16 ]

Hi kaloian.manassiev,

after spending a lot of time debugging this multi-threaded issue in mongod, we found out:

  • it can only be reproduced on servers with the new CPU (E5-2630 v4)
  • it can easily be reproduced by modifying the unit tests
  • it can only be reproduced under a particular code execution sequence
  • it works fine if we add some debug code into the lock context

after debugging, we started to suspect a hardware issue, in either memory or the CPU.

With the help of Google, we found that the TSX feature, which speeds up execution of multi-threaded software through lock elision, seems to have been the source of many problems since 2014: [1][2][3]

In August 2014, Intel announced a bug in the TSX implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX feature on affected CPUs via a microcode update.
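As background for the startup-warning idea in SERVER-26578, here is a hedged sketch (my illustration, not the actual SERVER-26578 change; assumes x86-64 with GCC or Clang) of how a program can query whether the CPU still advertises TSX: CPUID leaf 7, subleaf 0 reports HLE in EBX bit 4 and RTM in EBX bit 11. A microcode update that disables TSX clears these bits, which is what the "hle"/"rtm" flags in /proc/cpuinfo reflect.

#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID leaf 7, subleaf 0: structured extended feature flags.
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        std::printf("HLE: %s, RTM: %s\n",
                    (ebx & (1u << 4)) ? "yes" : "no",
                    (ebx & (1u << 11)) ? "yes" : "no");
    }
    return 0;
}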

We checked our microcode changelog. In the latest release:

+ Likely fixes a recently identified, critical but low-hitting TSX
erratum on Broadwell, Broadwell-E and related Xeons
(Broadwell-DE/WS/EP: Xeon-D 1500, E3-v4 and E5-v4)

So we applied the microcode update, and it resolved the issue.

Thank you for your help :-D

Best regards,
-Xiaost

Comment by Kaloian Manassiev [ 11/Oct/16 ]

Hi xiaost,

Thank you for providing us with this information. CPU problems and memory corruption were something we did consider, but we weren't sure how to diagnose them and had discounted them as highly improbable.

May I ask how you came to the conclusion that it was a microcode issue and decided to upgrade? This will greatly help us when trying to diagnose any future reports of this kind that we might get.

Best regards,
-Kal.

Comment by xiaost [ 11/Oct/16 ]

hello, all.

It seems to have been an Intel CPU issue (TSX); after we upgraded the microcode of the CPUs, the issue was gone.

(E5-2630 v4, microcode 0xb000010 => 0xb00001d)

Comment by xihui he [ 08/Oct/16 ]

Hi kaloian.manassiev,

FYI: when the slaves are under high load, we can easily reproduce the issue on our new slave machines with the 3.2.8 and 3.2.9 debian builds, the 3.2.9 linux build, and a 3.2.9 binary we built ourselves on the new machine.

Regards,
Xihui

Comment by Kaloian Manassiev [ 28/Sep/16 ]

Hi bydsky,

Thank you very much for the reproduction. I will take it from here and try to figure out what happened from the call stack.

By debug file, do you mean the symbols or a debug variant of the build? Symbols are here. In either case, I can resolve the call stack myself from the backtrace you pasted, and I will post it in the ticket.

Thanks again for the help.

Best regards,
-Kal.

Comment by xihui he [ 28/Sep/16 ]

Could you please provide the debug file for the build?

Many Thanks,
Xihui

Comment by xihui he [ 28/Sep/16 ]

Attached is serverStatus.

Comment by xihui he [ 28/Sep/16 ]

Below is the backtrace from the log:

2016-09-28T20:04:22.102+0800 I NETWORK  [conn1258] end connection 10.9.16.136:37183 (176 connections now open)
2016-09-28T20:04:24.540+0800 I NETWORK  [conn1281] end connection 10.2.96.135:40341 (175 connections now open)
2016-09-28T20:04:24.545+0800 I NETWORK  [conn1213] end connection 10.9.16.74:19928 (174 connections now open)
2016-09-28T20:04:24.557+0800 I NETWORK  [conn41] end connection 10.9.16.74:11811 (173 connections now open)
2016-09-28T20:04:25.151+0800 I NETWORK  [conn974] end connection 10.9.16.136:35871 (172 connections now open)
2016-09-28T20:04:25.279+0800 I NETWORK  [conn1040] end connection 10.9.16.74:18951 (171 connections now open)
2016-09-28T20:04:25.691+0800 I NETWORK  [initandlisten] connection accepted from 10.2.96.135:40372 #1284 (172 connections now open)
2016-09-28T20:04:26.668+0800 I -        [conn1273] Invariant failure (lock->grantedModes == 0) ^ (lock->grantedList._front != NULL) src/mongo/db/concurrency/lock_manager.cpp 795
2016-09-28T20:04:26.668+0800 I -        [conn1273]
 
***aborting after invariant() failure
 
 
2016-09-28T20:04:26.709+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.74:20360 #1285 (173 connections now open)
2016-09-28T20:04:26.716+0800 F -        [conn1273] Got signal: 6 (Aborted).
 
 0x1395a22 0x1394b59 0x1395362 0x7faf77ca48d0 0x7faf7791f067 0x7faf77920448 0x131b72b 0xbc93c0 0xbc97ee 0xbd0bb6 0xbd44cc 0xbd536e 0xec3eeb 0xea5ea7 0xea1012 0xea15e9 0xb5ea8d 0xbdf653 0xbe04e4 0xb3b5e0 0xcee2d5 0x9cdddc 0x13420c5 0x7faf77c9d0a4 0x7faf779d287d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F95A22","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F94B59"},{"b":"400000","o":"F95362"},{"b":"7FAF77C95000","o":"F8D0"},{"b":"7FAF778EA000","o":"35067","s":"gsignal"},{"b":"7FAF778EA000","o":"36448","s":"abort"},{"b":"400000","o":"F1B72B","s":"_ZN5mongo15invariantFailedEPKcS1_j"},{"b":"400000","o":"7C93C0","s":"_ZN5mongo11LockManager18_onLockModeChangedEPNS_8LockHeadEb"},{"b":"400000","o":"7C97EE","s":"_ZN5mongo11LockManager6unlockEPNS_11LockRequestE"},{"b":"400000","o":"7D0BB6","s":"_ZN5mongo10LockerImplILb0EE11_unlockImplEPNS_14FastMapNoAllocINS_10ResourceIdENS_11LockRequestELi16EE12IteratorImplIS5_S4_EE"},{"b":"400000","o":"7D44CC","s":"_ZN5mongo10LockerImplILb0EE6unlockENS_10ResourceIdE"},{"b":"400000","o":"7D536E","s":"_ZN5mongo10LockerImplILb0EE22saveLockStateAndUnlockEPNS_6Locker12LockSnapshotE"},{"b":"400000","o":"AC3EEB","s":"_ZN5mongo10QueryYield13yieldAllLocksEPNS_16OperationContextEPNS_13RecordFetcherERKSs"},{"b":"400000","o":"AA5EA7","s":"_ZN5mongo15PlanYieldPolicy5yieldEPNS_13RecordFetcherE"},{"b":"400000","o":"AA1012","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"AA15E9","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"400000","o":"75EA8D","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE"},{"b":"400000","o":"7DF653","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"7E04E4","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"73B5E0","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8EE2D5","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5CDDDC","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"F420C5","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7FAF77C95000","o":"80A4"},{"b":"7FAF778EA000","o":"E887D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.9-patch-57e566ff3ff12239ba009729", "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c", "compiledModules" : [ "enterprise" ], "uname" : { "sysname" : "Linux", "release" : "3.16.0-4-amd64", "version" : "#1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02)", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "8317357055AE57444E562FE8AC2BDDAE3D3C4A45" }, { "b" : "7FFDA7BEB000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "72435CB4CCBBDF0955F6318805B6758E93B67E56" }, { "b" : "7FAF7A085000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "D570D2B1AB4231175CC17AF644A037C238BC1CDF" }, { "b" : "7FAF79E3A000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "524426F486D3FB8A7123B1421D434B8C82DFEE2A" }, { "b" : "7FAF799AC000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30", "elfType" : 3, "buildId" : "2D98E54467771C4F88FF0F159EE5B93A12320983" }, { "b" : "7FAF7979D000", "path" : "/usr/lib/x86_64-linux-gnu/libsensors.so.4", "elfType" : 3, "buildId" : "1C57F13C2769F19345ED341AD69755535B7804BB" }, { "b" : "7FAF79590000", "path" : "/lib/x86_64-linux-gnu/libpci.so.3", "elfType" : 3, "buildId" : "F429E13731E81B5D31ABDDBBC0E193A7BE6EC205" }, { "b" : 
"7FAF7938C000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "1FDE060F62834E445210B73CD25C7ABDF905B542" }, { "b" : "7FAF7911D000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpagent.so.30", "elfType" : 3, "buildId" : "9F5DB6C0F1923F6E8E9BA0C5720BE48CC256C51C" }, { "b" : "7FAF78F12000", "path" : "/lib/x86_64-linux-gnu/libwrap.so.0", "elfType" : 3, "buildId" : "B57BEBDE43752BFFF3A132B6799A89B7C05FB889" }, { "b" : "7FAF78C2E000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmp.so.30", "elfType" : 3, "buildId" : "F595AFA76A33688659A3519BD1B49C65FA38CEE7" }, { "b" : "7FAF78832000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "C72426F83BD6D6E49DF4B366AA39FAD5837C8C1F" }, { "b" : "7FAF78531000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "53024C3BBDBA1F0CBCC01D6A80FA2CBDFE33EB42" }, { "b" : "7FAF782D0000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "C72A89930DB5C11FE5737AA6D0CA245AA4C08329" }, { "b" : "7FAF780C8000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "C96926159E07B6128EFEC30001EF8AD3D24672DD" }, { "b" : "7FAF77EB2000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7FAF77C95000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "424D19435BCBB4D6E67DC19FD6EA95AC08B09237" }, { "b" : "7FAF778EA000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "2BBA9D358BAC63FF81597767B5BC229316BD12E0" }, { "b" : "7FAF7A2A1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "486275655F9C95F0B45F9E4CEA1A448DB59625D8" }, { "b" : "7FAF776D3000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "C2407B0CA9EF3EEC8692A1D134FA85FBEA730904" }, { "b" : "7FAF773FF000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "F492C814744A9B84D357EC02C6E60B54DC1FE5A1" }, { "b" : "7FAF771CE000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "4CE1E54AD57410AABFA705DD76C36607763905CB" }, { "b" : "7FAF76FCA000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "B353A08508AEE61771A5983D26896C6755E2DDCE" }, { "b" : "7FAF76DBE000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "0BC0D2DA06EDAC5DF30557063AC8E00115A96F73" }, { "b" : "7FAF76BBA000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "A66DF74417D9788837F8EC61DC783A3FB6FF99BE" }, { "b" : "7FAF767F9000", "path" : "/usr/lib/x86_64-linux-gnu/libperl.so.5.20", "elfType" : 3, "buildId" : "AD6B5C8927381B5D0CE7174CF4F6885612359533" }, { "b" : "7FAF765C2000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "9C615C05AAEF68F1464FFBC04275057739EE4EB2" }, { "b" : "7FAF763A7000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "ADCC4A5E27D5DE8F0BC3C6021B50BA2C35EC9A8E" }, { "b" : "7FAF7618F000", "path" : "/lib/x86_64-linux-gnu/libnsl.so.1", "elfType" : 3, "buildId" : "835A5454D995A3980E789FF10708FAC633780EFC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1395a22]
 mongod(+0xF94B59) [0x1394b59]
 mongod(+0xF95362) [0x1395362]
 libpthread.so.0(+0xF8D0) [0x7faf77ca48d0]
 libc.so.6(gsignal+0x37) [0x7faf7791f067]
 libc.so.6(abort+0x148) [0x7faf77920448]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xCB) [0x131b72b]
 mongod(_ZN5mongo11LockManager18_onLockModeChangedEPNS_8LockHeadEb+0x3C0) [0xbc93c0]
 mongod(_ZN5mongo11LockManager6unlockEPNS_11LockRequestE+0x1AE) [0xbc97ee]
 mongod(_ZN5mongo10LockerImplILb0EE11_unlockImplEPNS_14FastMapNoAllocINS_10ResourceIdENS_11LockRequestELi16EE12IteratorImplIS5_S4_EE+0x36) [0xbd0bb6]
 mongod(_ZN5mongo10LockerImplILb0EE6unlockENS_10ResourceIdE+0x10C) [0xbd44cc]
 mongod(_ZN5mongo10LockerImplILb0EE22saveLockStateAndUnlockEPNS_6Locker12LockSnapshotE+0xAE) [0xbd536e]
 mongod(_ZN5mongo10QueryYield13yieldAllLocksEPNS_16OperationContextEPNS_13RecordFetcherERKSs+0x5B) [0xec3eeb]
 mongod(_ZN5mongo15PlanYieldPolicy5yieldEPNS_13RecordFetcherE+0x97) [0xea5ea7]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x362) [0xea1012]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x39) [0xea15e9]
 mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0x92D) [0xb5ea8d]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x473) [0xbdf653]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xbe04e4]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1F0) [0xb3b5e0]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB45) [0xcee2d5]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9cdddc]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x325) [0x13420c5]
 libpthread.so.0(+0x80A4) [0x7faf77c9d0a4]
 libc.so.6(clone+0x6D) [0x7faf779d287d]
-----  END BACKTRACE  -----

Comment by Githook User [ 27/Sep/16 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-26018 Add more invariants in LockManager

Also adds concurrency stress tests to exercise the lock head migrations
upon taking locks in strong mode.

Branch: master
https://github.com/mongodb/mongo/commit/dc50fa55c5efd8617bef4932e2fbdf40f5a5b587

Comment by xihui he [ 27/Sep/16 ]

I have replaced our production slaves with the binary you provided. Will let you know if it happens again.

Comment by Kaloian Manassiev [ 26/Sep/16 ]

EDIT: I was notified that the link I had attached requires a password, so instead I have attached the diff in a file called LockMgrInvariants.diff.

Certainly, sorry about that - I should have included them originally. Please find the code changes along with this run through our continuous integration system: https://evergreen.mongodb.com/version/57e566ff3ff12239ba009729

Please note that the change is not yet committed to the official branch (and other releases) because we have not found the underlying cause of the bug, and we also have not fully tested the possible performance impact of including the extra invariants. In particular, this change promotes two debug-only checks into ones that are always enforced (dassert is compiled out of release builds, while invariant also runs in production):

-    dassert((lock->grantedModes == 0) ^ (lock->grantedList._front != NULL));
-    dassert((lock->conflictModes == 0) ^ (lock->conflictList._front != NULL));
+    invariant((lock->grantedModes == 0) ^ (lock->grantedList._front != NULL));
+    invariant((lock->conflictModes == 0) ^ (lock->conflictList._front != NULL));

Comment by xihui he [ 26/Sep/16 ]

Hi kaloian.manassiev,

Could you please link the corresponding commit, if possible?

Thanks,
Xihui

Comment by Kaloian Manassiev [ 23/Sep/16 ]

Hi xiaost and bydsky,

The build with the additional assertions is available here. It is built on top of the 3.2.9 release.

Thank you again for your help.

Best regards,
-Kal.

Comment by Kaloian Manassiev [ 22/Sep/16 ]

Thank you for providing me with the machine information. We are now testing the binary with the extended checks and I will provide you with download links when it is ready.

Many thanks again for all your help with diagnosing this problem.

Best regards,
-Kal.

Comment by xihui he [ 22/Sep/16 ]

Another difference: so far the issue has only happened on our new machines (we upgraded to 3.2.9 at the same time as we added the new machines to the cluster).
new machine:

debian 8.5
cpu:
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping	: 1
microcode	: 0xb000010
cpu MHz		: 1891.742
cache size	: 25600 KB
physical id	: 1
siblings	: 20
core id		: 12
cpu cores	: 10
apicid		: 57
initial apicid	: 57
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips	: 4401.44
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

old machine

debian 8.4
cpu
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
stepping	: 2
microcode	: 0x36
cpu MHz		: 2600.531
cache size	: 20480 KB
physical id	: 1
siblings	: 16
core id		: 7
cpu cores	: 8
apicid		: 31
initial apicid	: 31
fpu		: yes
fpu_exception	: yes
cpuid level	: 15
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips	: 4800.68
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Comment by xihui he [ 22/Sep/16 ]

Hi kaloian.manassiev,

1. The issue happened on different hosts; however, it happened after we upgraded to 3.2.9, and so far it has only happened on slaves.
2. We restarted the server, so if it happens again, we'll dump db.serverStatus().
3. It's OK for us to run a patch build on the slaves.

Thanks,
Xihui

Comment by Kaloian Manassiev [ 21/Sep/16 ]

Hi xiaost and bydsky,

Thank you very much for the detailed reports and investigation, and apologies for the delayed response.

The information that you provided is really helpful, but also extremely puzzling. After inspecting the code, I was unable to find a case where there are no entries on the grantedList but the grantedCounts/grantedModes bitmask is non-zero, and I have a couple of questions:

  • Is it always the same machine which hits this problem, or does it happen on different ones?
  • Would it be possible to run the db.serverStatus() command against one of the servers where it has happened and include the output?

In order to help investigate this further, I am working on a change which adds more invariant checks to the lock manager and also rearranges the code a little to make it easier to follow. Would you be willing to take a 3.2.9-based debian8.1 patch build with these changes and run it on one such machine, in order to see whether any of the invariants get hit?

Thank you again for your help investigating this issue.

Best regards,
-Kal.

Comment by xihui he [ 21/Sep/16 ]

It seems grantedCounts was again not cleared on unlock.

(gdb) p  (globalLockManager._lockBuckets)[2].data._M_h._M_buckets[4]->_M_nxt
$25 = (std::__detail::_Hash_node_base *) 0x9fd7ef00
(gdb) p *((*(std::__detail::_Hash_node<std::pair<mongo::ResourceId const, mongo::LockHead*>, true>*)0x9fd7ef00)._M_v.second)
$26 = {
  resourceId = {
    _fullHash = 2305843009213693954
  },
  grantedList = {
    _front = 0x0,
    _back = 0x0
  },
  grantedCounts = {0, 2, 0, 0, 0},
  grantedModes = 2,
  conflictList = {
    _front = 0x29e168128,
    _back = 0x29e168128
  },
  conflictCounts = {0, 0, 0, 0, 1},
  conflictModes = 16,
  partitions = {
    <std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >> = {
      _M_impl = {
        <std::allocator<mongo::LockManager::Partition*>> = {
          <__gnu_cxx::new_allocator<mongo::LockManager::Partition*>> = {<No data fields>}, <No data fields>},
        members of std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >::_Vector_impl:
        _M_start = 0x1c68a1f00,
        _M_finish = 0x1c68a1f00,
        _M_end_of_storage = 0x1c68a2000
      }
    }, <No data fields>},
  conversionsCount = 0,
  compatibleFirstCount = 0
}

Comment by xihui he [ 21/Sep/16 ]

From the backtraces, it seems this hang (Sep 20) is different from the last one (Sep 12).

For the Sep 20 hang, it appears that:
1. thread rsSync (1875) holds the global W lock
2. thread "repl writer worker" (1857) needs the global W lock
3. thread rsSync (1875) waits for thread "repl writer worker" (1857) to finish
(a minimal sketch of this cycle follows the two stacks below)

Thread 1875 (Thread 0x7ff2e7924700 (LWP 101580)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b45bdc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x00000000012b6d93 in mongo::ThreadPool::waitForIdle() ()
#3  0x0000000000f4b831 in void mongo::ScopeGuardImplBase::SafeExecute<mongo::ScopeGuardImpl0<mongo::repl::SyncTail::multiApply(mongo::OperationContext*, mongo::repl::SyncTail::OpQueue const&, boost::optional<mongo::repl::BatchBoundaries>)::{lambda()#1}> >(mongo::ScopeGuardImpl0<mongo::repl::SyncTail::multiApply(mongo::OperationContext*, mongo::repl::SyncTail::OpQueue const&, boost::optional<mongo::repl::BatchBoundaries>)::{lambda()#1}>&) ()
#4  0x0000000000f4fecd in mongo::repl::SyncTail::multiApply(mongo::OperationContext*, mongo::repl::SyncTail::OpQueue const&, boost::optional<mongo::repl::BatchBoundaries>) ()
#5  0x0000000000f51bfa in mongo::repl::SyncTail::oplogApplication() ()
#6  0x0000000000f47940 in mongo::repl::runSyncThread() ()
#7  0x0000000001b47040 in execute_native_thread_routine ()
#8  0x00007ff2f2e710a4 in start_thread (arg=0x7ff2e7924700) at pthread_create.c:309
#9  0x00007ff2f2ba687d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 1857 (Thread 0x7ff2c5844700 (LWP 179769)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000bba269 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
#2  0x0000000000bbdd37 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
#3  0x0000000000bb41b9 in mongo::Lock::GlobalLock::waitForLock(unsigned int) ()
#4  0x0000000000bb41e5 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
#5  0x00000000009b9689 in mongo::Lock::GlobalWrite::GlobalWrite(mongo::Locker*, unsigned int) ()
#6  0x0000000000f4e35b in mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool, std::function<mongo::Status (mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool)>, std::function<mongo::Status (mongo::OperationContext*, mongo::BSONObj const&)>, std::function<void ()>) ()
#7  0x0000000000f4ecf6 in mongo::repl::SyncTail::syncApply(mongo::OperationContext*, mongo::BSONObj const&, bool) ()
#8  0x0000000000f5200b in mongo::repl::multiSyncApply(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&, mongo::repl::SyncTail*) ()
#9  0x00000000012b6561 in mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*) ()
#10 0x00000000012b6ec9 in mongo::ThreadPool::_consumeTasks() ()
#11 0x00000000012b7a20 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
#12 0x0000000001b47040 in execute_native_thread_routine ()
#13 0x00007ff2f2e710a4 in start_thread (arg=0x7ff2c5844700) at pthread_create.c:309
#14 0x00007ff2f2ba687d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
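For illustration, here is a minimal sketch of that cycle (simplified stand-ins rather than the actual mongo classes and code path): one thread takes an exclusive "global" lock and then waits for a worker pool to drain, while a pool task blocks trying to take the same lock. Run as-is, this program hangs by design, mirroring the reported state.

#include <mutex>
#include <shared_mutex>
#include <thread>

std::shared_mutex globalLock;  // stand-in for the global W lock

int main() {
    // "rsSync": holds the global W (exclusive) lock...
    std::unique_lock<std::shared_mutex> w(globalLock);

    // "repl writer worker": blocks forever acquiring the lock that rsSync holds.
    std::thread worker([] {
        std::unique_lock<std::shared_mutex> w2(globalLock);
    });

    // ...and then waits for the worker to finish (like waitForIdle): deadlock.
    worker.join();
    return 0;
}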

Comment by xiaost [ 20/Sep/16 ]

We got it! See the attached currentOp output; it seems rsSync is holding the global W lock.

Comment by xihui he [ 14/Sep/16 ]

Below is the LockHead which blocks all requests.

(gdb) p $18
$38 = {
  resourceId = {
    _fullHash = 2305843009213693953
  },
  grantedList = {
    _front = 0x0,
    _back = 0x0
  },
  grantedCounts = {0, 1, 0, 0, 0},
  grantedModes = 2,
  conflictList = {
    _front = 0x7f08028,
    _back = 0x5902cf628
  },
  conflictCounts = {0, 1490, 0, 0, 1},
  conflictModes = 18,
  partitions = {
    <std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >> = {
      _M_impl = {
        <std::allocator<mongo::LockManager::Partition*>> = {
          <__gnu_cxx::new_allocator<mongo::LockManager::Partition*>> = {<No data fields>}, <No data fields>},
        members of std::_Vector_base<mongo::LockManager::Partition*, std::allocator<mongo::LockManager::Partition*> >::_Vector_impl:
        _M_start = 0x7470640,
        _M_finish = 0x7470640,
        _M_end_of_storage = 0x7470680
      }
    }, <No data fields>},
  conversionsCount = 0,
  compatibleFirstCount = 0
}

Comment by xiaost [ 13/Sep/16 ]

Uploaded the gdb backtrace output, which is more readable.

Comment by xiaost [ 12/Sep/16 ]

kaloian.manassiev

>> Can you tell me exactly which distribution of mongodb 3.2.9 are you running (i.e., platform/OS and OS version)?

package : mongodb-linux-x86_64-debian81-3.2.9

lsb_release -a

Distributor ID: Debian
Description: Debian GNU/Linux 8.5 (jessie)
Release: 8.5
Codename: jessie

>> Can you confirm whether all stacks in the attached list are from the same mongod process?

yes.

>> It would be very helpful if we had a list of the operations, which were running at the time of the hang. If it occurs again, would it be possible to run the db.currentOp(true) command and attach the output?

I would like to, but once mongod hangs I can do nothing.

Update: the issue reproduced again yesterday, and I dumped the process memory to a file. What can I do next to help debugging?

Comment by Teemu Sirkiä [ 08/Sep/16 ]

Could this be related to the SERVER-25700 bug (https://jira.mongodb.org/browse/SERVER-25700)? Did you notice high CPU usage after this issue occurred?

Comment by Kaloian Manassiev [ 08/Sep/16 ]

Hi xiaost,

Thank you for reporting this issue and for attaching the thread stacks. I have a couple of follow-up questions to help us track it down:

  • Can you tell me exactly which distribution of mongodb 3.2.9 you are running (i.e., platform/OS and OS version)?
  • Can you confirm whether all stacks in the attached list are from the same mongod process?
  • It would be very helpful if we had a list of the operations which were running at the time of the hang. If it occurs again, would it be possible to run the db.currentOp(true) command and attach the output?

Thank you in advance.

-Kal.
