[SERVER-30707] server crash: invariant() failure Created: 16/Aug/17  Updated: 09/Oct/17  Resolved: 13/Sep/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vlad Zloteanu Assignee: Ramon Fernandez Marina
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-30710 Several seconds for indexed query - g... Closed
Operating System: ALL
Participants:

 Description   

OS: Debian 9
Mongodb version: 3.0.15
Engine: WiredTiger

I'm importing some large databases into a standard mongodb setup: primary + replica + voter.
The resident memory of the process grows until it takes 50% (but I suppose this is normal behaviour).
At some point, the mongod process crashes.
Soon after (2-3 minutes), the replica also crashes.

After restarting the process, everything goes back to normal.

Log contents:

2017-08-17T00:41:06.928+0200 I INDEX    [repl writer worker 0] build index on: REDACTED.dlv_event_list properties: { v: 1, key: { dlvev: 1, code: 1, attrs.pid: 1, date: -1 }, name: "dlvev_1_code_1_attrs_pid_1_date_-1", ns: "REDACTED" }
{"backtrace":[{"b":"400000","o":"BC7182","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"B64709","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"B4A640","s":"_ZN5mongo17in
variantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"9E20A4","s":"_ZN5mongo17WiredTigerSession9getCursorERKSsmb"},{"b":"400000","o":"9DE150","s":"_ZN5mongo16WiredTigerCursorC1ERKS
smbPNS_16OperationContextE"},{"b":"400000","o":"9CB468","s":"_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEi"},{"b":"400000","o":"6D5089","s":"_ZNK5mongo22BtreeB
asedAccessMethod9newCursorEPNS_16OperationContextERKNS_13CursorOptionsEPPNS_11IndexCursorE"},{"b":"400000","o":"65C1FF","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"400000","o":"6
2922861150) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount
: { w: 3 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 194ms
2017-08-17T00:34:21.344+0200 I COMMAND  [conn1135] command REDACTED.$cmd command: update { update: "dlv_event", updates: [ { q: { sesid: ObjectId('5994c168caf95b29668b6947'
) }, u: { $set: { dlv.lact: new Date(1502922861150) } }, multi: true, upsert: false } ], writeConcern: { w: 1 } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: {
acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 194ms
2017-08-17T00:34:21.344+0200 I WRITE    [conn6783] insert REDACTED.event query: { _id: ObjectId('57a08905d5bc6dc0038b5d90'), code: "viewcat", who: { agid: ObjectId('57a087d8d5bc6dc
8038b5c1d'), sesid: ObjectId('57a087d8d5bc6dc8038b5c1e'), emailid: null }, details: { bws: "Chrome 39", source: "REDACTED" }, date: new Date(1470138629435), attrs: { cid: Obje
ctId('5788a940d5bc6d56248b7492') }, flag: { usrjr_used: false } } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { ac
quireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 422ms
2017-08-17T00:34:21.359+0200 I COMMAND  [conn6829] command REDACTED.$cmd command: aggregate { aggregate: "score_agg_prod", pipeline: [ { $match: { date: { $gte: new Date(150
2312400000) } } }, { $group: { _id: "$realid", score: { $sum: "$score" } } }, { $sort: { score: -1 } } ], cursor: {} } cursorid:140070364796006 keyUpdates:0 writeConflicts:0 numYields:
18 reslen:3907 locks:{ Global: { acquireCount: { r: 42 } }, Database: { acquireCount: { r: 21 } }, Collection: { acquireCount: { r: 21 } } } 466ms
2017-08-17T00:34:21.368+0200 I CONTROL  [conn6714]
 0xfc7182 0xf64709 0xf4a640 0xde20a4 0xdde150 0xdcb468 0xad5089 0xa5c1ff 0xa5c932 0xa4ffb1 0xa61c32 0xa62557 0xc26502 0xc26b6f 0xc26e82 0xbf7a59 0xbf35e8 0xb06a3a 0x8536c5 0xf789c9 0x7
f0e07e80494 0x7f0e06c3eaff
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"BC7182","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"B64709","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"B4A640","s":"_ZN5mongo17in
variantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"9E20A4","s":"_ZN5mongo17WiredTigerSession9getCursorERKSsmb"},{"b":"400000","o":"9DE150","s":"_ZN5mongo16WiredTigerCursorC1ERKS
smbPNS_16OperationContextE"},{"b":"400000","o":"9CB468","s":"_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEi"},{"b":"400000","o":"6D5089","s":"_ZNK5mongo22BtreeB
asedAccessMethod9newCursorEPNS_16OperationContextERKNS_13CursorOptionsEPPNS_11IndexCursorE"},{"b":"400000","o":"65C1FF","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"400000","o":"6
5C932","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"64FFB1","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"661C32","s":"_ZN5mongo14MultiPlanStage12workAllPlansEmPNS_1
5PlanYieldPolicyE"},{"b":"400000","o":"662557","s":"_ZN5mongo14MultiPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"400000","o":"826502","s":"_ZN5mongo12PlanExecutor12pickBestPl
anENS0_11YieldPolicyE"},{"b":"400000","o":"826B6F","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKN
S_10CollectionERKSsNS0_11YieldPolicyEPPS0_"},{"b":"400000","o":"826E82","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS
_14CanonicalQueryEPKNS_10CollectionENS0_11YieldPolicyEPPS0_"},{"b":"400000","o":"7F7A59","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionEPNS_14CanonicalQueryENS_12P
lanExecutor11YieldPolicyEPPS6_m"},{"b":"400000","o":"7F35E8","s":"_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_"},{"b"
:"400000","o":"706A3A","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"4536C5","s":"_ZN5mongo16MyMessageH
andler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE"},{"b":"400000","o":"B789C9","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F0E07E79000","o":"
7494"},{"b":"7F0E06B56000","o":"E8AFF","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.0.15", "gitVersion" : "b8ff507269c382bc100fc52f75f48d54cd42ec3b", "uname" : { "sysname" : "Li
nux", "release" : "4.9.0-3-amd64", "version" : "#1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D788BC655
BE139A77A25D035A4E5205BC813369F" }, { "b" : "7FFCCCBBF000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "DE7DAC2DF9F596F46FA94A387858EF25170603EC" }, { "b" : "7F0E07E79000",
 "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "968DF33F83963B559243653D74D27D89605BED02" }, { "b" : "7F0E07C18000", "path" : "/usr/lib/x86_64-linux-gnu/
libssl.so.1.0.0", "elfType" : 3, "buildId" : "21115992A1F885E1ACE88AADA60F126AD9759D03" }, { "b" : "7F0E0781C000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" :
3, "buildId" : "32E9A5B9EED626E93DEEB00A49033F78652DB9A3" }, { "b" : "7F0E07614000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "FE41526A83999F2FE9D0F8AADC
D61D03A92CBB70" }, { "b" : "7F0E07410000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "6A5D98612129B8186F21E800AFDFAAA627082F46" }, { "b" : "7F0E0710C000",
 "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "90DA054E12EA1A53EE0CBB5BB5E65F7069AEEE44" }, { "b" : "7F0E06EF5000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.s
o.1", "elfType" : 3, "buildId" : "036DAE71A7197C847FC5B720634642B922C74398" }, { "b" : "7F0E06B56000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "79450F6E3
6287865D093EA209B85A222209925FF" }, { "b" : "7F0E08096000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F150F33B150D6A81E26A425DD47D713D00F2D29" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0xfc7182]
 mongod(_ZN5mongo10logContextEPKc+0xE9) [0xf64709]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xE0) [0xf4a640]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0xE4) [0xde20a4]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x50) [0xdde150]
 mongod(_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEi+0x58) [0xdcb468]
 mongod(_ZNK5mongo22BtreeBasedAccessMethod9newCursorEPNS_16OperationContextERKNS_13CursorOptionsEPPNS_11IndexCursorE+0x29) [0xad5089]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x6F) [0xa5c1ff]
 mongod(_ZN5mongo9IndexScan4workEPm+0x52) [0xa5c932]
 mongod(_ZN5mongo10FetchStage4workEPm+0xD1) [0xa4ffb1]
 mongod(_ZN5mongo14MultiPlanStage12workAllPlansEmPNS_15PlanYieldPolicyE+0xE2) [0xa61c32]
 mongod(_ZN5mongo14MultiPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0xD7) [0xa62557]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0x72) [0xc26502]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionERKSsNS0_11YieldPolicyEPPS0_+0x6F) [0xc26b6f]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionENS0_11YieldPolicyEPPS0_+0x72) [0xc26e82]
 mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionEPNS_14CanonicalQueryENS_12PlanExecutor11YieldPolicyEPPS6_m+0xC9) [0xbf7a59]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x658) [0xbf35e8]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xABA) [0xb06a3a]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xF5) [0x8536c5]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x339) [0xf789c9]
 libpthread.so.0(+0x7494) [0x7f0e07e80494]
 libc.so.6(clone+0x3F) [0x7f0e06c3eaff]
-----  END BACKTRACE  -----
2017-08-17T00:34:21.368+0200 I -        [conn6714]
 
***aborting after invariant() failure



 Comments   
Comment by Ramon Fernandez Marina [ 13/Sep/17 ]

vladzloteanu, we haven't heard back from you for some time, so I'm going to close this ticket for the time being. If after you upgrade to the latest version (3.4.9 at the time of this writing) you continue to experience issues feel free to open a new ticket.

Thanks,
Ramón.

Comment by Vlad Zloteanu [ 17/Aug/17 ]

Hello Ramón,
Unfortunately, the process of upgrading is not simple for us (we must change the driver). I'll try to speed things up.

I'll also get you another log entry.

Thanks,
Vlad

Comment by Ramon Fernandez Marina [ 17/Aug/17 ]

vladzloteanu, unfortunately the log snippet you uploaded seems to be incomplete (at least for the purpose of diagnosis).

There were some issues with index builds and WiredTiger in MongoDB 3.0, so it could be you're hitting one of those. My recommendation would be:

  • Please upgrade to MongoDB 3.4.7 and see if the issue persist
  • If it does, send us the full log of the affected node and we'll be happy to investigate.

Thanks,
Ramón.

Generated at Thu Feb 08 04:24:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.