[SERVER-51488] Complete Replica set shutdown | v3.6.14 Created: 11/Oct/20  Updated: 29/Oct/20  Resolved: 29/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.14
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Anirudh Bhardwaj
Assignee: Dmitry Agranat
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Red Hat Enterprise Linux Server release 7.7 (Maipo)


Participants:

 Description   

We run multiple MongoDB clusters. Two of them went down completely with a segmentation fault that appears to be the one reported as fixed in SERVER-29850.

However, we are still hitting the same error on both clusters, which run 3.6.14, and once it occurs every node in the affected replica set goes down. The error log is below.

2020-10-11T20:10:48.850+0530 E QUERY    [conn5256811] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { ts: { $gte: Timestamp(1602426461, 460) } }, nReturned: 0, executionTimeMillisEstimate: 0, works: 1, advanced: 0, needTime: 0, needYield: 1, saveState: 0, restoreState: 0, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 0 }
 
2020-10-11T20:10:48.850+0530 I COMMAND  [conn5256811] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1602426461, 460) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 93, readConcern: { afterOpTime: { ts: Timestamp(1602426461, 460), t: 92 } }, $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" }, $clusterTime: { clusterTime: Timestamp(1602427115, 1), signature: { hash: BinData(0, EEB32C4E20AC7177456190D2E7902DC70774076E), keyId: 6820092145433575425 } }, $db: "local" } planSummary: COLLSCAN writeConflicts:1 numYields:0 reslen:657 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 3837179 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } protocol:op_msg 3838ms
 
2020-10-11T20:10:48.851+0530 E STORAGE  [conn5256776] WiredTiger error (2) [1602427248:851573][1828:0x7fa83d368700], file:collection-57--7767596474170028009.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /data/mongo/collection-57--7767596474170028009.wt: handle-open: open: No such file or directory
 
2020-10-11T20:10:48.851+0530 E STORAGE  [conn5256776] no cursor for uri: table:collection-57--7767596474170028009
 
2020-10-11T20:10:48.851+0530 F -        [conn5256776] Invalid access at address: 0x50
 
2020-10-11T20:10:48.893+0530 F -        [conn5256776] Got signal: 11 (Segmentation fault).
 
 
 
 0x563bdbd206a1 0x563bdbd1f8b9 0x563bdbd1ff26 0x7fa84d188630 0x563bda4f22fd 0x563bdacb0bb4 0x563bdace025b 0x563bdaaab72e 0x563bdaaac25b 0x563bda73788a 0x563bdb637256 0x563bdb6326bf 0x563bda6d932b 0x563bda6db14c 0x563bda6dbfa4 0x563bda6ea02a 0x563bda6e5987 0x563bda6e8e11 0x563bdb60b292 0x563bda6e47c0 0x563bda6e6d55 0x563bda6e7651 0x563bda6e5a0d 0x563bda6e8e11 0x563bdb60b7f5 0x563bdbbda274 0x7fa84d180ea5 0x7fa84cea98cd
 
----- *BEGIN* BACKTRACE -----
 
{"backtrace":[{"b":"563BD9A9B000","o":"22856A1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"563BD9A9B000","o":"22848B9"},{"b":"563BD9A9B000","o":"2284F26"},{"b":"7FA84D179000","o":"F630"},{"b":"563BD9A9B000","o":"A572FD","s":"_ZN5mongo31WiredTigerRecordStoreCursorBase4nextEv"},{"b":"563BD9A9B000","o":"1215BB4","s":"_ZN5mongo14CollectionScan6doWorkEPm"},{"b":"563BD9A9B000","o":"124525B","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"563BD9A9B000","o":"101072E","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"563BD9A9B000","o":"101125B","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"563BD9A9B000","o":"C9C88A"},{"b":"563BD9A9B000","o":"1B9C256","s":"_ZN5mongo12BasicCommand11enhancedRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE"},{"b":"563BD9A9B000","o":"1B976BF","s":"_ZN5mongo7Command9publicRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE"},{"b":"563BD9A9B000","o":"C3E32B"},{"b":"563BD9A9B000","o":"C4014C"},{"b":"563BD9A9B000","o":"C40FA4","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE"},{"b":"563BD9A9B000","o":"C4F02A","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE"},{"b":"563BD9A9B000","o":"C4A987","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"563BD9A9B000","o":"C4DE11"},{"b":"563BD9A9B000","o":"1B70292","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"563BD9A9B000","o":"C497C0","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"563BD9A9B000","o":"C4BD55","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE"},{"b":"563BD9A9B000","o":"C4C651","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE"},{"b":"563BD9A9B000","o":"C4AA0D","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"563BD9A9B000","o":"C4DE11"},{"b":"563BD9A9B000","o":"1B707F5"},{"b":"563BD9A9B000","o":"213F274"},{"b":"7FA84D179000","o":"7EA5"},{"b":"7FA84CDAB000","o":"FE8CD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.14", "gitVersion" : "cbef87692475857c7ee6e764c8f5104b39c342a1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-1062.1.2.el7.x86_64", "version" : "#1 SMP Mon Sep 16 14:19:51 EDT 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "563BD9A9B000", "elfType" : 3, "buildId" : "242C773CCBE0730C37DE5B7435D633C705F5274D" }, { "b" : "7FFF62FE4000", "elfType" : 3, "buildId" : "086694BE6E21A0A8272FB65847371DAF85F23422" }, { "b" : "7FA84E38E000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6AE7534DD2B3C41A984BA43D85A2B4FBA378FB98" }, { "b" : "7FA84DF2B000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "4A7C42F51D767226113C14433CF47B7FE2034FC5" }, { "b" : "7FA84DCB9000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F3223C4C9DD8824C897E96D992DB8BA55C5A755C" }, { "b" : "7FA84DAB5000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "B16FC9C912150101DEA3E14E5FCFD9E9F71E5A45" }, { "b" : "7FA84D8AD000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "85B4DD66FF1213DC51BF382E9D162E7C2C9B73B6" }, { "b" : "7FA84D5AB000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "D1F552CF3C05D7E7394EDDA2F1F377A553D593F8" }, { "b" : "7FA84D395000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "DAC0179F4555AEFEC9E97476201802FD20C03EC5" }, { "b" : "7FA84D179000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "FEBFED597867C1FC05ACE1B5FC7DB5AC93364C2E" }, { "b" : "7FA84CDAB000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "D6B09772D17878A32EDD32EA18751209EE9BE5A7" }, { "b" : "7FA84E5A7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "90493AE08BD5E200887971C5DB8D18E97B68A878" }, { "b" : "7FA84CB95000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7FA84C948000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "BCC30853830CD911E58700591830DF51ABCBD7BA" }, { "b" : "7FA84C65F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "B64919F53B93FF41BFCCF022042E454012B2CD20" }, { "b" : "7FA84C45B000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "E4C7298B74FEEADC4DDE40CDD8C4D6B85FE09ADE" }, { "b" : "7FA84C228000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "A9B3906192687CC45D483AE3C58C8AF745A6726A" }, { "b" : "7FA84C018000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "94B3BCB669126166B77CDCE6092679A6AA2004C8" }, { "b" : "7FA84BE14000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7FA84BBED000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7FA84B98B000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "F5B144F9F5D9BE451C80211B34DB2CE348E039B6" } ] }}
 
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x563bdbd206a1]
 
 mongod(+0x22848B9) [0x563bdbd1f8b9]
 
 mongod(+0x2284F26) [0x563bdbd1ff26]
 
 libpthread.so.0(+0xF630) [0x7fa84d188630]
 
 mongod(_ZN5mongo31WiredTigerRecordStoreCursorBase4nextEv+0x22D) [0x563bda4f22fd]
 
 mongod(_ZN5mongo14CollectionScan6doWorkEPm+0xD4) [0x563bdacb0bb4]
 
 mongod(_ZN5mongo9PlanStage4workEPm+0x6B) [0x563bdace025b]
 
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x43E) [0x563bdaaab72e]
 
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4B) [0x563bdaaac25b]
 
 mongod(+0xC9C88A) [0x563bda73788a]
 
 mongod(_ZN5mongo12BasicCommand11enhancedRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x76) [0x563bdb637256]
 
 mongod(_ZN5mongo7Command9publicRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x1F) [0x563bdb6326bf]
 
 mongod(+0xC3E32B) [0x563bda6d932b]
 
 mongod(+0xC4014C) [0x563bda6db14c]
 
 mongod(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x2B4) [0x563bda6dbfa4]
 
 mongod(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xBA) [0x563bda6ea02a]
 
 mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x97) [0x563bda6e5987]
 
 mongod(+0xC4DE11) [0x563bda6e8e11]
 
 mongod(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x1A2) [0x563bdb60b292]
 
 mongod(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x150) [0x563bda6e47c0]
 
 mongod(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0xB05) [0x563bda6e6d55]
 
 mongod(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x241) [0x563bda6e7651]
 
 mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11D) [0x563bda6e5a0d]
 
 mongod(+0xC4DE11) [0x563bda6e8e11]
 
 mongod(+0x1B707F5) [0x563bdb60b7f5]
 
 mongod(+0x213F274) [0x563bdbbda274]
 
 libpthread.so.0(+0x7EA5) [0x7fa84d180ea5]
 
 libc.so.6(clone+0x6D) [0x7fa84cea98cd]
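
For anyone cross-checking the trace: each frame in the JSON backtrace carries a load base ("b") and an offset ("o"), and their sum should equal the absolute address printed in brackets on the matching symbolized line. The following is a minimal Python sketch of that arithmetic using two frames copied from the log above. The helper name absolute_address is illustrative only and is not part of any MongoDB tooling.

```python
import json

# Two frames copied verbatim from the "backtrace" array in the log above.
FRAMES_JSON = """[
  {"b": "563BD9A9B000", "o": "22856A1", "s": "_ZN5mongo15printStackTraceERSo"},
  {"b": "563BD9A9B000", "o": "A572FD", "s": "_ZN5mongo31WiredTigerRecordStoreCursorBase4nextEv"}
]"""


def absolute_address(frame):
    """Return load base + offset for one backtrace frame (both are hex strings)."""
    return int(frame["b"], 16) + int(frame["o"], 16)


frames = json.loads(FRAMES_JSON)
for frame in frames:
    # Frames without a symbol have no "s" key, so fall back to a placeholder.
    print(frame.get("s", "<no symbol>"), hex(absolute_address(frame)))
```

The two printed addresses, 0x563bdbd206a1 and 0x563bda4f22fd, match the bracketed addresses on the printStackTrace and WiredTigerRecordStoreCursorBase::next lines of the symbolized trace.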



 Comments   
Comment by Dmitry Agranat [ 29/Oct/20 ]

Hi anirudh.bhardwaj@airtel.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 21/Oct/20 ]

Hi anirudh.bhardwaj@airtel.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the data mentioned in my last comment?

Thanks,
Dima

Comment by Dmitry Agranat [ 12/Oct/20 ]

Hi anirudh.bhardwaj@airtel.com,

In order for us to investigate this issue, we'll need some additional information. I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

  • archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from all members of the replica set
  • /var/log/messages, the output from /var/log/dmesg and /var/log/syslog
  • what is the underlying platform (virtual machine, container, native hardware, etc)?
  • is data storage locally attached or network-attached?
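
The archive step in the first bullet could be scripted roughly as follows. This is a sketch under stated assumptions, not an official MongoDB tool: the function name archive_diagnostics is hypothetical, and it assumes the log files are named mongod.log* and sit in a single directory. It would need to be run on each member of the replica set.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(dbpath, log_dir, out_file):
    """Bundle mongod.log* files and $dbpath/diagnostic.data into a gzip tarball.

    Returns the list of top-level archive entries that were added.
    """
    dbpath, log_dir = Path(dbpath), Path(log_dir)
    added = []
    with tarfile.open(out_file, "w:gz") as tar:
        # Include rotated logs as well (mongod.log, mongod.log.1, ...).
        for log in sorted(log_dir.glob("mongod.log*")):
            arcname = f"logs/{log.name}"
            tar.add(log, arcname=arcname)
            added.append(arcname)
        # diagnostic.data is added recursively when it exists.
        diag = dbpath / "diagnostic.data"
        if diag.is_dir():
            tar.add(diag, arcname="diagnostic.data")
            added.append("diagnostic.data")
    return added


# Example call. /data/mongo is the data directory seen in the WiredTiger error
# above; the log directory and output name are assumptions:
# archive_diagnostics("/data/mongo", "/var/log/mongodb", "node1-diagnostics.tar.gz")
```

Writing the archive outside the data directory avoids adding to the files the server is managing.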

Please make a complete copy of the database's $dbpath directory as a safeguard.

Thanks,
Dima
