[SERVER-44981] Primary replica set node exited abnormally because of InterruptedDueToReplStateChange Created: 06/Dec/19  Updated: 23/Dec/19  Resolved: 23/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: 肖 刘 Assignee: Dmitry Agranat
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File shard3_error.log     Text File shard3_scondary_node.log    
Backwards Compatibility: Fully Compatible
Participants:

 Description   

I have a cluster of two machines; the primary and secondary nodes of the replica set are located on different machines. The MongoDB version is 4.0.2 and the OS is Ubuntu 16.04.6 LTS. The primary node of one replica set exited abnormally, and the secondary node failed to become primary. The log from the primary node at the time it exited is below:

2019-12-05T19:13:21.385+0800 I NETWORK  [conn595694] end connection 192.168.0.87:39780 (365 connections now open)
2019-12-05T19:13:21.385+0800 I NETWORK  [conn595693] end connection 192.168.0.87:39778 (364 connections now open)
2019-12-05T19:13:21.386+0800 I NETWORK  [listener] connection accepted from 192.168.0.87:39824 #595717 (365 connections now open)
2019-12-05T19:13:21.386+0800 I NETWORK  [listener] connection accepted from 192.168.0.87:39826 #595718 (366 connections now open)
2019-12-05T19:13:22.241+0800 I NETWORK  [listener] connection accepted from 192.168.0.87:39828 #595719 (367 connections now open)
2019-12-05T19:13:22.252+0800 W NETWORK  [replSetDistLockPinger] Unable to reach primary for set cfgset
2019-12-05T19:13:22.256+0800 W NETWORK  [thread589197] Unable to reach primary for set shard1
2019-12-05T19:13:22.272+0800 I NETWORK  [conn595719] received client metadata from 192.168.0.87:39828 conn595719: { driver: { name: "MongoDB Internal Client", version: "4.0.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-12-05T19:13:22.304+0800 I NETWORK  [conn595718] received client metadata from 192.168.0.87:39826 conn595718: { driver: { name: "NetworkInterfaceTL", version: "4.0.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-12-05T19:13:22.304+0800 I NETWORK  [conn595717] received client metadata from 192.168.0.87:39824 conn595717: { driver: { name: "NetworkInterfaceTL", version: "4.0.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2019-12-05T19:13:22.315+0800 F -        [conn595664] DBException::toString(): InterruptedDueToReplStateChange: operation was interrupted
Actual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)11602, mongo::ExceptionForCat<(mongo::ErrorCategory)1>, mongo::ExceptionForCat<(mongo::ErrorCategory)2> >
 0x562b35cd6ea1 0x562b35cd6885 0x562b35dcb1f6 0x562b35dcb241 0x562b3530d5ed 0x562b34b519bd 0x562b34ab831f 0x562b34925f3f 0x562b3493f039 0x562b34934794 0x562b357483f9 0x562b343cc3df 0x562b343ce60f 0x562b343d0979 0x562b343d18b1 0x562b343beefa 0x562b343c9c6a 0x562b343c4937 0x562b343c8141 0x562b3557b902 0x562b343c2b4f 0x562b343c5ce5 0x562b343c4077 0x562b343c49bd 0x562b343c8141 0x562b3557be65 0x562b35c30724 0x7f5db3b456ba 0x7f5db387b41d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"562B3391E000","o":"23B8EA1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"562B3391E000","o":"23B8885"},{"b":"562B3391E000","o":"24AD1F6","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"562B3391E000","o":"24AD241"},{"b":"562B3391E000","o":"19EF5ED"},{"b":"562B3391E000","o":"12339BD","s":"_ZN5mongo12PlanExecutor7disposeEPNS_16OperationContextEPNS_13CursorManagerE"},{"b":"562B3391E000","o":"119A31F","s":"_ZN5mongo15ClientCursorPin16deleteUnderlyingEv"},{"b":"562B3391E000","o":"1007F3F","s":"_ZN5mongo18ScopeGuardImplBase11SafeExecuteINS_18ObjScopeGuardImpl0INS_15ClientCursorPinEMS3_FvvEEEEEvRT_"},{"b":"562B3391E000","o":"1021039","s":"_ZN5mongo12runAggregateEPNS_16OperationContextERKNS_15NamespaceStringERKNS_18AggregationRequestERKNS_7BSONObjERNS_14BSONObjBuilderE"},{"b":"562B3391E000","o":"1016794"},{"b":"562B3391E000","o":"1E2A3F9","s":"_ZN5mongo12BasicCommand10Invocation3runEPNS_16OperationContextEPNS_19CommandReplyBuilderE"},{"b":"562B3391E000","o":"AAE3DF"},{"b":"562B3391E000","o":"AB060F"},{"b":"562B3391E000","o":"AB2979"},{"b":"562B3391E000","o":"AB38B1","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE"},{"b":"562B3391E000","o":"AA0EFA","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE"},{"b":"562B3391E000","o":"AABC6A","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE"},{"b":"562B3391E000","o":"AA6937","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"562B3391E000","o":"AAA141"},{"b":"562B3391E000","o":"1C5D902","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"562B3391E000","o":"AA4B4F","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"562B3391E000","o":"AA7CE5","s"
:"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE"},{"b":"562B3391E000","o":"AA6077","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE"},{"b":"562B3391E000","o":"AA69BD","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"562B3391E000","o":"AAA141"},{"b":"562B3391E000","o":"1C5DE65"},{"b":"562B3391E000","o":"2312724"},{"b":"7F5DB3B3E000","o":"76BA"},{"b":"7F5DB3774000","o":"10741D","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.0.2", "gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-154-generic", "version" : "#181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "562B3391E000", "elfType" : 3, "buildId" : "AAF90A0B71749BBE7B9E3CF14359342D6DCFF400" }, { "b" : "7FFE5B5D8000", "elfType" : 3, "buildId" : "38E1E50A2E0C877AA491A9F5FE2BCACD159BD8EC" }, { "b" : "7F5DB4F4F000", "path" : "/usr/lib/x86_64-linux-gnu/libcurl.so.4", "elfType" : 3, "buildId" : "5C1A06A89F89E1ADAAA507BC5580C0A7931B0AB2" }, { "b" : "7F5DB4D34000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "50A923F8DAFECBCD969C8573116A38C18D0E24D5" }, { "b" : "7F5DB48EF000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "15FFEB43278726B025F020862BF51302822A40EC" }, { "b" : "7F5DB4686000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF69EA60EBE05F2DD689D2B26FC85A73E5FBC3A0" }, { "b" : "7F5DB4482000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "37BFC3D8F7E3B022DAC7943B1A5FACD40CEBF0AD" }, { "b" : "7F5DB427A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "69143E8B39040C964D3958490535322675F15DD3" }, { "b" : "7F5DB3F71000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "BAD67A84E56E73D031AE507261DA066B35949D34" }, { "b" : "7F5DB3D5B000", "path" : 
"/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F5DB3B3E000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "B17C21299099640A6D863E423D99265824E7BB16" }, { "b" : "7F5DB3774000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "1CA54A6E0D76188105B12E49FE6B8019BF08803A" }, { "b" : "7F5DB51BE000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "C0ADBAD6F9A33944F2B3567C078EC472A1DAE98E" }, { "b" : "7F5DB3541000", "path" : "/usr/lib/x86_64-linux-gnu/libidn.so.11", "elfType" : 3, "buildId" : "E09D3783AD1D0BBCD3204FA01E4EF6D756E18F57" }, { "b" : "7F5DB3325000", "path" : "/usr/lib/x86_64-linux-gnu/librtmp.so.1", "elfType" : 3, "buildId" : "8D1CC1204D6B6D33BD1D2C5A2A0516A2234322CF" }, { "b" : "7F5DB30DB000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "41971A4A3CCDC54A447F41DF4BD96C948C546E0E" }, { "b" : "7F5DB2ECC000", "path" : "/usr/lib/x86_64-linux-gnu/liblber-2.4.so.2", "elfType" : 3, "buildId" : "8E613D0B8D8E3537785637424782BE8502ABABD2" }, { "b" : "7F5DB2C7B000", "path" : "/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2", "elfType" : 3, "buildId" : "3890D33727391E4A85DC0F819AB0AA29BB5DFC86" }, { "b" : "7F5DB2A61000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "8D9BD4CE26E45EF16075C67D5F5EEAFD8B562832" }, { "b" : "7F5DB2731000", "path" : "/usr/lib/x86_64-linux-gnu/libgnutls.so.30", "elfType" : 3, "buildId" : "17285B5F2BCC671E0A7FA3E29CCD143509B648CD" }, { "b" : "7F5DB24FE000", "path" : "/usr/lib/x86_64-linux-gnu/libhogweed.so.4", "elfType" : 3, "buildId" : "B11678F560199547DCF726384EA39153EE0DFABF" }, { "b" : "7F5DB22C8000", "path" : "/usr/lib/x86_64-linux-gnu/libnettle.so.6", "elfType" : 3, "buildId" : "D6B36C5A463EE0FA84FDD6D5FD3F7726EDB90D54" }, { "b" : "7F5DB2048000", "path" : "/usr/lib/x86_64-linux-gnu/libgmp.so.10", "elfType" : 3, "buildId" : 
"7B3533D5998D20EE1A1BE3F87789B69041E7F620" }, { "b" : "7F5DB1D76000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "0EEF7058B0737B68BDF89E5DC604D0AC389C8BB1" }, { "b" : "7F5DB1B47000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "FFBA483A43D9EF73925AC116811890C037523DA1" }, { "b" : "7F5DB1943000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "1E16CB57F699E215A2A8D4EFEF90883BC749B12D" }, { "b" : "7F5DB1738000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "B789D8D4B4FC333405AB34387D9237F954060EA4" }, { "b" : "7F5DB151D000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "87783DF8A1058CD150F8886CB36340384093C18F" }, { "b" : "7F5DB12DC000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi.so.3", "elfType" : 3, "buildId" : "1FE877BE52A424D0636AFD4D35BB330E41D6E0F3" }, { "b" : "7F5DB1078000", "path" : "/usr/lib/x86_64-linux-gnu/libp11-kit.so.0", "elfType" : 3, "buildId" : "A0E2D03FF5CF65937F4425D4EFD4D655243809EB" }, { "b" : "7F5DB0E65000", "path" : "/usr/lib/x86_64-linux-gnu/libtasn1.so.6", "elfType" : 3, "buildId" : "E07E186694852D8F69459C6AB28A53F8DA3CE3B6" }, { "b" : "7F5DB0C61000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "3364D4BF2113C4E8D17EF533867ECC99A53413D6" }, { "b" : "7F5DB0A58000", "path" : "/usr/lib/x86_64-linux-gnu/libheimntlm.so.0", "elfType" : 3, "buildId" : "73A8EADBC85860662B24850E71D4AFBE22C33359" }, { "b" : "7F5DB07CE000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.26", "elfType" : 3, "buildId" : "59E742306A4EA2872E061ECCE92F35FADDA75357" }, { "b" : "7F5DB052C000", "path" : "/usr/lib/x86_64-linux-gnu/libasn1.so.8", "elfType" : 3, "buildId" : "E5C159E415406AE79D21056D752BA949C408B5B1" }, { "b" : "7F5DB02F9000", "path" : "/usr/lib/x86_64-linux-gnu/libhcrypto.so.4", "elfType" : 3, "buildId" : "7D15576E1F096614D360784E4A01A1F5FAF908C9" }, { 
"b" : "7F5DB00E3000", "path" : "/usr/lib/x86_64-linux-gnu/libroken.so.18", "elfType" : 3, "buildId" : "481DB33C28D88E43DA6BED65E1A7599407D4D818" }, { "b" : "7F5DAFEDB000", "path" : "/usr/lib/x86_64-linux-gnu/libffi.so.6", "elfType" : 3, "buildId" : "9D9C958F1F4894AFEF6AECD90D1C430EA29AC34F" }, { "b" : "7F5DAFCB2000", "path" : "/usr/lib/x86_64-linux-gnu/libwind.so.0", "elfType" : 3, "buildId" : "57E25072866B2D30CF02EBE7AE623B84F96FA700" }, { "b" : "7F5DAFAA3000", "path" : "/usr/lib/x86_64-linux-gnu/libheimbase.so.1", "elfType" : 3, "buildId" : "F6F1B4E9F89B716C4A0BA5819BDFFAF4A13EFB91" }, { "b" : "7F5DAF858000", "path" : "/usr/lib/x86_64-linux-gnu/libhx509.so.5", "elfType" : 3, "buildId" : "C60082E3BB78D0D42868D9B359B89BF66CE5A1A7" }, { "b" : "7F5DAF583000", "path" : "/usr/lib/x86_64-linux-gnu/libsqlite3.so.0", "elfType" : 3, "buildId" : "3B0454E57467057071F7AD49651E0FA7B01CF5C7" }, { "b" : "7F5DAF34B000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "FD61CA7A6D603E94E5EFD5C88D8810AE104FCF40" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x562b35cd6ea1]
 mongod(+0x23B8885) [0x562b35cd6885]
 mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x562b35dcb1f6]
 mongod(+0x24AD241) [0x562b35dcb241]
 mongod(+0x19EF5ED) [0x562b3530d5ed]
 mongod(_ZN5mongo12PlanExecutor7disposeEPNS_16OperationContextEPNS_13CursorManagerE+0x6D) [0x562b34b519bd]
 mongod(_ZN5mongo15ClientCursorPin16deleteUnderlyingEv+0x9F) [0x562b34ab831f]
 mongod(_ZN5mongo18ScopeGuardImplBase11SafeExecuteINS_18ObjScopeGuardImpl0INS_15ClientCursorPinEMS3_FvvEEEEEvRT_+0x1F) [0x562b34925f3f]
 mongod(_ZN5mongo12runAggregateEPNS_16OperationContextERKNS_15NamespaceStringERKNS_18AggregationRequestERKNS_7BSONObjERNS_14BSONObjBuilderE+0x2629) [0x562b3493f039]
 mongod(+0x1016794) [0x562b34934794]
 mongod(_ZN5mongo12BasicCommand10Invocation3runEPNS_16OperationContextEPNS_19CommandReplyBuilderE+0xD9) [0x562b357483f9]
 mongod(+0xAAE3DF) [0x562b343cc3df]
 mongod(+0xAB060F) [0x562b343ce60f]
 mongod(+0xAB2979) [0x562b343d0979]
 mongod(_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE+0x3C1) [0x562b343d18b1]
 mongod(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x3A) [0x562b343beefa]
 mongod(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xBA) [0x562b343c9c6a]
 mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x97) [0x562b343c4937]
 mongod(+0xAAA141) [0x562b343c8141]
 mongod(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x1A2) [0x562b3557b902]
 mongod(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x15F) [0x562b343c2b4f]
 mongod(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0xAF5) [0x562b343c5ce5]
 mongod(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x357) [0x562b343c4077]
 mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11D) [0x562b343c49bd]
 mongod(+0xAAA141) [0x562b343c8141]
 mongod(+0x1C5DE65) [0x562b3557be65]
 mongod(+0x2312724) [0x562b35c30724]
 libpthread.so.0(+0x76BA) [0x7f5db3b456ba]
 libc.so.6(clone+0x6D) [0x7f5db387b41d]
-----  END BACKTRACE  -----
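
The mangled C++ symbols in the backtrace are easier to follow once they are shortened to their qualified names. The following is a minimal Python sketch, not a full demangler (a tool such as c++filt does this properly). `qualified_name` is a hypothetical helper that reads only the length-prefixed nested name of a symbol and ignores argument types and template parameters.

```python
import re


def qualified_name(mangled: str) -> str:
    """Return the nested name (e.g. 'mongo::PlanExecutor::dispose') of an
    Itanium-ABI mangled symbol that starts with '_ZN'.

    Argument types and template parameters are ignored. Symbols that do not
    start with '_ZN' are returned unchanged.
    """
    if not mangled.startswith("_ZN"):
        return mangled
    pos = 3  # skip the '_ZN' prefix
    parts = []
    while pos < len(mangled):
        # Each name component is encoded as <decimal length><identifier>.
        match = re.match(r"\d+", mangled[pos:])
        if match is None:
            break  # reached 'E', template arguments, or parameter types
        length = int(match.group())
        start = pos + match.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts) if parts else mangled


if __name__ == "__main__":
    # Symbols copied from the backtrace above.
    for symbol in (
        "_ZN10__cxxabiv111__terminateEPFvvE",
        "_ZN5mongo12PlanExecutor7disposeEPNS_16OperationContextEPNS_13CursorManagerE",
        "_ZN5mongo15ClientCursorPin16deleteUnderlyingEv",
    ):
        print(qualified_name(symbol))
```

Applied to the frames above, this gives __cxxabiv1::__terminate directly above mongo::PlanExecutor::dispose and mongo::ClientCursorPin::deleteUnderlying, inside mongo::runAggregate, which suggests the process terminated while an aggregate command was cleaning up its cursor.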

I have searched the issue list. Is my problem related to SERVER-34661, and how can I prevent the InterruptedDueToReplStateChange error from happening? The cluster cannot provide service if there is no primary node in the replica set.
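
A note on the missing primary: a replica set member can only be elected primary with votes from a majority of the set's voting members. The sketch below works through that arithmetic. It assumes shard3 has exactly two voting members, which is an inference from the description and not confirmed by any rs.status() output; the function names are hypothetical.

```python
def votes_needed(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """True if the members that can still reach each other form a majority."""
    return reachable_members >= votes_needed(voting_members)


if __name__ == "__main__":
    # Assumed two-member shard3 with the former primary down: one member left.
    print(can_elect_primary(voting_members=2, reachable_members=1))
    # Same failure in a set with a third voting member (data node or arbiter).
    print(can_elect_primary(voting_members=3, reachable_members=2))
```

Under that assumption, the surviving secondary holds one vote out of the two required and so stays secondary. This does not explain why the primary process exited; it only covers why no new primary appeared afterwards.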



 Comments   
Comment by Dmitry Agranat [ 23/Dec/19 ]

Hi xiaoliuhust@gmail.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 08/Dec/19 ]

Hi xiaoliuhust@gmail.com,

The information provided is not enough to determine the root cause of the issue.

The previously requested rs.status() can be retrieved from the secondary by executing rs.slaveOk() and then rs.status().
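
For reference, the part of the rs.status() output that matters most here is the per-member state. The following Python sketch condenses such a document into one line per member. `summarize_members` is a hypothetical helper, and `sample_status` is made-up illustrative data, not output from the reporter's cluster.

```python
from typing import Any, Dict, List, Tuple


def summarize_members(status: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Return (name, stateStr, health) for each member of an
    rs.status()-style document. Missing fields are reported as '?' or 0."""
    return [
        (m.get("name", "?"), m.get("stateStr", "?"), m.get("health", 0))
        for m in status.get("members", [])
    ]


# Hypothetical sample shaped like rs.status() output (illustration only).
sample_status = {
    "set": "shard3",
    "members": [
        {"name": "192.168.0.87:27003", "stateStr": "(not reachable/healthy)", "health": 0},
        {"name": "192.168.0.178:27003", "stateStr": "SECONDARY", "health": 1},
    ],
}

if __name__ == "__main__":
    for name, state, health in summarize_members(sample_status):
        print(f"{name}\t{state}\thealth={health}")
```

With a driver such as PyMongo, the input document could be obtained with client.admin.command("replSetGetStatus") on a direct connection to the member.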

In addition, please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from all members of the replica set in question and upload them to this support uploader location.
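
One way to build such an archive on each member is sketched below in Python. The function name and all paths are hypothetical; the only layout assumed is a single mongod.log file and a diagnostic.data directory under the dbpath.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_diagnostics(log_path: str, dbpath: str, out_path: str) -> List[str]:
    """Write a gzip-compressed tar containing the mongod log file and the
    diagnostic.data directory found under dbpath.

    Returns the list of member names stored in the archive.
    """
    log_file = Path(log_path)
    diag_dir = Path(dbpath) / "diagnostic.data"
    with tarfile.open(out_path, "w:gz") as tar:
        # Store entries under short names instead of absolute paths.
        tar.add(str(log_file), arcname=log_file.name)
        tar.add(str(diag_dir), arcname="diagnostic.data")
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()


# Example call (paths are placeholders, adjust per member):
# archive_diagnostics("/var/log/mongodb/mongod.log", "/data/shard3", "shard3_member.tar.gz")
```

The returned name list is a quick check that both the log and the diagnostic.data contents ended up in the archive before uploading it.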

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by 肖 刘 [ 06/Dec/19 ]

I have attached a new log from the secondary node to give you more information. To make the log easier to follow, note that each machine in the cluster runs five mongo processes: mongos on port 27010, cfg_server on port 27011, shard1 on port 27001, shard2 on port 27002, and shard3 on port 27003. The primary node of shard3, on machine 192.168.0.87, exited abnormally; shard3's secondary node is on machine 192.168.0.178. If you need more information, please tell me.
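
The layout described above can be written down as a small lookup table, which helps when matching host:port pairs in the logs to a component. This is a Python sketch built only from the hosts and ports stated in this comment; the names `PORTS`, `HOSTS`, `members_of`, and `component_for` are hypothetical.

```python
from typing import List, Optional

# Port used by each process on every machine, as listed in the comment above.
PORTS = {
    "mongos": 27010,
    "cfg_server": 27011,
    "shard1": 27001,
    "shard2": 27002,
    "shard3": 27003,
}

# The two machines in the cluster.
HOSTS = ["192.168.0.87", "192.168.0.178"]


def members_of(component: str) -> List[str]:
    """Return the host:port address of the component on each machine."""
    return [f"{host}:{PORTS[component]}" for host in HOSTS]


def component_for(address: str) -> Optional[str]:
    """Map a host:port address to a component name, or None if the port
    is not one of the listed service ports."""
    port = int(address.rsplit(":", 1)[1])
    for name, service_port in PORTS.items():
        if service_port == port:
            return name
    return None


if __name__ == "__main__":
    print(members_of("shard3"))
    print(component_for("192.168.0.178:27003"))
```

Ports such as 39780 in the primary's log excerpt are the remote ends of incoming connections, not service ports, so `component_for` returns None for them.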

Comment by 肖 刘 [ 06/Dec/19 ]

Sorry, I didn't run the rs.status() command at that time. It happened yesterday after work and I needed to recover the cluster quickly. When the error occurred, I logged in to the secondary node of the replica set with the mongo --port 27003 command, but its role remained secondary. So I restarted the primary node and forgot to capture more information. I can try to find some logs on the secondary node.

Comment by Dmitry Agranat [ 06/Dec/19 ]

Hi xiaoliuhust@gmail.com,

Thanks for the report. Could you also attach the output from the rs.status() command?

Thanks,
Dima
