[SERVER-30089] Arbiter crash with invariant failure i < _members.size()
Created: 11/Jul/17  Updated: 21/Mar/18  Resolved: 18/Aug/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Andreas Kohn
Assignee: Kelsey Schubert
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongodb-minikube-restart-arbiter-crash-3.4.6.txt    
Issue Links:
Related
related to SERVER-28079 Secondary mongod crashes when removed... Closed
Operating System: ALL
Participants:

 Description   

I just noticed this in the log files of an arbiter in a replica set with two other nodes.

I'm not exactly sure which action caused this; at the time it occurred I was running tests that involved stopping and starting the VM that runs MongoDB. The stack trace looks similar, but not identical, to the one in SERVER-28079.

The arbiter (and all other members) got restarted automatically, and seemingly the setup "recovered" without additional user interaction.

Set up:
This is a MongoDB 3.4.x unsharded replica set running inside Kubernetes on Minikube (i.e. in a single VirtualBox VM). The MongoDB containers are managed using custom scripts.

I'm reporting this mainly to provide a data point; given that this is a test setup, it is hard to reproduce the issue in any meaningful way.



 Comments   
Comment by Kelsey Schubert [ 20/Mar/18 ]

Hi agamdua,

Thanks for the report and stack trace. This issue appears to be a duplicate of the very closely related ticket, SERVER-28079, which may also address the issue that the original reporter filed. Please feel free to review, vote on, and watch SERVER-28079 for updates.

Kind regards,
Kelsey

Comment by Agam Dua [ 19/Mar/18 ]

I had what looks to be the same issue:

2018-03-19T18:52:41.439+0000 I -        [replication-252] Invariant failure i < _members.size() src/mongo/db/repl/repl_set_config.cpp 619
2018-03-19T18:52:41.440+0000 I -        [replication-252]
 
***aborting after invariant() failure
 
 
2018-03-19T18:52:41.440+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to <hostname:port>: NodeNotFound: This node is not in the current replset configuration.
2018-03-19T18:52:41.451+0000 F -        [replication-252] Got signal: 6 (Aborted).
 
 0x558304a1b671 0x558304a1a769 0x558304a1ac4d 0x7f6bed698390 0x7f6bed2f2428 0x7f6bed2f402a 0x558303ce06a8 0x55830446d8dc 0x55830446da39 0x55830452d66d 0x5583044b96e6 0x5583043de079 0x5583043c5f52 0x558304457c54 0x558303ddbae1 0x5583047bff6a 0x5583047c3373 0x5583047c384b 0x5583049aae35 0x5583049ab960 0x5583049ac509 0x5583054892d0 0x7f6bed68e6ba 0x7f6bed3c441d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"5583034E3000","o":"1538671","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5583034E3000","o":"1537769"},{"b":"5583034E3000","o":"1537C4D"},{"b":"7F6BED687000","o":"11390"},{"b":"7F6BED2BD000","o":"35428","s":"gsignal"},{"b":"7F6BED2BD000","o":"3702A","s":"abort"},{"b":"5583034E3000","o":"7FD6A8","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"5583034E3000","o":"F8A8DC"},{"b":"5583034E3000","o":"F8AA39"},{"b":"5583034E3000","o":"104A66D","s":"_ZNK5mongo4repl23TopologyCoordinatorImpl22shouldChangeSyncSourceERKNS_11HostAndPortERKNS0_6OpTimeERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS8_18OplogQueryMetadataEEENS_6Date_tE"},{"b":"5583034E3000","o":"FD66E6","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl22shouldChangeSyncSourceERKNS_11HostAndPortERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS5_18OplogQueryMetadataEEE"},{"b":"5583034E3000","o":"EFB079","s":"_ZN5mongo4repl31DataReplicatorExternalStateImpl18shouldStopFetchingERKNS_11HostAndPortERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS5_18OplogQueryMetadataEEE"},{"b":"5583034E3000","o":"EE2F52"},{"b":"5583034E3000","o":"F74C54","s":"_ZN5mongo4repl12OplogFetcher9_callbackERKNS_10StatusWithINS_7Fetcher13QueryResponseEEEPNS_14BSONObjBuilderE"},{"b":"5583034E3000","o":"8F8AE1","s":"_ZN5mongo7Fetcher9_callbackERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsEPKc"},{"b":"5583034E3000","o":"12DCF6A"},{"b":"5583034E3000","o":"12E0373","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackESt10shared_ptrINS1_13CallbackStateEE"},{"b":"5583034E3000","o":"12E084B"},{"b":"5583034E3000","o":"14C7E35","s":"_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE"},{"b":"5583034E3000","o":"14C8960","s":"_ZN5mongo10ThreadPool13_consumeTasksEv"},{"b":"5583034E3000","o":"14C9509","s":"_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"},{"b":"5583034E3000","o":"1FA62D0"},{"b":"7F6BED687000","o":"76BA"},{"b":"7F6BED2BD000","o":"10741D","s":"
clone"}],"processInfo":{ "mongodbVersion" : "3.4.4", "gitVersion" : "888390515874a9debd1b6c5d36559ca86b44babd", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1052-aws", "version" : "#61-Ubuntu SMP Mon Feb 12 23:05:58 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "5583034E3000", "elfType" : 3, "buildId" : "93EBA2F9DA835EB5D31628B166C8A53322F507D4" }, { "b" : "7FFEF51FD000", "elfType" : 3, "buildId" : "3E988E23FE9673945EE9FEFC803204122D44B48C" }, { "b" : "7F6BEDFC7000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "89C34D7A182387D76D5CDA1F7718F5D58824DFB3" }, { "b" : "7F6BEDDC3000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "8CC8D0D119B142D839800BFF71FB71E73AEA7BD4" }, { "b" : "7F6BEDABA000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DFB85DE42DAFFD09640C8FE377D572DE3E168920" }, { "b" : "7F6BED8A4000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F6BED687000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "CE17E023542265FC11D9BC8F534BB4F070493D30" }, { "b" : "7F6BED2BD000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B5381A457906D279073822A5CEB24C4BFEF94DDB" }, { "b" : "7F6BEE1CF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5D7B6259552275A3C17BD4C3FD05F5A6BF40CAA5" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x558304a1b671]
 mongod(+0x1537769) [0x558304a1a769]
 mongod(+0x1537C4D) [0x558304a1ac4d]
 libpthread.so.0(+0x11390) [0x7f6bed698390]
 libc.so.6(gsignal+0x38) [0x7f6bed2f2428]
 libc.so.6(abort+0x16A) [0x7f6bed2f402a]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x558303ce06a8]
 mongod(+0xF8A8DC) [0x55830446d8dc]
 mongod(+0xF8AA39) [0x55830446da39]
 mongod(_ZNK5mongo4repl23TopologyCoordinatorImpl22shouldChangeSyncSourceERKNS_11HostAndPortERKNS0_6OpTimeERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS8_18OplogQueryMetadataEEENS_6Date_tE+0x32D) [0x55830452d66d]
 mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl22shouldChangeSyncSourceERKNS_11HostAndPortERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS5_18OplogQueryMetadataEEE+0xB6) [0x5583044b96e6]
 mongod(_ZN5mongo4repl31DataReplicatorExternalStateImpl18shouldStopFetchingERKNS_11HostAndPortERKNS_3rpc15ReplSetMetadataEN5boost8optionalINS5_18OplogQueryMetadataEEE+0x59) [0x5583043de079]
 mongod(+0xEE2F52) [0x5583043c5f52]
 mongod(_ZN5mongo4repl12OplogFetcher9_callbackERKNS_10StatusWithINS_7Fetcher13QueryResponseEEEPNS_14BSONObjBuilderE+0x20A4) [0x558304457c54]
 mongod(_ZN5mongo7Fetcher9_callbackERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsEPKc+0x621) [0x558303ddbae1]
 mongod(+0x12DCF6A) [0x5583047bff6a]
 mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackESt10shared_ptrINS1_13CallbackStateEE+0x1B3) [0x5583047c3373]
 mongod(+0x12E084B) [0x5583047c384b]
 mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE+0x135) [0x5583049aae35]
 mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0xC0) [0x5583049ab960]
 mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x149) [0x5583049ac509]
 mongod(+0x1FA62D0) [0x5583054892d0]
 libpthread.so.0(+0x76BA) [0x7f6bed68e6ba]
 libc.so.6(clone+0x6D) [0x7f6bed3c441d]
-----  END BACKTRACE  -----

This happened when I ran `rs.remove()` for this node. Our Datadog graphs show a memory spike followed by a recovery, mostly because the process got killed.

Let me know if you need more information than this.
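
One plausible reading of the log above, consistent with the NodeNotFound message, is that the node kept looking itself up in a configuration that no longer contained it. The sketch below is a hypothetical Python model of that failure mode, not MongoDB's C++ code: ReplSetConfigModel, find_self_index, get_member_at, and the host names are illustrative, and the idea that a "not found" index of -1 is converted to an unsigned value before the bounds check is an assumption, not something the logs confirm.

```python
SIZE_MAX = 2**64 - 1  # assumption: the index is a 64-bit unsigned value


class InvariantFailure(Exception):
    """Stands in for the process abort that follows a failed invariant()."""


class ReplSetConfigModel:
    """Hypothetical, simplified model of a replica set configuration."""

    def __init__(self, members):
        self._members = list(members)

    def find_self_index(self, host):
        # Return -1 when the host is not part of this configuration.
        try:
            return self._members.index(host)
        except ValueError:
            return -1

    def get_member_at(self, i):
        # Mimic a signed-to-unsigned conversion: -1 becomes a huge index.
        i &= SIZE_MAX
        if not i < len(self._members):
            raise InvariantFailure("i < _members.size()")
        return self._members[i]


# Hypothetical host names, before and after removing "node-c".
before = ReplSetConfigModel(["node-a:27017", "node-b:27017", "node-c:27017"])
after = ReplSetConfigModel(["node-a:27017", "node-b:27017"])

# While the node is still in the configuration, the lookup succeeds.
found = before.get_member_at(before.find_self_index("node-c:27017"))

# After removal the self index is -1, and the bounds check fails.
try:
    after.get_member_at(after.find_self_index("node-c:27017"))
    crashed = False
except InvariantFailure:
    crashed = True

print(found, crashed)
```

In this model, a caller that checks for the -1 result before calling get_member_at would avoid the failure; whether that matches the actual server code path is not established by this ticket.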

Comment by Kelsey Schubert [ 18/Aug/17 ]

Hi ankon,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Thomas

Comment by Kelsey Schubert [ 21/Jul/17 ]

Hi ankon,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the information I requested in the previous comment?

Thank you,
Thomas

Comment by Kelsey Schubert [ 11/Jul/17 ]

Hi ankon,

Thanks for reporting this issue; we're looking into it. To help us investigate, would you please provide the following information?

  • Output of the command, rs.conf()
  • Complete log files for each node in the replica set
  • Archives of the diagnostic.data directories for each node in the replica set

I've created a secure upload portal where you can upload these files. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted.

Thanks again for your help,
Thomas

Comment by Andreas Kohn [ 11/Jul/17 ]

2017-07-11T16:10:56.754Z I -        [conn7] Invariant failure i < _members.size() src/mongo/db/repl/repl_set_config.cpp 620
2017-07-11T16:10:56.754Z I -        [conn7] 
 
***aborting after invariant() failure
 
 
2017-07-11T16:10:56.774Z F -        [conn7] Got signal: 6 (Aborted).
 
 0x55e560b6a921 0x55e560b69b39 0x55e560b6a01d 0x7f23ec4f5370 0x7f23ec1591d7 0x7f23ec15a8c8 0x55e55fe1340c 0x55e5605a447c 0x55e5605a45d9 0x55e56066176f 0x55e5605dbdab 0x55e5605c20ac 0x55e56006541f 0x55e560066b01 0x55e56067edf0 0x55e560284c78 0x55e55fe82c5d 0x55e55fe8358d 0x55e560ad2881 0x7f23ec4eddc5 0x7f23ec21b6ed
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55E55F5F9000","o":"1571921","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55E55F5F9000","o":"1570B39"},{"b":"55E55F5F9000","o":"157101D"},{"b":"7F23EC4E6000","o":"F370"},{"b":"7F23EC124000","o":"351D7","s":"gsignal"},{"b":"7F23EC124000","o":"368C8","s":"abort"},{"b":"55E55F5F9000","o":"81A40C","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"55E55F5F9000","o":"FAB47C"},{"b":"55E55F5F9000","o":"FAB5D9"},{"b":"55E55F5F9000","o":"106876F","s":"_ZN5mongo4repl23TopologyCoordinatorImpl26processReplSetRequestVotesERKNS0_23ReplSetRequestVotesArgsEPNS0_27ReplSetRequestVotesResponseERKNS0_6OpTimeE"},{"b":"55E55F5F9000","o":"FE2DAB","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl26processReplSetRequestVotesEPNS_16OperationContextERKNS0_23ReplSetRequestVotesArgsEPNS0_27ReplSetRequestVotesResponseE"},{"b":"55E55F5F9000","o":"FC90AC","s":"_ZN5mongo4repl22CmdReplSetRequestVotes3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS9_RNS_14BSONObjBuilderE"},{"b":"55E55F5F9000","o":"A6C41F","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"55E55F5F9000","o":"A6DB01","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"55E55F5F9000","o":"1085DF0","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"55E55F5F9000","o":"C8BC78","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"55E55F5F9000","o":"889C5D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"55E55F5F9000","o":"88A58D"},{"b":"55E55F5F9000","o":"14D9881"},{"b":"7F23EC4E6000","o":"7DC5"},{"b":"7F23EC124000","o":"F76ED","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.6", "gitVersion" : "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5", 
"compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.7.2", "version" : "#1 SMP Fri Apr 7 10:38:40 PDT 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "55E55F5F9000", "elfType" : 3, "buildId" : "E7DAD75E0AB1FE58FC79C75D5C934D568B747E3E" }, { "b" : "7FFE584DF000", "elfType" : 3, "buildId" : "E1B7CD9BB3B12443C1D330322068CD7DE34C2D53" }, { "b" : "7F23ED40E000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "FAF738A64DDEFBE202A700A23E88667132CFB755" }, { "b" : "7F23ED026000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "BA85B1BBD34D0502B5648916F409484083D688E7" }, { "b" : "7F23ECE1E000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "84240319CB3644443A8E5D6E265ABE8E023B81E6" }, { "b" : "7F23ECC1A000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "EB575314825D0BB0D64D2251E8B779E52FA8D419" }, { "b" : "7F23EC918000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "417E9F2F89F47EBA68BBFC48EC70020874F5D071" }, { "b" : "7F23EC702000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3FD5F89DE59E124AB1419A0BD16775B4096E84FD" }, { "b" : "7F23EC4E6000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "9620D82D9ED384A215A25AC280672B8F5B8C8553" }, { "b" : "7F23EC124000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "071017DCC4755642CFC82A38A7EB74B535476A45" }, { "b" : "7F23ED67C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "B2DC1178D4666475D45C1D9E8A3861C250DD1323" }, { "b" : "7F23EBED6000", "path" : "/usr/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "1BE9E6309ED365E35806E13FA9E23350D71F2513" }, { "b" : "7F23EBBEF000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "9EE23694485D684651195C7B51766E47D0CB95E3" }, { "b" : "7F23EB9EC000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7F23EB7BA000", "path" : 
"/usr/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "FD5974E4861D56DFFFFC8BF5DB35E74B1C20ABD5" }, { "b" : "7F23EB5A4000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F23EB395000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1B55330B231D45AF433F7D9DCA507C5FB0609780" }, { "b" : "7F23EB192000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F23EAF78000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "13F0C4788711C7B6BC537231CA981ECD345CAE5E" }, { "b" : "7F23EAD57000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55e560b6a921]
 mongod(+0x1570B39) [0x55e560b69b39]
 mongod(+0x157101D) [0x55e560b6a01d]
 libpthread.so.0(+0xF370) [0x7f23ec4f5370]
 libc.so.6(gsignal+0x37) [0x7f23ec1591d7]
 libc.so.6(abort+0x148) [0x7f23ec15a8c8]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x55e55fe1340c]
 mongod(+0xFAB47C) [0x55e5605a447c]
 mongod(+0xFAB5D9) [0x55e5605a45d9]
 mongod(_ZN5mongo4repl23TopologyCoordinatorImpl26processReplSetRequestVotesERKNS0_23ReplSetRequestVotesArgsEPNS0_27ReplSetRequestVotesResponseERKNS0_6OpTimeE+0x16F) [0x55e56066176f]
 mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl26processReplSetRequestVotesEPNS_16OperationContextERKNS0_23ReplSetRequestVotesArgsEPNS0_27ReplSetRequestVotesResponseE+0x16B) [0x55e5605dbdab]
 mongod(_ZN5mongo4repl22CmdReplSetRequestVotes3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS9_RNS_14BSONObjBuilderE+0x2DC) [0x55e5605c20ac]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0x55e56006541f]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF81) [0x55e560066b01]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x55e56067edf0]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD38) [0x55e560284c78]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x55e55fe82c5d]
 mongod(+0x88A58D) [0x55e55fe8358d]
 mongod(+0x14D9881) [0x55e560ad2881]
 libpthread.so.0(+0x7DC5) [0x7f23ec4eddc5]
 libc.so.6(clone+0x6D) [0x7f23ec21b6ed]
-----  END BACKTRACE  -----
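
When comparing the JSON backtrace with the symbolized frames above, it can help to recompute each absolute address as base plus offset. The following is a minimal Python sketch under the assumption that the "b" and "o" fields are hexadecimal strings for the module base and the offset within it; the helper names absolute_address and describe are illustrative, and the two frames are copied from the 3.4.6 backtrace above.

```python
import json

# Two frames copied from the "backtrace" array in the 3.4.6 log above.
frames_json = """
[{"b":"55E55F5F9000","o":"1571921","s":"_ZN5mongo15printStackTraceERSo"},
 {"b":"7F23EC124000","o":"351D7","s":"gsignal"}]
"""


def absolute_address(frame):
    """Return base + offset for one frame (both fields are hex strings)."""
    return int(frame["b"], 16) + int(frame["o"], 16)


def describe(frame):
    """Format a frame as '0x<address> <symbol>', or '?' if no symbol is recorded."""
    return f"{absolute_address(frame):#x} {frame.get('s', '?')}"


frames = json.loads(frames_json)
lines = [describe(frame) for frame in frames]
for line in lines:
    print(line)
```

The computed values can be checked against the bracketed addresses in the symbolized frames, for example 0x55e560b6a921 for printStackTrace and 0x7f23ec1591d7 for gsignal.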
