[SERVER-34672] Unable to add shard on 3.7.5 sharded cluster with mmapv1 shard Created: 25/Apr/18  Updated: 26/Apr/18  Resolved: 26/Apr/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louisa Berger Assignee: Esha Maharishi (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: cfg1.json, cfg2.json, cfg3.json, csrs_primary.log, csrs_secondary.log, mongos.json, mongos.log, shard1.json, shard2.json, shard3.json, shard_primary.log, shard_secondary.log
Issue Links:
Duplicate
duplicates SERVER-34483 Avoid taking DBLocks when clearing in... Closed
Related
is related to SERVER-27534 All writing operations must fail if t... Closed
is related to SERVER-34459 Clear in-memory database versions on ... Closed
Operating System: ALL
Steps To Reproduce:
  1. Start up a 3-node shard on mmapv1
  2. Initiate repl set:

    > rs.initiate({_id: "a", "members" : [ {"_id" : 0, host : "louisamac:9001"}, {_id: 1, host: "louisamac:9002"}, {_id: 2, host: "louisamac:9003"} ] })
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1524673861, 1),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1524673861, 1),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    

  3. Start up a 3-node csrs
  4. Initiate csrs

    > rs.initiate({configsvr: true, protocolVersion:1, _id: "csrs", "members" : [ {"_id" : 0, host : "louisamac:9007"}, {_id: 1, host: "louisamac:9008"}, {_id: 2, host: "louisamac:9009"} ] })
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1524674021, 1),
    	"$gleStats" : {
    		"lastOpTime" : Timestamp(1524674021, 1),
    		"electionId" : ObjectId("000000000000000000000000")
    	},
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1524674021, 1),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	},
    	"lastCommittedOpTime" : Timestamp(0, 0)
    }
    

  5. Start up mongos
  6. Run setFeatureCompatibilityVersion on mongos:

    mongos> db.runCommand({"setFeatureCompatibilityVersion" : "4.0"})

  7. Attempt to add a shard

    mongos> db.runCommand({"addShard" : "a/louisamac:9001,louisamac:9002,louisamac:9003", name: "shard0"})
    {
    	"ok" : 0,
    	"errmsg" : "failed to run command { setFeatureCompatibilityVersion: \"4.0\" } when attempting to add shard a/louisamac:9001,louisamac:9002,louisamac:9003 :: caused by :: NetworkInterfaceExceededTimeLimit: timed out",
    	"code" : 96,
    	"codeName" : "OperationFailed",
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1524674234, 1),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	},
    	"operationTime" : Timestamp(1524674234, 1)
    }
    

Linked BF Score: 0

 Description   

If you bring up a sharded cluster on 3.7.5 with an MMAPv1 multi-node shard, running addShard on the mongos fails with a timeout error:

mongos> db.runCommand({"addShard" : "a/louisamac:9001,louisamac:9002,louisamac:9003", name: "shard0"})
{
	"ok" : 0,
	"errmsg" : "failed to run command { setFeatureCompatibilityVersion: \"4.0\" } when attempting to add shard a/louisamac:9001,louisamac:9002,louisamac:9003 :: caused by :: NetworkInterfaceExceededTimeLimit: timed out",
	"code" : 96,
	"codeName" : "OperationFailed",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1524674234, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1524674234, 1)
}

Note: I tried this on WiredTiger (WT) and it worked fine. Assigning to Storage for that reason, but this may need to be reassigned.
Second note: I also did not see this problem with a 1-node shard.

Config files and log files are attached.



 Comments   
Comment by Kyle Suarez [ 26/Apr/18 ]

Looks like Esha fixed this as part of SERVER-34483, so I'm closing this as a duplicate.

Comment by Kyle Suarez [ 25/Apr/18 ]

Eric, I think we are updating an existing FCV document (so no implicit collection creation).

Here, at 20:37:48, we create the admin.system.version collection, presumably during startup. The fCV is set to "3.6" immediately afterwards:

[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:48.244+0000 I STORAGE  [conn5] createCollection: admin.system.version with provided UUID: 17b9faaa-6621-4905-a4c6-368557621b27
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:48.245+0000 I COMMAND  [conn5] setting featureCompatibilityVersion to 3.6

About 8.7 seconds later (20:37:48.245 to 20:37:56.950), conn22 joins the party and then runs setFeatureCompatibilityVersion. Based on this timeline, I think any previous setFeatureCompatibilityVersion command has completed and the collection exists with "3.6" in the fCV document.

[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.950+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:35759 #22 (6 connections now open)
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.951+0000 I NETWORK  [conn22] received client metadata from 127.0.0.1:35759 conn22: { driver: { name: "NetworkInterfaceTL", version: "3.7.7-15-gc50a57061a-patch-5ae0e2082fbabe40d969328b" }, os: { type: "Linux", name: "Red Hat Enterprise Linux Server release 6.2 (Santiago)", architecture: "x86_64", version: "Kernel 2.6.32-220.el6.x86_64" } }
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.951+0000 I COMMAND  [conn22] CMD: drop config.system.sessions
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.972+0000 I SHARDING [conn22] initializing sharding state with: { _id: "shardIdentity", configsvrConnectionString: "config-rs/localhost:20000,localhost:20001,localhost:20002", shardName: "shard-rs0", clusterId: ObjectId('5ae0e71ad8699a7709896661') }
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.975+0000 I NETWORK  [conn22] Starting new replica set monitor for config-rs/localhost:20000,localhost:20001,localhost:20002
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:56.975+0000 I SHARDING [conn22] initialized sharding components for primary node.
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:57.007+0000 I COMMAND  [conn22] setting featureCompatibilityVersion to upgrading to 4.0
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:37:57.007+0000 I NETWORK  [conn22] Skip closing connection for connection # 22
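The gap between the createCollection line (20:37:48.245) and conn22's first line (20:37:56.950) is simple arithmetic. The throwaway sketch below assumes the HH:MM:SS.mmm format used in these log lines; toMillis and gapMillis are illustrative helper names, not anything from the server.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Converts an "HH:MM:SS.mmm" log timestamp into milliseconds since midnight.
// Assumes the format shown in the logs; returns -1 if parsing fails.
inline long long toMillis(const std::string& hhmmss) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(hhmmss.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60LL + m) * 60LL + s) * 1000LL + ms;
}

// Gap between the createCollection log line and conn22's first log line.
inline long long gapMillis() {
    return toMillis("20:37:56.950") - toMillis("20:37:48.245");  // 8705 ms
}
```

The difference works out to 8705 ms, i.e. roughly 8.7 seconds.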

conn22 then hits std::terminate() when a stepdown occurs concurrently:

[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:38:37.035+0000 F -        [conn22] terminate() called. An exception is active; attempting to gather more information
[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:38:37.070+0000 F -        [conn22] DBException::toString(): InterruptedDueToReplStateChange: operation was interrupted

Comment by Eric Milkie [ 25/Apr/18 ]

Kyle, are the failing tests updating an FCV document, or are they inserting a new one (and implicitly creating its containing collection as well)?

Comment by Geert Bosch [ 25/Apr/18 ]

Most likely somebody registered an onCommit handler that can throw an exception.
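Geert's theory can be illustrated with a toy model. The sketch below is hypothetical and simplified: ToyRecoveryUnit, onCommit, commit and commitWithSafeHandler are illustrative names, not MongoDB's RecoveryUnit API. It assumes only that commit-time handlers run inside a function declared noexcept. An exception escaping such a function cannot propagate, so the C++ runtime calls std::terminate(). That is consistent with the backtrace, where __cxxabiv1::__terminate appears a few frames above DurRecoveryUnit::commitUnitOfWork.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration only; NOT MongoDB's RecoveryUnit API.
class ToyRecoveryUnit {
public:
    // Register work to run once the unit of work commits.
    void onCommit(std::function<void()> handler) {
        assert(!committing_ && "handlers must not be registered during commit");
        handlers_.push_back(std::move(handler));
    }

    // Marked noexcept: a commit must not fail halfway through. If a handler
    // throws, the exception cannot leave this function, so the C++ runtime
    // calls std::terminate() and the whole process aborts.
    void commit() noexcept {
        committing_ = true;
        for (auto& handler : handlers_) {
            handler();
        }
        handlers_.clear();
        committing_ = false;
        ++commits_;
    }

    int commits() const { return commits_; }

private:
    std::vector<std::function<void()>> handlers_;
    bool committing_ = false;
    int commits_ = 0;
};

// In this model the commit path is noexcept by construction.
static_assert(noexcept(std::declval<ToyRecoveryUnit&>().commit()),
              "commit() is expected to be noexcept");

// A handler that cannot throw: the commit completes normally.
inline int commitWithSafeHandler() {
    ToyRecoveryUnit ru;
    int cacheClears = 0;
    ru.onCommit([&cacheClears]() noexcept { ++cacheClears; });
    ru.commit();
    return cacheClears;  // 1
}

// By contrast, registering a handler like the one below and then calling
// commit() would abort the process via std::terminate():
//   ru.onCommit([] { throw std::runtime_error("interrupted"); });
```

The design point is that anything registered to run at commit time has to be non-throwing, including any interrupt checks hidden inside lock acquisition.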

Comment by Kyle Suarez [ 25/Apr/18 ]

My theory that this was caused by SERVER-27534 was wrong: even with that commit reverted, my patch build still fails like this:

[ShardedClusterFixture:job0:shard0:node0] 2018-04-25T20:38:37.070+0000 F -        [conn22] DBException::toString(): InterruptedDueToReplStateChange: operation was interrupted
[ShardedClusterFixture:job0:shard0:node0] Actual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)11602, mongo::ExceptionForCat<(mongo::ErrorCategory)1>, mongo::ExceptionForCat<(mongo::ErrorCategory)2> >
[ShardedClusterFixture:job0:shard0:node0] ----- BEGIN BACKTRACE -----
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7fefb01864a1]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x2434E85) [0x7fefb0185e85]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x7fefb027ad06]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x2529D51) [0x7fefb027ad51]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x10D7F27) [0x7fefaee28f27]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo15DurRecoveryUnit16commitUnitOfWorkEv+0x1A) [0x7fefaee28f4a]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo15WriteUnitOfWork6commitEv+0x4A) [0x7fefb00d000a]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE+0x71E) [0x7fefaf029c7e]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo11UpdateStage6doWorkEPm+0x52D) [0x7fefaf02b31d]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo9PlanStage4workEPm+0x6B) [0x7fefaf0067fb]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x47A) [0x7fefaf056c1a]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4B) [0x7fefaf05732b]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x6D) [0x7fefaf05745d]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE+0x989) [0x7fefaed284b9]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xFCFF4D) [0x7fefaed20f4d]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xFCDAFB) [0x7fefaed1eafb]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2935F) [0x7fefae87a35f]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2BA1B) [0x7fefae87ca1b]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2D571) [0x7fefae87e571]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE+0x31E) [0x7fefae87f45e]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x3A) [0x7fefae86d6ca]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x1CE4BDD) [0x7fefafa35bdd]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo14DBDirectClient4callERNS_7MessageES2_bPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x34) [0x7fefafa36264]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12DBClientBase20runCommandWithTargetENS_12OpMsgRequestE+0x1EC) [0x7fefafa4c53c]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12DBClientBase20runCommandWithTargetERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_7BSONObjERS9_i+0x94) [0x7fefafa497a4]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12DBClientBase10runCommandERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_7BSONObjERS9_i+0x58) [0x7fefafa498d8]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo27FeatureCompatibilityVersion17_runUpdateCommandEPNS_16OperationContextESt8functionIFvNS_14BSONObjBuilderEEE+0x14DA) [0x7fefaee5dc8a]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo27FeatureCompatibilityVersion29unsetTargetUpgradeOrDowngradeEPNS_16OperationContextENS_10StringDataE+0x5D) [0x7fefaee5e6cd]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xF5270F) [0x7fefaeca370f]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo12BasicCommand10Invocation3runEPNS_16OperationContextEPNS_19CommandReplyBuilderE+0xD9) [0x7fefafbdfbe9]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2935F) [0x7fefae87a35f]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2BA1B) [0x7fefae87ca1b]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB2D571) [0x7fefae87e571]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE+0x31E) [0x7fefae87f45e]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x3A) [0x7fefae86d6ca]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xBA) [0x7fefae87826a]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x97) [0x7fefae873017]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB25731) [0x7fefae876731]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x1A2) [0x7fefafbbe462]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x15F) [0x7fefae8712ff]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0xAF5) [0x7fefae8743c5]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x30F) [0x7fefae8727df]
[ShardedClusterFixture:job0:shard0:node0]  mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11D) [0x7fefae87309d]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0xB25731) [0x7fefae876731]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x1E6D9C5) [0x7fefafbbe9c5]
[ShardedClusterFixture:job0:shard0:node0]  mongod(+0x237E244) [0x7fefb00cf244]
[ShardedClusterFixture:job0:shard0:node0]  libpthread.so.0(+0x7AA1) [0x7fefab0dfaa1]
[ShardedClusterFixture:job0:shard0:node0]  libc.so.6(clone+0x6D) [0x7fefaae2cbcd]
[ShardedClusterFixture:job0:shard0:node0] -----  END BACKTRACE  -----

Back to square one. milkie, I'm going to give this to the Storage Team. Although the failure crosses into Query code, something in the MMAPv1 storage layer seems to be causing these issues when running setFeatureCompatibilityVersion.

Comment by Kyle Suarez [ 25/Apr/18 ]

Aha, okay, I'm seeing a lot more of this in Evergreen. The common threads are:

  • it's MMAPv1
  • we are attempting to update the featureCompatibilityVersion document
  • while committing the update, we are interrupted by an exception, which causes std::terminate to be called

I think the solution will be to prevent MMAPv1 servers from terminating when a commit is interrupted by an exception (usually InterruptedDueToReplStateChange). That seems to be fallout from SERVER-27534. Either the command should fail gracefully without crashing the server, or we must do something at RecoveryUnit commit time to allow the commit to succeed despite the fact that our replica set state is changing.

Given that the update is to the fCV document (that is, not to actual user data), it seems fine to allow the commit to succeed.
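Both of the options above can be sketched with a toy model. It is hypothetical, not MongoDB code: ToyOperation, ToyFcvDocument, Interrupted and finishUpgrade are made-up names, and the killed flag stands in for a concurrent stepdown. Every interruptible step runs before the commit point, so an interrupt there becomes an ordinary error returned to the caller (the command fails gracefully). The commit itself does no interrupt checks, so it completes even if the replica set state changes underneath it. Judging only by its title, the duplicate SERVER-34483 ("Avoid taking DBLocks when clearing in...") may follow a similar idea; that is an inference, not something stated in this ticket.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical toy model; names are illustrative, NOT MongoDB's API.
// Stands in for an InterruptedDueToReplStateChange error.
struct Interrupted : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct ToyOperation {
    std::atomic<bool> killed{false};  // set by a concurrent "stepdown"

    // Throws if the operation has been killed, like an interrupt check.
    void checkForInterrupt() const {
        if (killed.load()) {
            throw Interrupted("operation was interrupted");
        }
    }
};

struct ToyFcvDocument {
    std::string version = "3.6";
    std::string targetVersion;  // non-empty while an upgrade is in progress
};

// Run every interruptible step (e.g. acquiring locks) BEFORE the commit point,
// so an interrupt turns into an error for the caller instead of a crash.
inline std::string finishUpgrade(ToyOperation& op, ToyFcvDocument& doc) {
    try {
        op.checkForInterrupt();  // may throw: fail gracefully, doc untouched
    } catch (const Interrupted& e) {
        return std::string("error: ") + e.what();
    }
    // Commit point: nothing below checks for interrupts, so the commit
    // completes even if `killed` flips at this moment.
    doc.version = doc.targetVersion;
    doc.targetVersion.clear();
    return "ok";
}
```

Calling finishUpgrade while killed is set returns an error string and leaves the document at "3.6"; with killed clear, the document moves to its target version.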

Generated at Thu Feb 08 04:37:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.