[SERVER-37554] Cannot drop config.system.sessions when calling addShard on 4.0 inMemory cluster Created: 10/Oct/18  Updated: 27/Oct/23  Resolved: 12/Oct/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louisa Berger Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File cs1.conf     File cs2.conf     File cs3.conf     File mongos.conf     Text File mongos.log     Text File primary.log     File rs1.conf     File rs2.conf     File rs3.conf    
Issue Links:
Backports
Depends
Assigned Teams:
Sharding
Operating System: ALL
Backport Requested:
v4.0
Steps To Reproduce:
  1. Clear out all db directories
  2. Start up 3-node replica set with inMemory storage engine
  3. Initiate the replica set
  4. Start up 3-node csrs
  5. Initiate the csrs
  6. Start up the mongos
  7. Call addShard (a shell sketch of steps 3, 5, and 7 follows below)
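The shell side of steps 3, 5, and 7 looks roughly like the following. This is a sketch, not the exact commands used: the rs1 hosts come from the description below, the csrs name and hosts are placeholders, and the attached conf files cover steps 2, 4, and 6.

// Step 3: initiate the shard replica set (run against one rs1 member)
rs.initiate({
    _id: "rs1",
    members: [
        { _id: 0, host: "louisamac:5000" },
        { _id: 1, host: "louisamac:5001" },
        { _id: 2, host: "louisamac:5002" }
    ]
})

// Step 5: initiate the csrs (run against one config server member; set name and hosts are placeholders)
rs.initiate({
    _id: "csrs",
    configsvr: true,
    members: [
        { _id: 0, host: "louisamac:6000" },
        { _id: 1, host: "louisamac:6001" },
        { _id: 2, host: "louisamac:6002" }
    ]
})

// Step 7: add the shard (run against the mongos)
db.adminCommand({ addShard: "rs1/louisamac:5000,louisamac:5001,louisamac:5002", name: "shard0" })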
Participants:

 Description   

I've attached the conf files I used for my sharded cluster. 

I was running this on 4.0.3-ent. 

I tried running the csrs with and without inMemory (I wasn't sure whether it's allowed for a csrs) and saw the same behavior both ways. 

When I try to add the rs as a shard, I get the following:

MongoDB Enterprise mongos> db.runCommand({"addShard": "rs1/louisamac:5000,louisamac:5001,louisamac:5002", "name": "shard0"})
{
	"ok" : 0,
	"errmsg" : "can't add shard with a local copy of config.system.sessions, please drop this collection from the shard manually and try again. :: caused by :: failed to run command { drop: \"system.sessions\", writeConcern: { w: \"majority\" } } when attempting to add shard rs1/louisamac:5000,louisamac:5001,louisamac:5002 :: caused by :: NetworkInterfaceExceededTimeLimit: timed out",
	"code" : 96,
	"codeName" : "OperationFailed",
	"operationTime" : Timestamp(1539203544, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1539203544, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
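For reference, the manual cleanup the errmsg asks for would look roughly like the following, run against the shard's primary. This is a sketch only; as the comments below show, the actual fix here was the replica set's writeConcernMajorityJournalDefault setting, not the drop itself.

// Drop the shard's local copy of the sessions collection that addShard complains about
db.getSiblingDB("config").system.sessions.drop()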

 

When I tried the same repro with wiredTiger, I did not see this issue. 

Note: I found this while testing, not in a production or customer issue.

cc kaloian.manassiev esha.maharishi



 Comments   
Comment by Andy Schwerin [ 11/Oct/18 ]

If the in-memory member has votes:1, then you should set writeConcernMajorityJournalDefault: false, generally speaking.
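A sketch of applying that to the already-initiated rs1, run from its primary (the same field can instead be passed straight to rs.initiate, as in the initiate command quoted further down the thread):

// Flip the journal default to false so w:"majority" does not wait for
// journaling that the inMemory members can never perform
var cfg = rs.conf()
cfg.writeConcernMajorityJournalDefault = false
rs.reconfig(cfg)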

Comment by Louisa Berger [ 11/Oct/18 ]

Setting to false fixes the issue!

So we can add a validation on our end that inMemory shards must always have writeConcernMajorityJournalDefault: false. Is that correct schwerin?

Also, is this true for any shard with at least one inMemory member?
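A rough shell-side illustration of that validation (hypothetical; a real check would presumably live in addShard itself):

// Run against a shard member: warn if the node's storage engine is not
// persistent but the replica set still defaults majority writes to wait
// for journaling.
var engine = db.serverStatus().storageEngine
var cfg = rs.conf()
if (!engine.persistent && cfg.writeConcernMajorityJournalDefault !== false) {
    print("storage engine " + engine.name + " is not persistent but " +
          "writeConcernMajorityJournalDefault is not false; majority writes " +
          "(including addShard's drop of config.system.sessions) will time out")
}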

Comment by Andy Schwerin [ 11/Oct/18 ]

My fault. I should have said to set that parameter to false. The default is true.

Comment by Louisa Berger [ 11/Oct/18 ]

Just tried with writeConcernMajorityJournalDefault: true and got the same issue: 

MongoDB Enterprise > rs.initiate({"_id": "rs1", "members": [ {"_id": 0, host: "louisamac:5000"}, {"_id": 1, host: "louisamac:5001"}, {"_id": 2, host: "louisamac:5002"}], writeConcernMajorityJournalDefault:true}) 

MongoDB Enterprise mongos> db.runCommand({"addShard": "rs1/louisamac:5000,louisamac:5001,louisamac:5002", "name": "shard0"})
{
	"ok" : 0,
	"errmsg" : "can't add shard with a local copy of config.system.sessions, please drop this collection from the shard manually and try again. :: caused by :: failed to run command { drop: \"system.sessions\", writeConcern: { w: \"majority\" } } when attempting to add shard rs1/louisamac:5000,louisamac:5001,louisamac:5002 :: caused by :: NetworkInterfaceExceededTimeLimit: timed out",
	"code" : 96,
	"codeName" : "OperationFailed",
	"operationTime" : Timestamp(1539265893, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1539265893, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

Comment by Andy Schwerin [ 11/Oct/18 ]

What does the replica set config look like for that replica set? Replica sets that include voting memory-only nodes need to set writeConcernMajorityJournalDefault: true in their replica set configuration, or they won't be able to commit majority writes. Perhaps that is causing you to experience this problem, louisa.berger.
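The relevant parts of the config can be pulled from the rs1 primary like so (a sketch):

// Print the journal default and each member's votes
var cfg = rs.conf()
print("writeConcernMajorityJournalDefault: " + cfg.writeConcernMajorityJournalDefault)
cfg.members.forEach(function(m) { print(m.host + " votes=" + m.votes) })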

Comment by Louisa Berger [ 10/Oct/18 ]

blake.oler yes, same in both releases. My attached logs are for 4.0.3

Comment by Blake Oler [ 10/Oct/18 ]

louisa.berger steps to reproduce mention 4.0.2, but the description mentions 4.0.3. Does the issue exist in both releases? And how about the latest master build?
