[SERVER-41758] Dropping config.shards is allowed and can cause mongos to crash in aggregation code Created: 14/Jun/19  Updated: 29/Oct/23  Resolved: 24/Aug/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.1, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Query 2019-08-12, Query 2019-08-26, Query 2019-09-09
Participants:
Linked BF Score: 0

 Description   

Perhaps the aggregation code should handle 0 shards?

Or should sharding prevent dropping config.shards? The sharding team has discussed this before and does not want to disallow direct writes to the config database, since people sometimes rely on them, and I think mongorestore might drop some of the config collections.

We could also prevent the fuzzer from dropping config collections, which would avoid these build failures.
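The fuzzer-side mitigation could be a simple allowlist check applied before a generated drop is issued. A minimal sketch, assuming a hypothetical fuzzer hook (none of these names are the fuzzer's actual API):

```javascript
// Hypothetical guard for a test fuzzer: refuse to drop cluster-metadata
// collections that mongos assumes always exist. Names are illustrative.
const PROTECTED_CONFIG_COLLECTIONS = new Set([
    "shards", "databases", "collections", "chunks", "version",
]);

function isDropAllowed(dbName, collName) {
    // Generated workloads must never drop config metadata, since doing so
    // can crash mongos in the aggregation routing path (this ticket).
    if (dbName === "config" && PROTECTED_CONFIG_COLLECTIONS.has(collName)) {
        return false;
    }
    return true;
}
```

A fuzzer would consult this before emitting each `db.<coll>.drop()` operation.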



 Comments   
Comment by Githook User [ 13/Sep/19 ]

Author:

{'name': 'Bernard Gorman', 'username': 'gormanb', 'email': 'bernard.gorman@mongodb.com'}

Message: SERVER-41758 Verify that at least 1 shard exists after hard-reload in aggregation routing path

(cherry picked from commit 1fa4766c621bd4cfd74319094469eff3a5de3b79)
Branch: v4.2
https://github.com/mongodb/mongo/commit/89d5c8b10e40648a403f12a55bcb66f2f5bef384

Comment by Githook User [ 24/Aug/19 ]

Author:

{'name': 'Bernard Gorman', 'email': 'bernard.gorman@mongodb.com', 'username': 'gormanb'}

Message: SERVER-41758 Verify that at least 1 shard exists after hard-reload in aggregation routing path
Branch: master
https://github.com/mongodb/mongo/commit/1fa4766c621bd4cfd74319094469eff3a5de3b79

Comment by Bernard Gorman [ 11/Jul/19 ]

esha.maharishi, kaloian.manassiev: This might be a bit trickier than it initially appears, because the aggregation code does handle (or at least attempts to handle) the case where there are no shards present. We have at least two tests that exercise the no-shards case: current_op_no_shards.js and change_stream_no_shards.js. And dropping config.shards before running an aggregation - even on an existing sharded collection - does not crash the mongoS, but instead returns an empty cursor as expected:

mongos> sh.enableSharding("test")
{
	"ok" : 1,
	"operationTime" : Timestamp(1562861800, 5),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1562861800, 5),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
mongos> sh.shardCollection("test.testing", {_id: "hashed"})
{
	"collectionsharded" : "test.testing",
	"collectionUUID" : UUID("0ab08d64-c195-4fa8-996a-12da3b9436e3"),
	"ok" : 1,
	"operationTime" : Timestamp(1562861806, 27),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1562861806, 27),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
mongos> for(let i = 0; i < 10; ++i) { db.testing.insert({_id: i}) }
WriteResult({ "nInserted" : 1 })
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("5d2760dff6801f7f213c11c0")
  }
  shards:
        {  "_id" : "shard01",  "host" : "shard01/localhost:27018",  "state" : 1 }
        {  "_id" : "shard02",  "host" : "shard02/localhost:27019",  "state" : 1 }
  active mongoses:
        "4.1.13" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("25106fc6-b1ec-4495-a5d8-119b904724c8"),  "lastMod" : 1 } }
                test.testing
                        shard key: { "_id" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                shard01	2
                                shard02	2
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard01 Timestamp(1, 0)
                        { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard01 Timestamp(1, 1)
                        { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard02 Timestamp(1, 2)
                        { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard02 Timestamp(1, 3)
 
mongos> use config
switched to db config
mongos> db.shards.drop()
true
mongos> use test
switched to db test
mongos> db.testing.aggregate([{$match: {_id: 5}}])
  ----> [EMPTY CURSOR, NO RESULTS]
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("5d2760dff6801f7f213c11c0")
  }
  shards:
  active mongoses:
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("25106fc6-b1ec-4495-a5d8-119b904724c8"),  "lastMod" : 1 } }
                test.testing
                        shard key: { "_id" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard01 Timestamp(1, 0)
                        { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard01 Timestamp(1, 1)
                        { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard02 Timestamp(1, 2)
                        { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard02 Timestamp(1, 3)
 
mongos>

So it looks like BF-13560 may manifest because config.shards is dropped somewhere between our initial check and the point where we try to establish the cursors, which is where we hit this invariant?
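The race described above matches the eventual commit message ("Verify that at least 1 shard exists after hard-reload in aggregation routing path"). A sketch of that logic in simplified form - this is not the actual server code, and `shardRegistry`, `getAllShardIds`, and `reload` are stand-ins for the real C++ interfaces:

```javascript
// Simplified model of the fix: after a hard reload of the shard registry,
// fail with a clean, catchable error instead of tripping an invariant
// when no shards remain in the cluster.
function targetShardsForAggregation(shardRegistry) {
    let shardIds = shardRegistry.getAllShardIds();
    if (shardIds.length === 0) {
        // Our cached view may be stale; force a reload from the config
        // server and re-check (the "hard-reload" in the commit message).
        shardRegistry.reload();
        shardIds = shardRegistry.getAllShardIds();
        if (shardIds.length === 0) {
            // Clean error surfaced to the client, not a mongos crash.
            throw new Error("ShardNotFound: no shards registered in the cluster");
        }
    }
    return shardIds;
}
```

The key point is that the zero-shard check is repeated after the reload, closing the window in which config.shards can be dropped between the initial check and cursor establishment.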

Comment by Kaloian Manassiev [ 05/Jul/19 ]

Regardless of whether we allow config.shards to be dropped or not (this also applies to the other config collections), I don't think a crash in aggregation is appropriate, so at the very least the aggregation code should be fixed to handle 0 shards. Passing this to the query team.
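The graceful behaviour requested here - an empty cursor rather than a crash - can be illustrated with a toy dispatch function. All names here are hypothetical; the real routing logic lives in the mongos C++ aggregation path:

```javascript
// Illustrative sketch only: with zero shards, an aggregation should
// yield an empty result set instead of crashing mongos.
function runAggregation(shardIds, pipeline, dispatchToShard) {
    if (shardIds.length === 0) {
        return []; // empty cursor: no shards means no data to return
    }
    // Fan the pipeline out to every shard and merge the partial results.
    return shardIds.flatMap((id) => dispatchToShard(id, pipeline));
}
```

This mirrors the shell transcript above, where `db.testing.aggregate([{$match: {_id: 5}}])` returned an empty cursor after config.shards was dropped.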

Generated at Thu Feb 08 04:58:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.