[SERVER-54231] Resharding can leave behind local collection on former primary shard that doesn't own any chunks Created: 03/Feb/21  Updated: 29/Oct/23  Resolved: 07/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.2
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc1

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: PM-234-M3, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-60007 Implement command to drop collection ... Closed
Gantt Dependency
has to be done after SERVER-54230 Stop returning sorted shard IDs vecto... Closed
Related
related to SERVER-54279 Primary shard may end up with inconsi... Closed
related to SERVER-73686 Make ShardsvrDropCollectionIfUUIDNotM... Closed
related to SERVER-57759 Run movePrimary command before shardi... Closed
is related to SERVER-40859 Orphaned collections after a movePrim... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Steps To Reproduce:

Shuffle the allShardIds vector in selectShardForNewDatabase() before cycling over it.
Simply remove the const specifier from the vector and add a shuffle (std::random_shuffle was removed in C++17, so use std::shuffle from <algorithm> with an engine from <random>):

std::shuffle(allShardIds.begin(), allShardIds.end(), std::default_random_engine{std::random_device{}()});

Sprint: Sharding EMEA 2021-09-20, Sharding EMEA 2021-10-04, Sharding EMEA 2021-10-18
Participants:
Story Points: 2

Description
Original summary

resharding_allowMigrations.js fails if primary shard of 'reshardingDb' is not the first shard

Original description

I've discovered this while working on SERVER-54230: if we randomly choose the primary shard for a newly created database, resharding_allowMigrations.js starts failing.

More specifically, this happens only if the primary shard is chosen to be one of the donor shards other than shard0.

This is an example of a failing run on evg.



Comments
Comment by Githook User [ 12/Oct/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-54231 Resharding must not leave stale collection catalog entries
Branch: v5.0
https://github.com/mongodb/mongo/commit/e252d1b49304379ec1e04fe8ec3f10619fdfa476

Comment by Githook User [ 11/Oct/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-54231 Resharding must not leave stale collection catalog entries
Branch: v5.1
https://github.com/mongodb/mongo/commit/40cb50a010db2715fd9745ede02ce0cc6f8bbc34

Comment by Githook User [ 07/Oct/21 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-54231 Resharding must not leave stale collection catalog entries
Branch: master
https://github.com/mongodb/mongo/commit/955e48addd93a25ad16c1b6df55eab66ec917a5f

Comment by Pierlauro Sciarelli [ 28/Sep/21 ]

If we wanted to make a more narrowly scoped change to resharding, then I think we would want to add logic to the ReshardingCoordinator to broadcast a collection drop to all shards (not just those which own chunks for the sharded collection under the old key pattern) by the source UUID.

This can now be done by calling the _shardsvrDropCollectionIfUUIDNotMatching implemented under SERVER-60007.

I believe changing the movePrimary and moveChunk behavior would be the preferred longer-term solution here.

If we ever want to clean up orphaned collection catalog entries, then in addition to those changes we may also have to call the new command to clean up pre-existing garbage. I believe a reasonable plan would be (a rough shell-level sketch of the broadcast follows this list):

  • Change the movePrimary and moveChunk behaviors to clean up the local collection catalog, starting from LTS XYZ.0.
  • Add an FCV step broadcasting _shardsvrDropCollectionIfUUIDNotMatching for every sharded collection upon upgrading to version XYZ.0.
  • Change shardCollection to broadcast local drops to all shards before starting, in order to clean up pre-existing garbage from a collection that was dropped but not recreated before upgrading to XYZ.0 (and would therefore be missed by the FCV broadcasts).
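
As a purely illustrative sketch of what that broadcast could look like at the shell level: the command name comes from SERVER-60007, but the field carrying the expected UUID (expectedCollectionUUID below), the way the collection name is passed, and whether the command can be invoked directly like this are assumptions; the real call would be issued internally by the ReshardingCoordinator or the FCV upgrade path.

// Hypothetical sketch: ask every shard to drop its local copy of "reshardingDb.coll"
// unless the local UUID matches the authoritative one in config.collections.
// 'st' stands for a ShardingTest handle; all names are illustrative.
const ns = "reshardingDb.coll";
const configDB = st.s.getDB("config");
const authoritativeUUID = configDB.collections.findOne({_id: ns}).uuid;

configDB.shards.find().forEach(shardDoc => {
    const shardConn = new Mongo(shardDoc.host);
    // 'expectedCollectionUUID' is an assumed field name (see the SERVER-60007 IDL).
    shardConn.getDB("reshardingDb").runCommand({
        _shardsvrDropCollectionIfUUIDNotMatching: "coll",
        expectedCollectionUUID: authoritativeUUID,
        writeConcern: {w: "majority"},
    });
});

// The FCV upgrade step in the list above would wrap the same broadcast in a loop
// over configDB.collections.find(), rather than targeting a single namespace.
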
Comment by Max Hirschhorn [ 09/Sep/21 ]

What does chunk migration do on the donor shard when it is the last chunk for the sharded collection? Does the collection catalog entry get dropped on the donor shard?

To answer my question from before, the collection catalog entry doesn't get dropped as part of donating the shard's last chunk for the collection. I believe changing the movePrimary and moveChunk behavior would be the preferred longer-term solution here.
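
For illustration, this behavior can be observed with a minimal jstest-style sketch (the database/collection names and the two-shard setup are made up for the example; this is not the fixture's actual code):

// Illustrative check: after a donor gives away its last chunk for a collection,
// its local collection catalog entry is still there.
const st = new ShardingTest({shards: 2});
const mongos = st.s;

assert.commandWorked(mongos.adminCommand({enableSharding: "testDB"}));
assert.commandWorked(mongos.adminCommand({shardCollection: "testDB.coll", key: {x: 1}}));

// Figure out which shard is the database primary (and thus owns the single initial chunk).
const primaryShardId = mongos.getDB("config").databases.findOne({_id: "testDB"}).primary;
const donor = (primaryShardId === st.shard0.shardName) ? st.shard0 : st.shard1;
const recipient = (donor === st.shard0) ? st.shard1 : st.shard0;

// Donate the only (i.e. the last) chunk away from the primary shard.
assert.commandWorked(mongos.adminCommand(
    {moveChunk: "testDB.coll", find: {x: 0}, to: recipient.shardName}));

// The donor no longer owns any chunks, yet listCollections still reports the collection.
const leftover = donor.getDB("testDB").getCollectionInfos({name: "coll"});
jsTest.log("Catalog entry still present on the donor: " + tojson(leftover));

st.stop();
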

If we wanted to make a more narrowly scoped change to resharding, then I think we would want to add logic to the ReshardingCoordinator to broadcast a collection drop to all shards (not just those which own chunks for the sharded collection under the old key pattern) by the source UUID. This could be done wholly after the participants have dropped/renamed the collection being resharded to avoid needing to think about what happens if the DonorStateMachine or RecipientStateMachine attempts to drop/rename the collection when the drop command comes in from the config server primary.

Comment by Max Hirschhorn [ 04/Feb/21 ]

Max Hirschhorn, the DDL project will make all DDL serialise with each other, so there should never be an attempt to run drop concurrently with a rename. Are you referring to the case where a command, which was "stuck" in a router somewhere, comes much later?

kaloian.manassiev, no, the sequence of operations doesn't involve any concurrent DDL operations. The issue described in my earlier comment is how a former primary shard retains the collection catalog entry even when it no longer owns chunks for the sharded collection.

SERVER-54279 will solve part of the issue for the current primary shard by having it be considered a recipient shard even when it doesn't own any chunks for the sharded collection. But additional work is needed in this ticket to solve the issue for former primary shards. Tommaso had said the drop collection command in the DDL project was cleaning up this "garbage state" by broadcasting the drop collection command to all shards, not just the shards which own chunks for the sharded collection.

Tommaso had pointed out that the movePrimary command won't drop any sharded collections on the former primary shard even when it no longer owns chunks for them. What does chunk migration do on the donor shard when it is the last chunk for the sharded collection? Does the collection catalog entry get dropped on the donor shard?

Comment by Kaloian Manassiev [ 04/Feb/21 ]

max.hirschhorn, the DDL project will make all DDL serialise with each other, so there should never be an attempt to run drop concurrently with a rename. Are you referring to the case where a command, which was "stuck" in a router somewhere, comes much later?

Comment by Max Hirschhorn [ 03/Feb/21 ]

The DDL project is addressing this by having the drop collection command broadcast to all shards rather than only the shards which own a chunk for the collection. One thought would be to have the coordinator broadcast such a drop command.

This drop collection command would need to use the collection UUID rather than its namespace string to avoid an ordering dependency with the collection rename on recipient shards to install the temporary resharding collection as the new sharded collection.

Comment by Max Hirschhorn [ 03/Feb/21 ]

I discussed this issue with Tommaso over Zoom. The resharding_allowMigrations.js test is failing because the donor1 shard still has the sharded collection on it despite the resharding operation having succeeded.

  1. donor1 is primary shard for the "reshardingDb" database.
  2. ReshardingTest#createShardedCollection() is called to create a new sharded collection.
    • donor1 creates the collection because it is the primary shard. It won't actually own any chunks for the collection. donor1 therefore won't ever create a DonorStateMachine for the resharding operation.
    • donor0 creates the collection because it owns chunks for the now-sharded collection.
  3. recipient0 is made the primary shard by ReshardingTest#createShardedCollection().
  4. reshardCollection is run.
  5. donor0 drops the existing sharded collection.
  6. donor1 never drops the leftover sharded collection. The metadata for the sharded collection on donor1 is also inconsistent because the resharding operation changed the collection's UUID.

[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 Error: [null] != [{
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 	"name" : "coll",
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 	"type" : "collection",
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 	"options" : {
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 	},
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 	"info" : {
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.277+0000 		"readOnly" : false,
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 		"uuid" : UUID("b19994b2-cb87-4ef6-aa57-9ea7e72c30a8")
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 	},
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 	"idIndex" : {
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 		"v" : 2,
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 		"key" : {
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 			"_id" : 1
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 		},
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 		"name" : "_id_"
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 	}
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 }] are not equal : collection exists on shard1-donor1 despite resharding having succeeded :
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 doassert@src/mongo/shell/assert.js:20:14
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 assert.eq@src/mongo/shell/assert.js:179:9
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 _checkDonorPostState@jstests/sharding/libs/resharding_test_fixture.js:468:13
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 _checkPostState@jstests/sharding/libs/resharding_test_fixture.js:383:13
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 _checkConsistencyAndPostState@jstests/sharding/libs/resharding_test_fixture.js:349:13
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 withReshardingInBackground@jstests/sharding/libs/resharding_test_fixture.js:250:9
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 @jstests/sharding/resharding_allowMigrations.js:26:1
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 @jstests/sharding/resharding_allowMigrations.js:10:2
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.278+0000 failed to load: jstests/sharding/resharding_allowMigrations.js
[js_test:resharding_allowMigrations] 2021-02-02T20:42:14.279+0000 exiting with code -3

https://logkeeper.mongodb.org/lobster/build/2254d45078bea89d33e18628c973158f/test/6019b9059041300dfb9c626b#bookmarks=0%2C4366%2C12578
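
For completeness, the UUID inconsistency mentioned in step 6 can be checked directly from the shell. A minimal sketch, where 'st' is the test's ShardingTest handle and 'donor1Conn' is a hypothetical direct connection to the donor1 shard primary:

// Compare the stale local UUID on the former primary shard (donor1) against the
// authoritative post-resharding UUID recorded in config.collections.
const ns = "reshardingDb.coll";
const authoritativeUUID = st.s.getDB("config").collections.findOne({_id: ns}).uuid;

const localInfo = donor1Conn.getDB("reshardingDb").getCollectionInfos({name: "coll"})[0];
if (localInfo !== undefined) {
    // Stale entry: the collection still exists locally, but under the pre-resharding
    // UUID shown in the failure above rather than the one resharding installed.
    assert.neq(authoritativeUUID, localInfo.info.uuid);
}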

Comment by Max Hirschhorn [ 03/Feb/21 ]

resharding_allowMigrations.js fails if primary shard of 'reshardingDB' is not first shard

tommaso.tocci, I'm not sure what to make of this ticket which mentions resharding and is linked to SERVER-54230. Is this test going to fail as a result of you pushing the changes in SERVER-54230?

I'd also like to mention that the ReshardingTest fixture runs the movePrimary command, so the primary shard within the test is always the same.
