[SERVER-70746] _configsvrReshardCollection Will Not Join Existing Operations After Shard Key is Updated Created: 20/Oct/22  Updated: 29/Oct/23  Resolved: 09/Nov/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Brett Nawrocki Assignee: Abdul Qadeer
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
is caused by SERVER-62720 _configsvrReshardCollection can fail ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

(function() {
'use strict';
 
load("jstests/libs/fail_point_util.js");
load("jstests/sharding/libs/resharding_test_fixture.js");
load("jstests/libs/discover_topology.js");
 
const reshardingTest = new ReshardingTest();
reshardingTest.setup();
 
const donorName = reshardingTest.donorShardNames[0];
const recipientName = reshardingTest.recipientShardNames[0];
const sourceCollection = reshardingTest.createShardedCollection({
    ns: 'reshardingDb.coll',
    shardKeyPattern: {oldKey: 1},
    chunks: [
        {min: {oldKey: MinKey}, max: {oldKey: MaxKey}, shard: donorName},
    ]
});
const mongos = sourceCollection.getMongo();
const topology = DiscoverTopology.findConnectedNodes(mongos);
const config = new Mongo(topology.configsvr.primary);
 
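// Pause the resharding coordinator on the config server primary just before it
// removes its state document.
const hangBeforeRemovingStateDoc =
    configureFailPoint(config, "reshardingPauseCoordinatorBeforeRemovingStateDoc");

reshardingTest.withReshardingInBackground(
    {
        newShardKeyPattern: {newKey: 1},
        newChunks: [{min: {newKey: MinKey}, max: {newKey: MaxKey}, shard: recipientName}],
    },
    (tempNs) => {},
    {
        postDecisionPersistedFn: () => {
            // Wait until the coordinator reaches the failpoint, then force the
            // config server primary to step down while the state document still exists.
            hangBeforeRemovingStateDoc.wait();
            assert.commandWorked(config.adminCommand({replSetStepDown: 10, force: true}));
        }
    });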
 
reshardingTest.teardown();
})();

Sprint: Sharding NYC 2022-10-31, Sharding NYC 2022-11-14
Participants:
Linked BF Score: 20

 Description   

_configsvrReshardCollection returns early, without checking for ongoing resharding operations, if the requested shard key is the same as the collection's current shard key. However, the resharding coordinator service updates the collection's shard key to the new shard key as part of persisting the commit decision.

This means that a _configsvrReshardCollection command returns as if resharding had already completed as soon as the collection's shard key is updated, even though the resharding operation is still ongoing. This resulted in BF-26453: a config server stepdown after the commit decision was persisted caused the ReshardCollectionCoordinator to retry the _configsvrReshardCollection command after step-up, and the retry returned immediately, before the state document could be deleted. As a result, the ReshardingTestFixture checked for the state document too soon and failed an assertion.

This issue was introduced by 08a4e5e as part of SERVER-62720, which coincides with the first time this issue was seen. Prior to this commit, the check for an existing resharding instance was performed before the check for a different shard key.
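
The effect of the check ordering can be illustrated with a small standalone model. This is only a sketch, not the server code: the function names, the shape of the state object, and the result strings are all hypothetical, and the real fix in the linked commit may be structured differently. It models the moment after the commit decision has been persisted (shard key already updated, state document not yet removed) and compares the two orderings described above for a retried request.

```javascript
// Hypothetical, simplified model of how a retried _configsvrReshardCollection
// request could be handled. None of these names come from the server code.

// State right after the commit decision is persisted: the collection's shard
// key is already the new key, but the coordinator's state document still exists.
function makeStateAfterCommitDecision() {
    return {
        currentShardKey: {newKey: 1},
        activeOperation: {newShardKey: {newKey: 1}, stateDocRemoved: false},
    };
}

// Compare two shard key patterns (sufficient for this flat, ordered example).
function sameKey(a, b) {
    return JSON.stringify(a) === JSON.stringify(b);
}

// Ordering described as problematic: the shard key comparison runs first,
// so the request returns before any ongoing operation is considered.
function handleKeyCheckFirst(state, requestedKey) {
    if (sameKey(state.currentShardKey, requestedKey)) {
        return "returned-early";
    }
    if (state.activeOperation && sameKey(state.activeOperation.newShardKey, requestedKey)) {
        return "joined-existing";
    }
    return "started-new";
}

// Ordering described as the behavior prior to 08a4e5e: look for an existing
// resharding instance first, and only then compare shard keys.
function handleExistingOperationFirst(state, requestedKey) {
    if (state.activeOperation && sameKey(state.activeOperation.newShardKey, requestedKey)) {
        return "joined-existing";
    }
    if (sameKey(state.currentShardKey, requestedKey)) {
        return "returned-early";
    }
    return "started-new";
}

// A retry after step-up asks for the same new shard key.
const retriedKey = {newKey: 1};
const keyCheckFirstResult = handleKeyCheckFirst(makeStateAfterCommitDecision(), retriedKey);
const existingFirstResult =
    handleExistingOperationFirst(makeStateAfterCommitDecision(), retriedKey);
```

In this model the key-check-first ordering reports completion while the state document is still present, whereas the existing-operation-first ordering joins the in-progress operation and so would only return once it finishes.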



 Comments   
Comment by Githook User [ 09/Nov/22 ]

Author:

{'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'}

Message: SERVER-70746 Check for superfluous resharding operation before initializing
Branch: master
https://github.com/mongodb/mongo/commit/c70333568fad90fd508eb89730bd605bf557458a
