  1. Core Server
  2. SERVER-70746

_configsvrReshardCollection Will Not Join Existing Operations After Shard Key is Updated

    • Fully Compatible
    • ALL
      (function() {
      'use strict';
      
      load("jstests/libs/fail_point_util.js");
      load("jstests/sharding/libs/resharding_test_fixture.js");
      load("jstests/libs/discover_topology.js");
      
      const reshardingTest = new ReshardingTest();
      reshardingTest.setup();
      
      const donorName = reshardingTest.donorShardNames[0];
      const recipientName = reshardingTest.recipientShardNames[0];
      const sourceCollection = reshardingTest.createShardedCollection({
          ns: 'reshardingDb.coll',
          shardKeyPattern: {oldKey: 1},
          chunks: [
              {min: {oldKey: MinKey}, max: {oldKey: MaxKey}, shard: donorName},
          ]
      });
      const mongos = sourceCollection.getMongo();
      const topology = DiscoverTopology.findConnectedNodes(mongos);
      const config = new Mongo(topology.configsvr.primary);
      
      // Pause the resharding coordinator just before it removes its state document.
      const hangBeforeRemovingStateDoc =
          configureFailPoint(config, "reshardingPauseCoordinatorBeforeRemovingStateDoc");
      
      reshardingTest.withReshardingInBackground(
          {
              newShardKeyPattern: {newKey: 1},
              newChunks: [{min: {newKey: MinKey}, max: {newKey: MaxKey}, shard: recipientName}],
          },
          (tempNs) => {},
          {
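              postDecisionPersistedFn: () => {
                  // Wait until the coordinator is paused at the failpoint, then force the config
                  // server primary to step down so the command is retried after step-up.
                  hangBeforeRemovingStateDoc.wait();
                  assert.commandWorked(config.adminCommand({replSetStepDown: 10, force: true}));
              }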
          });
      
      reshardingTest.teardown();
      })();
      
    • Sharding NYC 2022-10-31, Sharding NYC 2022-11-14
    • 20

      _configsvrReshardCollection returns early, without checking for ongoing resharding operations, if the requested shard key is the same as the collection's current shard key. The resharding coordinator service updates the collection's shard key to the new shard key as part of persisting the commit decision.

      This means that a _configsvrReshardCollection command returns as if resharding had already completed as soon as the shard key for the collection is updated, even though the resharding operation is still ongoing. This resulted in BF-26453, where a config server stepdown after persisting the commit decision caused the ReshardCollectionCoordinator to retry the _configsvrReshardCollection command after step-up. The retried command returned immediately, before the state document could be deleted. As a result, the ReshardingTestFixture checked for the state document too soon and failed an assertion.

      This issue was introduced by 08a4e5e as part of SERVER-62720, which coincides with the first time this issue was seen. Prior to this commit, the check for an existing resharding instance was performed before the check for a different shard key.
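
      The effect of the check ordering can be illustrated with a small in-memory model. This is only a sketch of the ordering described above, not the server implementation: the function names, the `cluster` object, and the returned strings are all hypothetical, and the real logic is in the config server's C++ code.

```javascript
// Hypothetical in-memory model of the two check orderings described above.
// None of these names are server APIs; they exist only for illustration.

function sameKey(a, b) {
    return JSON.stringify(a) === JSON.stringify(b);
}

// Ordering described as current: compare shard keys first.
function reshardKeyCheckFirst(cluster, requestedKey) {
    if (sameKey(cluster.shardKey, requestedKey)) {
        return "returned-early";  // never looks at the active operation
    }
    if (cluster.activeOperation) {
        return "joined-existing";
    }
    cluster.activeOperation = {newKey: requestedKey};
    return "started-new";
}

// Ordering described as prior to 08a4e5e: look for an existing operation first.
function reshardInstanceCheckFirst(cluster, requestedKey) {
    if (cluster.activeOperation) {
        return "joined-existing";
    }
    if (sameKey(cluster.shardKey, requestedKey)) {
        return "returned-early";
    }
    cluster.activeOperation = {newKey: requestedKey};
    return "started-new";
}

// Modeled state after the commit decision is persisted: the shard key has already
// been updated, but the operation is still running and its state document still exists.
function stateAfterCommitDecision() {
    return {shardKey: {newKey: 1}, activeOperation: {newKey: {newKey: 1}}};
}

// A retried command with the same requested key, as after the step-up.
const retryWithKeyCheckFirst =
    reshardKeyCheckFirst(stateAfterCommitDecision(), {newKey: 1});
const retryWithInstanceCheckFirst =
    reshardInstanceCheckFirst(stateAfterCommitDecision(), {newKey: 1});

console.log(retryWithKeyCheckFirst);       // returned-early
console.log(retryWithInstanceCheckFirst);  // joined-existing
```

      In this model, only the second ordering makes the retried command wait on the operation that is still in progress, which matches the behavior the description attributes to the code before SERVER-62720.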

            Assignee:
            abdul.qadeer@mongodb.com Abdul Qadeer
            Reporter:
            brett.nawrocki@mongodb.com Brett Nawrocki
