  1. Core Server
  2. SERVER-62181

JS tests including multiple parallel migrations with failpoints shouldn't be run in the config server stepdown suites

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.3.0, 5.2.0-rc4
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • v5.2
    • Sprint: Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10
    • 156

      A JS test defining a sequence of commands like

      // Start two migrations in parallel, each from its own shell.
      var joinMoveChunk1 = moveChunkParallel(
          staticMongod, st.s0.host, {Key: 10}, null, 'TestDB.TestColl', st.shard2.shardName);
      var joinMoveChunk2 = moveChunkParallel(
          staticMongod, st.s0.host, {Key: 30}, null, 'TestDB.TestColl', st.shard3.shardName);

      // Wait until both recipient shards have reached the paused migration step.
      waitForMigrateStep(st.shard2, migrateStepNames.rangeDeletionTaskScheduled);
      waitForMigrateStep(st.shard3, migrateStepNames.rangeDeletionTaskScheduled);

      // Release both failpoints, then wait for the migrations to complete.
      unpauseMigrateAtStep(st.shard2, migrateStepNames.rangeDeletionTaskScheduled);
      unpauseMigrateAtStep(st.shard3, migrateStepNames.rangeDeletionTaskScheduled);

      joinMoveChunk1();
      joinMoveChunk2();

      may reach a deadlock state when a step-down event occurs after the shard command for moveChunk1 has been issued, but before the one for moveChunk2 is sent out. This happens because:

      1. on step-up, moveChunk1 is regenerated as part of the balancer's recovery procedure, which only completes once that command also completes
      2. moveChunk2 is also scheduled for dispatch during the step-up, but the command is not actually sent until the recovery is over
      3. nevertheless, the recovery never ends, since moveChunk1 is blocked by an active failpoint (which the test only disables after moveChunk2 has also reached that step)
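
      The circular dependency in the three points above can be illustrated with a toy wait-for graph. This is only a sketch for reasoning about the scenario: the node names (balancerRecovery, failpointDisabled, and so on) are labels invented for this example, not server identifiers, and the code does not model the real BalancerCommandsScheduler.

```javascript
// Toy wait-for graph. An edge "a -> b" means "a cannot finish until b does".
// All node names are hypothetical labels chosen for this illustration.
const waitsFor = {
    // (1) recovery completes only after the regenerated moveChunk1 completes
    balancerRecovery: ["moveChunk1"],
    // (3) moveChunk1 is paused by an active failpoint
    moveChunk1: ["failpointDisabled"],
    // the test disables the failpoint only after moveChunk2 has progressed
    failpointDisabled: ["moveChunk2"],
    // (2) moveChunk2 is not sent until the recovery is over
    moveChunk2: ["balancerRecovery"],
};

// Depth-first search returning one cycle as a list of nodes, or null if none.
function findCycle(graph) {
    const state = {};  // undefined = unvisited, 1 = on current path, 2 = explored
    const path = [];

    function visit(node) {
        if (state[node] === 1) {
            // Reached a node already on the current path: that closes a cycle.
            return path.slice(path.indexOf(node)).concat(node);
        }
        if (state[node] === 2) {
            return null;
        }
        state[node] = 1;
        path.push(node);
        for (const next of graph[node] || []) {
            const cycle = visit(next);
            if (cycle) {
                return cycle;
            }
        }
        path.pop();
        state[node] = 2;
        return null;
    }

    for (const node of Object.keys(graph)) {
        const cycle = visit(node);
        if (cycle) {
            return cycle;
        }
    }
    return null;
}

const cycle = findCycle(waitsFor);
console.log(cycle ? "deadlock: " + cycle.join(" -> ") : "no deadlock");
```

      Removing the moveChunk2 -> balancerRecovery edge (that is, no step-down between the two commands) leaves the graph acyclic, which is consistent with the test passing outside the stepdown suites.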

      The behaviour described in 1) and 2) is by design: it matches the intended functionality of the BalancerCommandsScheduler (and of the legacy MigrationManager). The proposal is therefore to avoid the problem by excluding the affected tests from the stepdown suites.

            Assignee:
            paolo.polato@mongodb.com Paolo Polato
            Reporter:
            paolo.polato@mongodb.com Paolo Polato