  1. Core Server
  2. SERVER-82627

ReshardingDataReplication does not join the ReshardingOplogFetcher thread pool, causing an invariant failure.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.2.0-rc0, 7.0.6, 5.0.25, 6.0.14
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • v7.2, v7.1, v7.0, v6.0, v5.0
    • 120

      As seen in https://jira.mongodb.org/browse/BF-30264, a recipient primary may step down while resharding is in progress, and the subsequent step-up process does not wait for that step down to complete. When resharding completes on the recipient, the recipient state document is deleted on the current primary, and this deletion is then replicated to the secondaries. Because one of those secondaries was previously a primary, it still holds a stale ActiveInstance (the step up did not wait for the step down to complete). Its deletion of the state document triggers the instance's cleanup, and that is when the invariant failure is hit, because the task in the GuaranteedExecutor failed to run before the deletion. To avoid such scenarios, ReshardingDataReplication must join the ReshardingOplogFetcher thread pool.
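
      The ordering problem described above can be illustrated with a minimal, self-contained C++ sketch. This is not the MongoDB implementation or its fix: SimplePool, Replication, cleanup(), and the _taskRan flag are hypothetical names chosen for illustration, and the boolean returned by cleanup() merely stands in for the invariant. The sketch assumes a pool that drains its queue on shutdown, and shows that joining the pool before the cleanup check guarantees the scheduled task has already run.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a worker pool; NOT MongoDB's ThreadPool API.
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            _workers.emplace_back([this] { _run(); });
    }

    ~SimplePool() { shutdownAndJoin(); }

    // Tasks scheduled after shutdownAndJoin() are not guaranteed to run.
    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _tasks.push(std::move(task));
        }
        _cv.notify_one();
    }

    // Signals shutdown, then blocks until the workers have drained the
    // queue and exited. Safe to call more than once.
    void shutdownAndJoin() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_all();
        for (auto& w : _workers)
            if (w.joinable()) w.join();
    }

private:
    void _run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cv.wait(lk, [this] { return _shutdown || !_tasks.empty(); });
                if (_tasks.empty()) return;  // shut down and fully drained
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
    bool _shutdown = false;
    std::vector<std::thread> _workers;
};

// Hypothetical owner whose cleanup requires that its scheduled task has run.
class Replication {
public:
    Replication() : _pool(1) {
        _pool.schedule([this] { _taskRan.store(true); });
    }

    // Joining the pool first means the queued task has finished before the
    // check below. Without the join, the check could race with the worker.
    bool cleanup() {
        _pool.shutdownAndJoin();
        return _taskRan.load();
    }

private:
    std::atomic<bool> _taskRan{false};  // declared before _pool so it outlives the workers
    SimplePool _pool;
};
```

      In this sketch, Replication().cleanup() returns true on every run because the join happens before the flag is read. Removing the shutdownAndJoin() call from cleanup() would make the result depend on thread scheduling, which is the kind of race the ticket describes.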

            Assignee: nandini.bhartiya@mongodb.com Nandini Bhartiya
            Reporter: nandini.bhartiya@mongodb.com Nandini Bhartiya