  Core Server / SERVER-32593

CSRS stepdown during migration commit can trigger fassert on source shard primary

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.3, 3.7.2
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6
    • Sprint:
      Sharding 2018-01-15, Sharding 2018-01-29, Sharding 2018-02-12
    • Linked BF Score:
      0

      Description

      During the critical section, the source shard sends _configsvrCommitChunkMigration to the CSRS primary. This command can fail if the primary recently stepped down. In that case, the source shard tries to log "moveChunk.validating" on the CSRS primary to update its optime before refreshing metadata. If this logging attempt also fails, the source shard will fassert.
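
      The failure sequence described above can be modeled as a small control-flow sketch. This is an illustration only, not the server's code: the function commit_migration, its three callable parameters, and both exception classes are hypothetical names introduced here, and the real source shard does more work around each step.

```python
class ConfigPrimaryUnavailable(Exception):
    """Hypothetical error: the CSRS primary stepped down or is unreachable."""


class FatalAssertion(Exception):
    """Stands in for fassert, which would terminate the real server process."""


def commit_migration(send_commit, log_validating, refresh_metadata):
    """Model the source shard's commit path as described in this ticket.

    All three arguments are hypothetical callables. The first two raise
    ConfigPrimaryUnavailable when the CSRS primary cannot serve the request.
    """
    try:
        send_commit()  # models _configsvrCommitChunkMigration
        return "committed"
    except ConfigPrimaryUnavailable:
        pass  # commit outcome is unknown, so fall through to validation

    try:
        # Models logging "moveChunk.validating" to update the optime.
        log_validating()
    except ConfigPrimaryUnavailable as exc:
        # Second consecutive failure: the source shard gives up.
        raise FatalAssertion("could not validate migration commit") from exc

    refresh_metadata()
    return "refreshed"
```

      In this model, a stepdown that spans both the commit and the validation write is enough to reach the fatal path, which matches the crash pattern reported for the stepdown suite.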

      From this comment, it seems that this is desired behavior, but it's a problem for the continuous stepdowns concurrency suite with the balancer enabled, since background migrations can crash servers and fail the test when the cluster is torn down.

      Example failure: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_concurrency_sharded_with_stepdowns_and_balancer_WT_patch_b8f64cc3fde6d041f3e90b1cb2e153b0b15f6c47_5a20e4efe3c33173de00c68d_17_12_01_05_13_55

            People

            Assignee:
            jack.mulrow Jack Mulrow
            Reporter:
            jack.mulrow Jack Mulrow