Core Server / SERVER-39498

ShardRegistry reload inside onReplicationRollback can get stuck

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.1.7
    • Fix Version/s: 4.1.11
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2019-04-22, Sharding 2019-05-06, Sharding 2019-05-20
    • Linked BF Score: 23

      Description

      Repro scenario:
      1. A rollback occurs.
      2. The periodic shard registry reload thread starts a shard reload. The reload reads with majority readConcern at the latest configOpTime. Because a rollback just occurred, configOpTime > lastAppliedOpTime, so the reload blocks.
      3. Rollback finishes fixing the oplog and record store, then calls OpObserverImpl::onReplicationRollback.
      4. The rollback thread tries to call ShardRegistry reload, but because the periodic reload thread is already in the middle of a reload, the rollback thread just waits for that reload to finish. This creates a cyclic dependency: the periodic reload is waiting for the opTime to advance, and the opTime won't advance until the rollback thread finishes.
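
      The cycle in the steps above can be modelled with a small Python sketch. This is a toy under stated assumptions, not the server code: ToyShardRegistry, majority_read, and simulate are hypothetical names, an Event stands in for "the opTime has advanced", and timeouts are added only so the demo terminates where the real threads would wait indefinitely.

```python
import threading


class ToyShardRegistry:
    """Hypothetical stand-in for the shard registry; not MongoDB's API."""

    def __init__(self):
        self._cond = threading.Condition()
        self._reload_in_progress = False

    def reload(self, majority_read, timeout):
        """Return True if a reload finished within `timeout` seconds, else False."""
        with self._cond:
            if self._reload_in_progress:
                # Step 4: another thread is already reloading, so wait for
                # it instead of starting a second reload.
                return self._cond.wait_for(
                    lambda: not self._reload_in_progress, timeout)
            self._reload_in_progress = True
        try:
            # Step 2: the read blocks until the opTime catches up.
            return majority_read(timeout)
        finally:
            with self._cond:
                self._reload_in_progress = False
                self._cond.notify_all()


def simulate():
    registry = ToyShardRegistry()
    rollback_finished = threading.Event()  # opTime can advance once this is set
    read_started = threading.Event()
    results = {}

    def majority_read(timeout):
        # Assumed model of a majority read at configOpTime: it cannot
        # return until the rollback thread has finished.
        read_started.set()
        return rollback_finished.wait(timeout)

    def periodic_reload():
        results["periodic"] = registry.reload(majority_read, timeout=5.0)

    reloader = threading.Thread(target=periodic_reload)
    reloader.start()
    read_started.wait()  # the periodic reload is now blocked in its read

    # Rollback thread (this thread) calls reload from the rollback hook and
    # ends up waiting on the periodic reload, which is waiting on us.
    results["rollback"] = registry.reload(majority_read, timeout=0.2)

    # Only once the rollback thread gives up and finishes can the opTime
    # advance and the periodic reload complete.
    rollback_finished.set()
    reloader.join()
    return results["rollback"], results["periodic"]


if __name__ == "__main__":
    print(simulate())
```

      In this model the rollback thread's call returns False (it timed out waiting on the in-progress reload), and the periodic reload returns True only after the rollback thread has moved on. Without the timeout, neither thread would make progress.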
