  1. Core Server
  2. SERVER-92529

Migration recovery could be run asynchronously as part of the shard version recovery

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • 2

      Once a node steps up, it will try to recover the shardVersion as part of the resume migration hook. 

      Until the migration resume is over, the shardVersion is marked as UNKNOWN, which prevents any read or write operation from being served.
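
      The gating described above can be pictured with a minimal sketch. All names here (ToyShardingState, ShardVersionUnknown, check_can_serve) are hypothetical and are not MongoDB's internal API; raising an exception is a simplification of whatever the server actually does with operations that arrive while the version is UNKNOWN.

```python
from enum import Enum, auto


class ShardVersionState(Enum):
    UNKNOWN = auto()    # recovery (including the migration resume) has not finished
    RECOVERED = auto()  # the shard version is known, so operations may proceed


class ShardVersionUnknown(Exception):
    """Raised when an operation arrives before the version is recovered."""


class ToyShardingState:
    """Hypothetical stand-in for per-collection sharding state on a node."""

    def __init__(self):
        # Assumption: a node that has just stepped up starts as UNKNOWN.
        self.state = ShardVersionState.UNKNOWN

    def check_can_serve(self, op_name):
        # Reads and writes are both refused while the version is UNKNOWN.
        if self.state is ShardVersionState.UNKNOWN:
            raise ShardVersionUnknown(f"{op_name} refused: shard version is UNKNOWN")
        return True

    def mark_recovered(self):
        # Called once the resume migration hook has finished.
        self.state = ShardVersionState.RECOVERED
```

      In this toy model, every operation fails until mark_recovered() runs, so the length of the resume directly bounds how long the node is unavailable for the collection.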

      As part of the resume, the migration will be completed. The completion steps depend on whether the migration was committed or aborted:

      If the migration was aborted, the donor will:

      • Exit the critical section on the recipient
      • Schedule a range deletion for possible orphans on the recipient
      • Delete the range deletion task locally

      If the migration was committed, the donor will:

      • Exit the critical section on the recipient
      • Schedule a range deletion task locally for possible orphans on the donor
      • Delete the range deletion task on the recipient
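
      The two cases above can be summarized as a small decision function. This is only an illustrative sketch: the Outcome enum, the step names, and donor_completion_steps are hypothetical labels for the bullets in this ticket, not real server functions.

```python
from enum import Enum


class Outcome(Enum):
    ABORTED = "aborted"
    COMMITTED = "committed"


def donor_completion_steps(outcome):
    """Return the (action, target) steps the donor drives, in ticket order."""
    if outcome is Outcome.ABORTED:
        # The recipient may hold orphans, so its range deletion runs and
        # the donor's local range deletion task is dropped.
        return [
            ("exit_critical_section", "recipient"),
            ("schedule_range_deletion", "recipient"),
            ("delete_range_deletion_task", "donor"),
        ]
    if outcome is Outcome.COMMITTED:
        # The donor may hold orphans, so its range deletion runs and
        # the recipient's range deletion task is dropped.
        return [
            ("exit_critical_section", "recipient"),
            ("schedule_range_deletion", "donor"),
            ("delete_range_deletion_task", "recipient"),
        ]
    raise ValueError(f"unexpected outcome: {outcome!r}")
```

      The two branches are mirror images: the side that does not end up owning the range cleans up possible orphans, and the other side's range deletion task is deleted.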

      Ideally, the entire completion could be done asynchronously, which would re-enable reads and writes on the donor sooner.
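
      A rough sketch of the difference between the two orderings follows. It assumes the commit/abort decision is already known and only the completion steps are deferred; ToyDonorRecovery and its methods are hypothetical and do not reflect how the server is structured.

```python
import threading


class ToyDonorRecovery:
    """Toy model of recovery on the donor (hypothetical names)."""

    def __init__(self, completion_steps):
        self._steps = completion_steps              # callables that complete the migration
        self.version_recovered = threading.Event()  # set once reads/writes can be served
        self.completion_done = threading.Event()    # set once all steps have run

    def _run_completion(self):
        for step in self._steps:
            step()
        self.completion_done.set()

    def recover_sync(self):
        # Behavior described in this ticket: the version stays UNKNOWN
        # until the whole completion has finished.
        self._run_completion()
        self.version_recovered.set()

    def recover_async(self):
        # Suggested behavior: publish the version first, then finish the
        # completion on a background thread.
        self.version_recovered.set()
        worker = threading.Thread(target=self._run_completion, daemon=True)
        worker.start()
        return worker
```

      With recover_async(), version_recovered is set before any completion step runs, so a slow step (for example, a remote call to the recipient) no longer delays operations on the donor. Whether that ordering is safe in the real server is exactly what this ticket asks to be evaluated.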

      Note that this ticket is only a suggestion arising from the conclusions of the BF-34016 investigation, where the recovery on the donor caused a transaction on the recipient to block. The time and cost required for implementation should be evaluated carefully.

      In general, we should also evaluate whether the benefit of such an implementation would outweigh its costs.

            Assignee:
            Unassigned
            Reporter:
            enrico.golfieri@mongodb.com Enrico Golfieri
            Votes:
            0
            Watchers:
            5

              Created:
              Updated: