Exclusion of pending range deletions in transition to dedicated progress check can cause commit to fail


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.3.0-rc0
    • Affects Version/s: 8.3.0-rc0
    • Component/s: None
    • None
    • Cluster Scalability
    • Fully Compatible
    • ALL
    • Reproducer attached
    • ClusterScalability 2Feb-16Feb
    • 1

      In SERVER-103990, the wait for range deletions during transition to dedicated was replaced with a wait for orphanCleanupDelaySecs. In this wait, we only look for non-pending range deletion tasks. This is fine during the remove shard commit because we drain migrations before checking this.

      However, we would ideally wait for this before starting the commit (i.e., the status getter should not return drained if we know the commit will have to wait, since that causes the commit to fail). There is a race condition in which this does not happen, because we commit a move chunk before marking the range deletion task as non-pending.

      We should remove the restriction to non-pending tasks, as this will let us report the draining status more accurately and should not affect the correctness of the check. Because the migration must succeed (otherwise the removeShard cannot complete), we don't have to worry about waiting unnecessarily for an aborted migration.
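
The difference between the two checks can be illustrated with a minimal model. This is only a sketch: `RangeDeletionTask`, `is_drained`, and the `include_pending` flag are hypothetical names invented for illustration and do not correspond to the server's actual code. It assumes a task stays marked pending for a short window after the chunk move has committed, as described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RangeDeletionTask:
    """Hypothetical stand-in for a range deletion task document."""
    namespace: str
    pending: bool  # True until the task has been marked non-pending


def is_drained(tasks: Iterable[RangeDeletionTask], include_pending: bool) -> bool:
    """Return True when no task considered by the check remains.

    include_pending=False models the current check (non-pending tasks only);
    include_pending=True models the proposed check (all tasks).
    """
    return not any(include_pending or not task.pending for task in tasks)


# Race window: the chunk move has committed, but the task is still pending.
tasks = [RangeDeletionTask("test.coll", pending=True)]

old_check = is_drained(tasks, include_pending=False)  # skips the pending task
new_check = is_drained(tasks, include_pending=True)   # counts the pending task
```

In this model, the check restricted to non-pending tasks reports drained during the race window, while the unrestricted check does not, so the status getter would keep reporting that draining is in progress until the task has actually been processed.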

            Assignee:
            Abdul Qadeer
            Reporter:
            Allison Easton
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: