  1. Core Server
  2. SERVER-74842

`collStats` must never return a negative number of orphans

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Sharding EMEA
    • Fully Compatible
    • ALL
    • Sharding EMEA 2023-03-20
    • 135

      The BalancerStatsRegistry is a ReplicaSetAwareService, so it is only up on primaries. When a secondary is asked for the number of orphans, the count is retrieved by executing a local query, which bypasses the non-negativity checks that are in place when the service is up on primary nodes.

      The purpose of this ticket is to make secondaries also return 0 orphaned documents when the tracked number is negative.
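
      The intended behavior can be illustrated with a minimal sketch. It is not the server's implementation: the function name is hypothetical, the field name `numOrphanDocs` is an assumption about how the counter is stored in each range deletion task document, and clamping the summed total (rather than each document) is also an assumption.

```python
from typing import Iterable, Mapping


def reported_orphan_count(range_deletion_docs: Iterable[Mapping[str, int]]) -> int:
    """Sum the tracked counters and never report fewer than zero orphans.

    `numOrphanDocs` is an assumed field name used for illustration only.
    """
    tracked = sum(doc.get("numOrphanDocs", 0) for doc in range_deletion_docs)
    # Clamp the result: a transiently negative counter is reported as 0.
    return max(0, tracked)
```

      With this sketch, a single task document tracking -100 orphans yields 0, while documents tracking 50 and -20 yield 30.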

      Why can the counter be negative?

      When a migration is aborted, the orphan counter tracked in the range deletion task document is missing the number of documents cloned in the last batch: the update that would account for them (performed in one of two code paths) is skipped because of the error that triggers the abort.

      Because the range deletion is processed as follows, the number of orphaned documents tracked on disk may transiently become negative:

      while (there are docs in the orphaned range)
      -- Delete a batch
      -- Decrement the counter on disk (num orphans tracked on disk - num orphaned documents deleted)

      This is a self-recovering error that can affect at most ONE range deletion document at a time. It is corrected automatically once the range deleter finishes processing the ranges with wrong counters. It is also quite a rare condition (hopefully there should never be that many aborts).
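
      The transient negative value can be reproduced with a small simulation of the loop described above. This is a sketch under stated assumptions, not the range deleter's code: the function name, the batch size, and the document counts are all hypothetical.

```python
def simulate_range_deletion(docs_in_range: int, tracked_on_disk: int, batch_size: int) -> list[int]:
    """Return the value of the on-disk counter after each deleted batch."""
    history = []
    while docs_in_range > 0:
        # Delete a batch.
        deleted = min(batch_size, docs_in_range)
        docs_in_range -= deleted
        # Decrement the counter on disk by the number of documents deleted.
        tracked_on_disk -= deleted
        history.append(tracked_on_disk)
    return history


# Hypothetical scenario: 300 documents were cloned in batches of 100, but the
# abort skipped the counter update for the last batch, so only 200 are tracked.
history = simulate_range_deletion(docs_in_range=300, tracked_on_disk=200, batch_size=100)
print(history)  # [100, 0, -100]
```

      In this scenario the counter only drops below zero while the final batch is deleted, which matches the description of the error as transient and limited to the range currently being processed.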

            silvia.surroca@mongodb.com Silvia Surroca
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli