ConfigSvrMoveRange doesn't pick a new chunk after StaleConfig


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0
    • Component/s: None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2025-09-29
    • 200
    • 🟩 Routing and Topology

      Cause

      SERVER-103706 removed the handling of the StaleConfig error from strategy.cpp in favor of using router role loops only where needed.
      As a consequence, ShardsvrMoveRange, called by ConfigSvrMoveRange, keeps retrying against the same shard with the same range whenever it hits StaleConfig.
      Note that ShardsvrMoveRange raises StaleConfig by manually checking that the range is fully owned by the shard, rather than by checking the shard version as every other command does. Simply refreshing the version here is not enough; the entire ConfigSvrMoveRange request needs to be updated.
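The difference between the two checks can be sketched as follows. This is a minimal, illustrative model: `check_shard_version` and `check_range_ownership` are made-up names, not actual server functions.

```python
class StaleConfig(Exception):
    pass

def check_shard_version(received, actual):
    """Version-based check used by most sharded commands."""
    if received != actual:
        raise StaleConfig("shard version mismatch")

def check_range_ownership(owned, requested):
    """ShardsvrMoveRange-style check: the requested range must be
    fully contained in the range the shard owns."""
    if not (owned[0] <= requested[0] and requested[1] <= owned[1]):
        raise StaleConfig("range not fully owned by this shard")

# A refreshed version satisfies the first check, but the second still
# fails if the request's range was computed against an older chunk layout.
check_shard_version(2, 2)  # passes after a refresh
try:
    check_range_ownership((0, 50), (0, 100))
except StaleConfig as e:
    print(e)
```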

      Example
      Imagine one chunk on shard1 and no chunks on shard2:
      shard1: [min,max]

      shard2: []

      • we issue a moveRange (moveRange1) to move the entire chunk to shard2
      • a parallel moveRange (moveRange2) happens, such that

      shard1: [min,half]

      shard2: [half,max]

      The loop described above for moveRange1 keeps retrying [min,max] against shard1: the ConfigSvrMoveRange request never changes, and the target shard is chosen based on the range's min.
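The stuck loop can be modeled with a small simulation. All names and values here are illustrative, not server code, and the retry cap stands in for the real loop, which would keep retrying:

```python
MIN, HALF, MAX = 0, 50, 100

# Routing table after the concurrent moveRange2: shard1 owns [MIN, HALF),
# shard2 owns [HALF, MAX).
ownership = {"shard1": (MIN, HALF), "shard2": (HALF, MAX)}

class StaleConfig(Exception):
    pass

def shardsvr_move_range(shard, rng):
    """ShardsvrMoveRange-like check: raise StaleConfig unless the shard
    fully owns the requested range."""
    owned_min, owned_max = ownership[shard]
    if not (owned_min <= rng[0] and rng[1] <= owned_max):
        raise StaleConfig(f"{shard} does not fully own {rng}")
    return "moved"

def configsvr_move_range(rng, max_retries=5):
    """ConfigSvrMoveRange-like loop: the shard is chosen from rng's min,
    and the request (rng) is never recomputed on StaleConfig."""
    shard = "shard1" if rng[0] < HALF else "shard2"  # targeting by min
    attempts = 0
    for _ in range(max_retries):
        attempts += 1
        try:
            return shardsvr_move_range(shard, rng), attempts
        except StaleConfig:
            continue  # retries with the same shard and the same range
    return "gave up", attempts

# moveRange1 for [MIN, MAX] keeps failing: shard1 only owns [MIN, HALF).
print(configsvr_move_range((MIN, MAX)))  # → ('gave up', 5)
```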

      Before the changes
      Before SERVER-103706, the StaleConfig error would have been propagated to the mongos, which would have stopped retrying and simply reported to the user that the requested range no longer exists.

      We should probably move that check into ConfigSvrMoveRange.
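One possible shape of that fix, sketched under the same illustrative model (this is an assumption about the fix's direction, not the actual patch): on StaleConfig, re-read the chunk layout inside ConfigSvrMoveRange and fail fast when the requested range no longer exists as a single chunk.

```python
class StaleConfig(Exception):
    pass

class RangeNoLongerExists(Exception):
    pass

# Routing table after a concurrent split/move (illustrative values).
chunks = {"shard1": (0, 50), "shard2": (50, 100)}

def shardsvr_move_range(shard, rng):
    """Raise StaleConfig unless the shard fully owns the range."""
    lo, hi = chunks[shard]
    if not (lo <= rng[0] and rng[1] <= hi):
        raise StaleConfig
    return "moved"

def configsvr_move_range(rng):
    while True:
        # Target the shard owning the range's min, as today.
        shard = next(s for s, (lo, hi) in chunks.items() if lo <= rng[0] < hi)
        try:
            return shardsvr_move_range(shard, rng)
        except StaleConfig:
            lo, hi = chunks[shard]  # re-read the refreshed ownership
            if (lo, hi) != rng:
                # The requested range no longer maps to a single chunk:
                # surface the error instead of retrying the same request.
                raise RangeNoLongerExists(rng)

print(configsvr_move_range((50, 100)))  # → moved
```

With this check on the config server side, a stale [min,max] request fails with a user-visible error, matching the pre-SERVER-103706 behavior at the mongos.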

            Assignee:
            Silvia Surroca
            Reporter:
            Enrico Golfieri
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: