featureFlagReshardingSkipCloningAndApplyingIfApplicable makes the resharding critical section get acquired on a non-donor db primary shard before the critical section is engaged by the coordinator

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL
    • 3
    • TBD

      SERVER-91109 introduced a way to reduce the performance impact of moveCollection: if the primary shard is not the recipient shard, it skips fetching and applying and instead transitions to the "strict-consistency" state right away. However, that state transition on the primary shard involves taking the critical section (see SERVER-53653) to prepare for the collection rename. This can cause problems in the following scenario:

      1. There is a collection with shard0 as the primary shard, which the user has been reading from and writing to via mongos0.
      2. The user runs a moveCollection command to move the collection from shard0 to shard1. 
      3. Before running any reads or writes against the collection, the user runs a moveCollection command to move the collection from shard1 to shard2. The donor for this resharding operation is shard1. The recipients are shard2 and also shard0 (because by design the primary shard is always included as a recipient).
      4. shard0, as a recipient shard, does not have any incoming chunks, so it skips cloning and applying and transitions to "strict-consistency". Since it is also not a donor, it acquires the critical section as mentioned earlier.
      5. Meanwhile, shard2, as another recipient shard, is still cloning the collection.
      6. The user performs a write against the collection. mongos0 has stale routing info so it routes the write command to shard0.
        1. The write is blocked because the critical section has been acquired on shard0.
        2. The command remains blocked until shard2 finishes cloning and applying and the coordinator commits, which triggers the release of the critical section on shard0. After that, the read/write command on shard0 returns StaleConfig. mongos0 then refreshes its routing info and routes the command to shard2.
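
      The scenario above can be sketched as a toy model in Python. This is not MongoDB code: the Shard and Router classes, the WriteResult enum, and run_scenario are hypothetical names invented for illustration, and the model assumes that a shard checks its critical section before it checks collection ownership, which is what the blocking described in step 6 implies. It also assumes that with the feature flag off, shard0 does not hold the critical section while shard2 is still cloning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class WriteResult(Enum):
    OK = "ok"
    STALE_CONFIG = "StaleConfig"
    BLOCKED = "blocked on critical section"


@dataclass
class Shard:
    """Hypothetical stand-in for a shard; not a real MongoDB type."""
    name: str
    owns_collection: bool = False
    critical_section_held: bool = False

    def handle_write(self) -> WriteResult:
        # Assumption: the critical section is checked first, so a write
        # blocks before the shard can report that it is not the owner.
        if self.critical_section_held:
            return WriteResult.BLOCKED
        if not self.owns_collection:
            return WriteResult.STALE_CONFIG
        return WriteResult.OK


@dataclass
class Router:
    """Hypothetical stand-in for a mongos with (possibly stale) routing info."""
    cached_owner: Optional[str]  # None means no cached routing info at all

    def write(self, shards: Dict[str, Shard],
              true_owner: str) -> List[Tuple[str, WriteResult]]:
        trace = []
        target = self.cached_owner or true_owner
        result = shards[target].handle_write()
        trace.append((target, result))
        if result is WriteResult.STALE_CONFIG:
            # Refresh routing info and retry against the current owner.
            self.cached_owner = true_owner
            result = shards[true_owner].handle_write()
            trace.append((true_owner, result))
        return trace


def run_scenario(skip_flag_enabled: bool) -> List[Tuple[str, WriteResult]]:
    shards = {n: Shard(n) for n in ("shard0", "shard1", "shard2")}
    # After step 2: shard1 owns the collection, mongos0 still caches shard0.
    shards["shard1"].owns_collection = True
    mongos0 = Router(cached_owner="shard0")
    # Steps 3-5: shard1 -> shard2 move in progress, shard2 still cloning.
    # With the flag on, shard0 jumps to strict-consistency and, as a
    # non-donor, takes the critical section early.
    shards["shard0"].critical_section_held = skip_flag_enabled
    # Step 6: the user writes via mongos0.
    return mongos0.write(shards, true_owner="shard1")


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, [(s, r.value) for s, r in run_scenario(flag)])
```

      In this model, the write stops at shard0 when the flag is on, whereas with the flag off it hits StaleConfig on shard0 and is retried successfully on shard1 (the donor, which still accepts writes).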

      The issue here is that it might take a long time for shard2 to finish cloning and applying. If the user only reads from and writes to this collection via mongoses that have stale routing info from when shard0 used to own this collection, this would result in downtime for the collection. 

      NB: For a write via a mongos without cached routing info at all, the write would get routed to the new owning shard (shard1) which would still accept writes until the coordinator engages the critical section to commit. Similarly, for a write via a mongos with cached routing info from before when the primary shard (shard0) used to own the collection, the write would get routed to that old owning shard and hit StaleConfig, and then mongos would refresh the routing info and route the write to new owning shard (shard1), which again would still accept writes. 

      In summary, this issue requires that:

      • The moveCollection operation doesn't have the primary shard as the donor or the direct recipient.
      • The user uses a mongos with stale routing info from when the primary shard used to own the collection being moved. 
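
      As a rough illustration, the two conditions can be written as a single predicate. The function name and parameters below are hypothetical and exist only for this sketch; "recipients_with_chunks" is an assumed way of expressing "direct recipient", i.e. a recipient that actually receives data.

```python
from typing import Optional, Set


def write_can_block(primary: str,
                    donors: Set[str],
                    recipients_with_chunks: Set[str],
                    mongos_cached_owner: Optional[str]) -> bool:
    """Return True when both conditions from the summary hold (toy check)."""
    # Condition 1: the primary shard is neither a donor nor a direct recipient.
    primary_is_bystander = (primary not in donors
                            and primary not in recipients_with_chunks)
    # Condition 2: the mongos still believes the primary shard owns the collection.
    mongos_targets_primary = mongos_cached_owner == primary
    return primary_is_bystander and mongos_targets_primary
```

      For the scenario in this ticket, the call would be write_can_block("shard0", {"shard1"}, {"shard2"}, "shard0").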

       

              Assignee:
              Unassigned
              Reporter:
              Cheahuychou Mao
