Investigate using the gossiping executor during addShard once precondition checks complete

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0
    • Component/s: None
    • Catalog and Routing
    • ALL
    • 🟩 Routing and Topology

      During addShard, we communicate with the replica set being added through an executor that does not gossip the vector clock. See SERVER-105324 for details; in short, gossiping can leave the config server with a config time that is too high after a failed addShard.
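
      To make the failure mode concrete, here is a minimal toy model, not the server's actual implementation. It assumes that gossiping advances the receiver's config time to the maximum it has seen, and that the candidate replica set carries a config time far ahead of the cluster's. The `Node` class, the `failed_add_shard` helper, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node that tracks only the configTime component of a vector clock."""
    name: str
    config_time: int

    def receive(self, sender_config_time: int, gossip: bool) -> None:
        # With gossiping, the receiver advances its clock to the max it has seen.
        # Without gossiping, the sender's clock is ignored.
        if gossip:
            self.config_time = max(self.config_time, sender_config_time)


def failed_add_shard(config: Node, candidate: Node, gossip: bool) -> int:
    """Model an addShard that talks to the candidate and then aborts.

    Returns the config server's configTime after the failed attempt.
    """
    # A response from the candidate replica set reaches the config server.
    config.receive(candidate.config_time, gossip=gossip)
    # Assume the precondition checks fail here, so the shard is never added.
    return config.config_time


# Assumption: the candidate's configTime (1000) is far ahead of the cluster's (10).
with_gossip = failed_add_shard(Node("config", 10), Node("candidate", 1000), gossip=True)
without_gossip = failed_add_shard(Node("config", 10), Node("candidate", 1000), gossip=False)
print(with_gossip, without_gossip)  # 1000 10
```

      In this model the config server keeps the inflated config time even though the shard was never added, which is the outcome the non-gossiping executor avoids.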

      This ticket is to investigate whether, now that we have a coordinator, it is safe to switch to the gossiping executor once we reach a phase that must always make progress. In theory, at that point we know the shard will be added, so there is no major difference between starting to gossip there and starting after the commit. The benefit would be stronger causality guarantees for the writes on the shard (writing the shard identity and entering the critical section).
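
      The proposal could be sketched as a single phase-based rule for picking the executor. This is only an illustration: the phase names, their order, and the `executor_for` helper are hypothetical and do not reflect the coordinator's real phases.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical phase names; the real coordinator's phases may differ.
    CHECK_PRECONDITIONS = 1
    PREPARE_NEW_SHARD = 2  # assumed to be the first phase that must always make progress
    COMMIT = 3
    CLEANUP = 4


# Assumption: from this phase onward the addShard can no longer fail.
FIRST_MUST_PROGRESS_PHASE = Phase.PREPARE_NEW_SHARD


def executor_for(phase: Phase) -> str:
    """Pick the executor kind for commands sent to the new shard in `phase`."""
    if phase >= FIRST_MUST_PROGRESS_PHASE:
        # The shard will be added, so gossiping the vector clock is assumed safe.
        return "gossiping"
    # addShard can still fail here, so keep the clocks isolated.
    return "non-gossiping"


choices = {phase.name: executor_for(phase) for phase in Phase}
print(choices)
```

      Deriving the executor from the phase in one place would also prevent individual commands from choosing differently, which is the mix this ticket wants to remove.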

      We already have two places that mistakenly use the gossiping executor: blocking FCV changes on the new shard and entering the critical section. So this is almost certainly safe in the normal case. However, we should check whether it is possible to hit an edge case like the one we saw in SERVER-105234, and then unify the behavior so that we don't have a mix of gossiping and non-gossiping commands.

            Assignee: Unassigned
            Reporter: Allison Easton
            Votes: 0
            Watchers: 3

              Created:
              Updated: