  1. Core Server
  2. SERVER-75315

Shard split donor tries to wait for all recipient nodes to reach an opTime after the recipient nodes have installed the split config

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • Server Serverless 2023-04-17, Server Serverless 2023-05-01, Server Serverless 2023-05-15, Server Serverless 2023-05-29, Server Serverless 2023-06-12, Server Serverless 2023-06-26
    • 5

      In the BF, after a donor failover, the new donor primary tries to wait for all recipient nodes to reach an opTime after the recipient nodes have already installed the split config. That wait times out, and the split aborts with "ErrorCodes: ExceededTimeLimit", which the test suite (shard_split_stepdown_jscore_passthrough) isn't expecting.

      We should either fix the problem by adding a marker to the donor state document once all recipient nodes have caught up to blockTS (i.e., something here), which can then be used to decide whether to skip the "waiting for BlockTS" stage, or make the suites that combine step down with shard split (shard_split_kill_primary_jscore_passthrough, shard_split_terminate_primary_jscore_passthrough, shard_split_stepdown_jscore_passthrough) ignore such "ErrorCodes: ExceededTimeLimit" errors.

      Note, however, that the shard split scope lists this goal as completed:

      Be resilient to failover including elections, node restarts, and transient network errors.

      If we go with the latter solution, we should inform Cloud that shard split is not completely resilient to failovers, make a note in the scope document, and update the arch guide if needed.
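
      For illustration only, the first option (a marker in the donor state document) could look roughly like the sketch below. This is not the server implementation: DonorStateDoc, recipients_caught_up, run_block_ts_stage and the wait_for_recipients callback are all hypothetical names invented for this example, and the real donor state document has a different shape.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DonorStateDoc:
    """Hypothetical, heavily simplified stand-in for the donor state document."""
    block_ts: int
    # Hypothetical marker: set once every recipient node has reached block_ts.
    recipients_caught_up: bool = False


def run_block_ts_stage(doc: DonorStateDoc,
                       wait_for_recipients: Callable[[int], bool]) -> str:
    """Run the "waiting for BlockTS" stage, skipping it if the marker is set.

    wait_for_recipients is an assumed callback that returns True when all
    recipient nodes reached the given timestamp and False on timeout.
    """
    if doc.recipients_caught_up:
        # A previous primary already finished this stage, so a new primary
        # must not wait again on nodes that have installed the split config.
        return "skipped"
    if not wait_for_recipients(doc.block_ts):
        raise TimeoutError("ExceededTimeLimit")
    # In a real system this update would have to be durably replicated
    # before the recipients are told to install the split config.
    doc.recipients_caught_up = True
    return "waited"
```

      With such a marker, a primary that steps up after the recipients have installed the split config would find the marker set and move on, instead of waiting until the split times out.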

            Assignee:
            matt.broadstone@mongodb.com Matt Broadstone
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani
            Votes:
            0
            Watchers:
            6
