  Core Server / SERVER-75315

Shard split donor tries to wait for all recipient nodes to reach an opTime after the recipient nodes have installed the split config.


Details

    • Bug
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • 7.1.0-rc0
    • None
    • None
    • Fully Compatible
    • ALL
    • Server Serverless 2023-04-17, Server Serverless 2023-05-01, Server Serverless 2023-05-15, Server Serverless 2023-05-29, Server Serverless 2023-06-12, Server Serverless 2023-06-26
    • 5

    Description

      In the BF, after a donor failover, we see that the new donor primary tries to wait for all recipient nodes to reach an opTime after the recipient nodes have already installed the split config. That results in a split timeout, causing the split to abort with "ErrorCodes: ExceededTimeLimit", which the test suite (shard_split_stepdown_jscore_passthrough) isn't expecting.

      We should either fix the problem by adding a marker to the donor state document once all recipient nodes have caught up to blockTS (i.e., something here), which can then be used to decide whether to skip the "waiting for BlockTS" stage, or make the suites that combine step down with shard split (shard_split_kill_primary_jscore_passthrough, shard_split_terminate_primary_jscore_passthrough, shard_split_stepdown_jscore_passthrough) ignore such "ErrorCodes: ExceededTimeLimit" errors.
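
      To make the first option concrete, here is a minimal Python sketch of the marker idea. It is not the server implementation: DonorStateDoc, recipients_caught_up, wait_for_recipients and run_block_ts_stage are hypothetical names, timestamps are plain integers, and the stale opTimes passed after failover are only an assumption used to model the timeout seen in the BF.

```python
from dataclasses import dataclass

# All names here are hypothetical. They do not mirror the real donor state
# document schema or the server's shard split code.


@dataclass
class DonorStateDoc:
    block_ts: int                       # stand-in for the blockTS timestamp
    recipients_caught_up: bool = False  # the proposed marker


def wait_for_recipients(doc, recipient_optimes):
    """Report whether every recipient has reached block_ts.

    A real implementation would block with a deadline; this sketch only
    checks the opTimes it is handed.
    """
    return all(optime >= doc.block_ts for optime in recipient_optimes)


def run_block_ts_stage(doc, recipient_optimes):
    """Run or skip the "waiting for BlockTS" stage based on the marker."""
    if doc.recipients_caught_up:
        # An earlier primary already saw the recipients catch up and
        # recorded the marker, so a new primary does not wait again.
        return "skipped"
    if not wait_for_recipients(doc, recipient_optimes):
        # Stand-in for the abort with ExceededTimeLimit.
        return "timed_out"
    # Record the marker before moving on to the next stage.
    doc.recipients_caught_up = True
    return "waited"
```

      In this sketch the first primary calls run_block_ts_stage once and records the marker; a primary elected afterwards reads the same document and skips the stage even when the opTimes it can observe are stale. Without the marker, the same stale opTimes lead to the timed-out path.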

      Note, however, that the shard split scope lists this goal as completed:

      Be resilient to failover including elections, node restarts, and transient network errors.

      If we go with the latter solution, we should inform Cloud that shard split is not completely resilient to failovers, make a note in the scope document, and update the arch guide if needed.

          People

            matt.broadstone@mongodb.com Matt Broadstone
            suganthi.mani@mongodb.com Suganthi Mani