Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.1.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Server Serverless 2023-04-17, Server Serverless 2023-05-01, Server Serverless 2023-05-15, Server Serverless 2023-05-29, Server Serverless 2023-06-12, Server Serverless 2023-06-26
Linked BF Score:
5
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

In the BF, after a donor failover, the new donor primary waits for all recipient nodes to reach an opTime even though the recipient nodes have already installed the split config. That wait times out, and the split aborts with "ErrorCodes: ExceededTimeLimit", which the test suite (shard_split_stepdown_jscore_passthrough) isn't expecting.

We have two options. Either we fix the problem by adding a marker to the donor state document once all recipient nodes are caught up to blockTS (i.e., something here), which a new primary can use to decide whether to skip the "waiting for BlockTS" stage, or we make the suites that combine step down with shard split (shard_split_kill_primary_jscore_passthrough, shard_split_terminate_primary_jscore_passthrough, shard_split_stepdown_jscore_passthrough) ignore such "ErrorCodes: ExceededTimeLimit" errors.

Note, however, that the shard split scope lists the following goal as completed:

Be resilient to failover including elections, node restarts, and transient network errors.

If we go with the latter solution, we should inform Cloud that shard split is not completely resilient to failovers, make a note in the scope document, and update the arch guide if needed.
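
For illustration, the first option (a marker in the donor state document) could look roughly like the sketch below. It is a minimal Python model, not the server's C++ code: `DonorStateDoc`, `recipients_caught_up`, and `resume_split` are hypothetical names, and the `wait_for_recipients` callback stands in for whatever mechanism the donor actually uses to wait for blockTS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DonorStateDoc:
    """Hypothetical stand-in for the persisted donor state document."""
    block_ts: int
    # Hypothetical marker: set once all recipient nodes reached block_ts.
    recipients_caught_up: bool = False


def resume_split(doc: DonorStateDoc,
                 wait_for_recipients: Callable[[int], bool]) -> str:
    """Resume a split on a (possibly newly elected) donor primary.

    wait_for_recipients(block_ts) returns True if all recipients caught up
    before the split timeout, False if the wait timed out.
    """
    if not doc.recipients_caught_up:
        if not wait_for_recipients(doc.block_ts):
            return "aborted: ExceededTimeLimit"
        # A real implementation would have to persist this durably so that
        # the next primary sees it after a failover.
        doc.recipients_caught_up = True
    # Marker is set: skip the "waiting for BlockTS" stage entirely.
    return "committed"
```

With the marker already set, a new primary never re-enters the wait, so a wait that can no longer succeed (because the recipients have installed the split config) does not abort the split.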

Assignee: Matt Broadstone

Reporter: Suganthi Mani

Participants: Didier Nadeau, Githook User, Matt Broadstone, Suganthi Mani

Votes: 0

Watchers: 6

Created: Mar 27 2023 03:11:08 PM UTC

Updated: Oct 29 2023 09:24:11 PM UTC

Resolved: Jun 23 2023 01:46:53 PM UTC

Confidence Status Last Update: 05/May/23 8:01 PM
