[SERVER-68402] Split donor should wait for blockTimestamp to be majority committed on donor nodes before sending splitConfig Created: 28/Jul/22  Updated: 27/Oct/23  Resolved: 28/Jul/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: [DO NOT USE] Backlog - Server Serverless (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Serverless
Participants:

 Description   

Right now the split donor waits for blockTimestamp to get applied on all recipient nodes, then sends the splitConfig.

I think the donor should also wait for blockTimestamp to become majority committed in the donor set before sending the splitConfig.

Otherwise, the donor could fail over, and the new donor primary could choose a new blockTimestamp and wait for it to be applied on all recipient nodes, even though the recipient nodes are already gone. Today, the donor may hang or abort the split in this case.
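To make the proposed ordering concrete, here is a minimal toy model in Python (not server code). All names in it (Node, SplitDonor, can_send_split_config, require_donor_majority) are hypothetical. It assumes that "majority committed in the donor set" can be approximated by counting donor nodes whose applied timestamp has reached blockTimestamp, which is a simplification of how the server tracks the majority commit point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    applied_ts: int = 0  # highest timestamp this node has applied (toy value)


@dataclass
class SplitDonor:
    donor_nodes: List[Node]      # hypothetical: nodes that remain in the donor set
    recipient_nodes: List[Node]  # hypothetical: nodes that leave once they get the splitConfig

    def recipients_applied(self, block_ts: int) -> bool:
        # Current behavior per the description: every recipient has applied blockTimestamp.
        return all(n.applied_ts >= block_ts for n in self.recipient_nodes)

    def majority_committed_on_donor(self, block_ts: int) -> bool:
        # Simplifying assumption: a strict majority of donor nodes has applied blockTimestamp.
        acked = sum(1 for n in self.donor_nodes if n.applied_ts >= block_ts)
        return acked > len(self.donor_nodes) // 2

    def can_send_split_config(self, block_ts: int, require_donor_majority: bool) -> bool:
        if not self.recipients_applied(block_ts):
            return False
        # Proposed extra gate from this ticket.
        if require_donor_majority and not self.majority_committed_on_donor(block_ts):
            return False
        return True
```

With only the recipient check, the sketch allows the splitConfig to be sent while blockTimestamp exists on a single donor node, which is the window in which a failover could lead the new primary to pick a different blockTimestamp.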



 Comments   
Comment by Esha Maharishi (Inactive) [ 28/Jul/22 ]

Even with a majority-commit wait, the primary could step down during the write after only some nodes have seen the oplog entry. A recipient node that sees the oplog entry will stop replicating, even if the entry is not majority committed.

The donor should abort the split if it times out waiting to hear that the recipient nodes have applied the new blockTimestamp, since the donor performs this wait with an opCtx created from a CancelableOperationContextFactory that was itself created with the abortToken.
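As a rough illustration of why a wait bound to the abortToken ends in an abort rather than a hang, here is a small Python sketch. It is not the server's C++ implementation. The names wait_for_recipients, run_split_wait, and SplitAborted are hypothetical, a threading.Event stands in for the abortToken, and the sketch assumes that the timeout is what cancels the token.

```python
import threading


class SplitAborted(Exception):
    """Raised when the wait is interrupted because the abort token fired."""


def wait_for_recipients(recipients_caught_up, abort_token, poll_s=0.01):
    # Stand-in for waiting on an operation context tied to the abort token:
    # the wait ends when the recipients catch up or when the token is cancelled.
    while not recipients_caught_up():
        # Event.wait returns True as soon as the token has been set.
        if abort_token.wait(poll_s):
            raise SplitAborted("wait interrupted by abort token")
    return True


def run_split_wait(recipients_caught_up, timeout_s):
    abort_token = threading.Event()
    # Assumption: the split timeout is modeled as a timer that cancels the token.
    timer = threading.Timer(timeout_s, abort_token.set)
    timer.start()
    try:
        wait_for_recipients(recipients_caught_up, abort_token)
        return "splitConfig sent"
    except SplitAborted:
        return "split aborted"
    finally:
        timer.cancel()
```

If the recipient nodes are already gone, the predicate never becomes true, so in this model the only way out of the wait is the token being cancelled, and the split is aborted instead of waiting forever.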
