[SERVER-76680] Set a long election timeout for shard split passthroughs Created: 28/Apr/23  Updated: 02/May/23  Resolved: 02/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Didier Nadeau Assignee: Mathis Bessa
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Serverless
Operating System: ALL
Sprint: Server Serverless 2023-05-15
Participants:
Linked BF Score: 10

 Description   

Spurious elections on the recipient set might cause shard split passthrough tests to fail. If there's a failover between stepping up the recipient primary and appending the oplog note (both tasks done by the shard split service), appending the oplog note won't succeed and the shard split will fail.

We should explicitly set a high election timeout for shard split passthroughs to reduce the likelihood of that happening.



 Comments   
Comment by Mathis Bessa [ 02/May/23 ]

Closing this since shard split passthroughs are already defaulting electionTimeoutMillis to 24 hours. See this comment.
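
For illustration only, here is a minimal Python sketch of what a 24-hour election timeout looks like in a replica set configuration document. It is not how the passthrough suites actually apply the setting. The helper name `with_long_election_timeout`, the set name `recipientSet`, and the host are hypothetical; only the `settings.electionTimeoutMillis` field comes from MongoDB's replica set configuration, where it defaults to 10000 (10 seconds).

```python
from datetime import timedelta


def with_long_election_timeout(config, timeout=timedelta(hours=24)):
    """Return a copy of a replica set config with a long election timeout.

    Hypothetical helper: the input document is left unmodified.
    """
    new_config = dict(config)
    settings = dict(config.get("settings", {}))
    # electionTimeoutMillis is expressed in milliseconds.
    settings["electionTimeoutMillis"] = int(timeout.total_seconds() * 1000)
    new_config["settings"] = settings
    # A reconfig is normally submitted with a higher config version.
    new_config["version"] = config.get("version", 0) + 1
    return new_config


# Hypothetical recipient set configuration (names and hosts are made up).
example_config = {
    "_id": "recipientSet",
    "version": 1,
    "members": [{"_id": 0, "host": "localhost:20020"}],
}

reconfigured = with_long_election_timeout(example_config)
print(reconfigured["settings"]["electionTimeoutMillis"])  # 86400000
```

A document like this could be passed to the replSetReconfig command. With a timeout this long, secondaries should not call an election on their own just because heartbeats are delayed, which narrows the window for a failover between replSetStepUp and the oplog note write.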

Comment by Didier Nadeau [ 01/May/23 ]

Additional context: this scenario (an election between replSetStepUp and writing the oplog note) occurred in a JS test in BF-28002. We expect the same scenario can also happen in passthroughs, so we want to increase the timeout in split passthroughs.

Generated at Thu Feb 08 06:33:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.