[SERVER-58939] Set catchUpTimeoutMillis if catchup takeover is disabled Created: 28/Jul/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Wenbin Zhu Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-57262 Allow nodes to vote for candidates wi... Closed
Assigned Teams:
Replication
Participants:

 Description   

By default, `catchUpTimeoutMillis` is set to -1 (infinite). If a user disables catchup takeover without setting `catchUpTimeoutMillis`, and for whatever reason the primary gets stuck in catchup/drain mode, the system can freeze. Although users can manually issue the `replSetAbortPrimaryCatchUp` command to abort catchup, this is still not a good user experience because it requires user intervention and causes a long unavailability window. We should either warn users when they disable catchup takeover without setting `catchUpTimeoutMillis`, or alternatively fail the configuration when that happens.
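
As a rough illustration of the proposed check (not the server's actual implementation, which would live in the C++ replica set config validation), the following sketch inspects a replica set config document and either returns a warning or rejects the config. The function name `check_catchup_settings` and the `strict` flag are hypothetical. The sketch assumes that catchup takeover is disabled by setting `settings.catchUpTakeoverDelayMillis` to -1 and that the field defaults to 30000 when omitted.

```python
from typing import Optional

# Assumed sentinel values, as documented for the replica set config `settings` object.
INFINITE_CATCHUP = -1        # catchUpTimeoutMillis: -1 means no catchup timeout
TAKEOVER_DISABLED = -1       # catchUpTakeoverDelayMillis: -1 disables catchup takeover
DEFAULT_TAKEOVER_DELAY = 30000  # assumed default when the field is omitted


def check_catchup_settings(config: dict, strict: bool = False) -> Optional[str]:
    """Hypothetical validator for the risky combination described in this ticket.

    Returns None when the config is safe, or a warning message when catchup
    takeover is disabled while the catchup timeout is infinite. With
    strict=True the risky combination raises ValueError instead.
    """
    settings = config.get("settings", {})
    takeover_delay = settings.get("catchUpTakeoverDelayMillis", DEFAULT_TAKEOVER_DELAY)
    catchup_timeout = settings.get("catchUpTimeoutMillis", INFINITE_CATCHUP)

    # Only the combination of both sentinel values is a problem.
    if takeover_delay != TAKEOVER_DISABLED or catchup_timeout != INFINITE_CATCHUP:
        return None

    message = (
        "catchup takeover is disabled and catchUpTimeoutMillis is -1 (infinite); "
        "a primary stuck in catchup can only be unblocked manually with "
        "replSetAbortPrimaryCatchUp"
    )
    if strict:
        raise ValueError(message)
    return message
```

Calling `check_catchup_settings({"settings": {"catchUpTakeoverDelayMillis": -1}})` returns the warning text, while adding a finite `catchUpTimeoutMillis` to the same settings returns `None`. The `strict` flag mirrors the two options in the description: warn, or fail the configuration.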



 Comments   
Comment by Judah Schvimer [ 11/Nov/21 ]

Given that it is likely fairly rare that someone disables catchup takeover, putting this on the backlog. alan.zheng or wenbin.zhu, please comment if you disagree.

Generated at Thu Feb 08 05:45:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.