[SERVER-58939] Set catchUpTimeoutMillis if catchup takeover is disabled Created: 28/Jul/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Wenbin Zhu | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Replication |
| Participants: | |||||||||
| Description |
|
By default, `catchUpTimeoutMillis` is set to -1 (infinite). If a user disables catchup takeover without also setting `catchUpTimeoutMillis`, and the primary gets stuck in catchup/drain mode for whatever reason, the system can freeze, because nothing will ever end the catchup automatically. Although users can manually issue the `replSetAbortPrimaryCatchUp` command to abort catchup, this is still a poor user experience because it requires user intervention and results in a long unavailability window. We should either warn users when they disable catchup takeover without setting `catchUpTimeoutMillis`, or alternatively reject the configuration when that happens. |
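The proposed check could be sketched roughly as below. This is only an illustration in Python, not the server's validation code. The function name `check_catchup_settings` and the warning text are hypothetical. It assumes catchup takeover is disabled by setting `settings.catchUpTakeoverDelayMillis` to -1, and that an unset `catchUpTakeoverDelayMillis` defaults to 30000.

```python
from typing import Any, Mapping, Optional

# -1 is the sentinel for both settings: "infinite" for the catchup timeout
# and (assumed here) "disabled" for the catchup takeover delay.
UNBOUNDED = -1

# Assumed defaults when the fields are absent from the config document.
DEFAULT_CATCHUP_TIMEOUT_MS = -1
DEFAULT_CATCHUP_TAKEOVER_DELAY_MS = 30000


def check_catchup_settings(config: Mapping[str, Any]) -> Optional[str]:
    """Return a warning string for the risky combination, else None.

    Hypothetical helper: flags a replica set config in which catchup
    takeover is disabled while the catchup timeout is still infinite.
    """
    settings = config.get("settings") or {}
    timeout = settings.get("catchUpTimeoutMillis", DEFAULT_CATCHUP_TIMEOUT_MS)
    takeover_delay = settings.get(
        "catchUpTakeoverDelayMillis", DEFAULT_CATCHUP_TAKEOVER_DELAY_MS
    )

    if takeover_delay == UNBOUNDED and timeout == UNBOUNDED:
        return (
            "catchup takeover is disabled and catchUpTimeoutMillis is -1; "
            "a primary stuck in catchup will not be interrupted automatically"
        )
    return None
```

A caller could log the returned string as a warning, or raise an error on it to fail the reconfiguration, matching the two options proposed in the description.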
| Comments |
| Comment by Judah Schvimer [ 11/Nov/21 ] |
|
Given that it is likely fairly rare that someone disables catchup takeover, putting this on the backlog. alan.zheng or wenbin.zhu, please comment if you disagree. |