[SERVER-26397] Look for new sync source more frequently while in catchup mode Created: 29/Sep/16 Updated: 25/Jul/18 Resolved: 25/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Sprint: | Repl 2017-12-04, Repl 2017-12-18, Repl 2018-07-16, Repl 2018-07-30 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
If a recently-elected primary is in catchup mode but has no sync source, it's delaying becoming a fully-usable primary without actually doing any work. It's possible that no good sync source was available when it first got elected and looked for one, but that one becomes available while it is still in catchup mode. We should check for new sync sources more frequently than we normally do while in catchup mode, since the whole node is otherwise just sitting idle. |
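A rough illustration of why a shorter retry interval helps, written as a standalone sketch rather than server code. The function name and all interval values are hypothetical; the ticket does not state the actual retry timings.

```python
import math


def time_until_source_found_ms(available_at_ms, poll_interval_ms):
    """Return when a periodic poll first notices a sync source.

    Hypothetical model: the node polls at t = interval, 2 * interval, ...
    and a source becomes available at available_at_ms. The node only
    notices it at the first poll at or after that moment.
    """
    if poll_interval_ms <= 0:
        raise ValueError("poll_interval_ms must be positive")
    # Number of polls needed to reach (or pass) the availability time;
    # at least one poll always happens.
    polls = max(1, math.ceil(available_at_ms / poll_interval_ms))
    return polls * poll_interval_ms


# Assumed example values: a source appears 250 ms into catchup mode.
normal_wait = time_until_source_found_ms(250, 1000)   # assumed normal interval
catchup_wait = time_until_source_found_ms(250, 100)   # assumed shorter interval
```

Under these assumed numbers the node idles until the 1000 ms mark with the longer interval, but only until the 300 ms mark with the shorter one.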
| Comments |
| Comment by Siyuan Zhou [ 25/Jul/18 ] |
|
We are not sure whether this would fix the original BF, and we haven't seen this elsewhere. We also don't want to expose the state of ReplicationCoordinatorImpl to bgsync, because the concurrency rules don't allow bgsync to call into ReplicationCoordinatorImpl. The problem only happens if the replset is fairly quiet but a write occurs during the 2-second heartbeat interval, which should be rare in practice. The worst case is waiting 1 more second. There is a real case where this would be a valid improvement, but it is unlikely and uncommon enough that we've decided it isn't worth the extra complexity in the system. Closing this as "Won't Fix". We can reopen this if it occurs in the future. |
| Comment by Spencer Brody (Inactive) [ 23/Jul/18 ] |
|
I'm not positive but I think the case I was alluding to when I filed this ticket was:
A more ideal solution would probably be to wake up the bgsync thread whenever we get a heartbeat or replSetUpdatePosition with new information that may affect its ability to find a sync source, but that is likely a much more complex change. |
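The wake-up idea above can be sketched in isolation as follows. This is a minimal illustration using Python's `threading.Event`, not MongoDB code: the class and method names are hypothetical, and it ignores the concurrency rules between bgsync and ReplicationCoordinatorImpl raised elsewhere in this ticket.

```python
import threading


class SyncSourceWaiter:
    """Hypothetical stand-in for the wait between sync source lookups."""

    def __init__(self):
        self._new_info = threading.Event()

    def on_topology_update(self):
        # Assumed to be called when a heartbeat or replSetUpdatePosition
        # carries information that could change sync source selection.
        self._new_info.set()

    def wait_before_retry(self, timeout_s):
        # Sleep for up to timeout_s, but return early if new information
        # arrived. Returns True when woken early, False on timeout.
        woken = self._new_info.wait(timeout_s)
        self._new_info.clear()
        return woken
```

With this shape, the retry loop keeps its normal timeout as a fallback and only retries sooner when something relevant has actually changed.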
| Comment by Siyuan Zhou [ 12/Jul/18 ] |
|
vesselina.ratcheva, if we could not find a good sync source, how did we know there was a node with a higher optime? I'm curious why no good sync source was available when the node first got elected and looked for one. |