[SERVER-10621] Replication failover hangs indefinitely when priorities conflict with replication Created: 26/Aug/13 Updated: 06/Dec/22 Resolved: 21/Dec/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.6 |
| Fix Version/s: | 3.2.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | elections |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Ubuntu 12.04 LTS |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Replication |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
This is a continuation of an earlier issue. This appears to be a deadlock between priority and freshness.
The db2 log is filled with:
I guess the load generator timing is important to reproducibility. If db2 replicates from db0 faster than db1 does, it always gets into this deadlock.
Also, while this might be unrelated, there are the following entries in the log of db1:
I'm not sure what your policy on exceptions in destructors is, of course, but this might be an indicator of an improperly cleaned-up connection. |
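To make the reported deadlock concrete, here is a minimal, hypothetical Python model of the interaction between the two election rules (this is a sketch of the pre-3.2 behavior as described in this ticket, not MongoDB's actual election code; the member names and optimes are illustrative): a member behind the freshest member loses on freshness, while any member with strictly higher priority vetoes a fresher candidate.

```python
# Hypothetical, simplified model of the priority/freshness deadlock.
# NOT MongoDB source code: member names and optimes are illustrative.

from dataclasses import dataclass

@dataclass
class Member:
    name: str
    priority: float
    optime: int  # last applied operation; higher means fresher

def can_win_election(candidate, members):
    freshest = max(m.optime for m in members)
    if candidate.optime < freshest:
        return False  # freshness rule: a stale candidate loses
    if any(m.priority > candidate.priority for m in members):
        return False  # priority rule: a higher-priority member vetoes
    return True

# db0 (the old primary) is down; db1 and db2 remain.
db1 = Member("db1", priority=2, optime=99)   # higher priority, one op behind
db2 = Member("db2", priority=1, optime=100)  # fresher, lower priority
up = [db1, db2]

electable = [m.name for m in up if can_win_election(m, up)]
print(electable)  # empty: neither member can win, so no primary is elected
```

db1 fails the freshness check (db2 is one op ahead) and db2 is vetoed by db1's higher priority, so the set stays without a primary, matching the hang described above.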
| Comments |
| Comment by Eric Milkie [ 21/Dec/15 ] |
|
A new election scheme for priority was introduced in 3.2, which does not have this problem. |
| Comment by Aristarkh Zagorodnikov [ 23/Oct/13 ] |
|
Any news on this one? We resorted to resetting all priorities to the default (1), but this is not what we would like to use =) |
| Comment by Aristarkh Zagorodnikov [ 28/Aug/13 ] |
|
The following patch fixes things for me. Not sure if relaxing priorities based on a possibly stale (due to heartbeats) optime comparison is a good solution, though.
|
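The patch itself is not reproduced in this export. As a hedged sketch of the idea the comment describes (an assumption about the approach, not the actual diff): relax the priority veto so that a higher-priority member only blocks a candidate when, according to its last heartbeat-reported optime (which may be stale), it is at least as fresh as the candidate.

```python
# Hypothetical sketch only -- the actual patch is not shown above.
# A higher-priority member that has fallen behind, per its heartbeat
# optime, no longer vetoes a fresher, lower-priority candidate.

def relaxed_priority_veto(candidate_priority, candidate_optime,
                          other_priority, other_heartbeat_optime):
    return (other_priority > candidate_priority
            and other_heartbeat_optime >= candidate_optime)

# Deadlock scenario from the report: db1 (priority 2, optime 99) no
# longer vetoes db2 (priority 1, optime 100), so db2 can be elected.
print(relaxed_priority_veto(1, 100, 2, 99))  # False: veto lifted
```

As the commenter notes, the trade-off is that the heartbeat optime may lag reality, so the veto could be lifted even when the higher-priority member has in fact caught up.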
| Comment by Aristarkh Zagorodnikov [ 27/Aug/13 ] |
|
P.P.S. If you would like better reproducibility, I believe that some kind of rate limiting (via nice or TCP shaping) might do the trick. |
| Comment by Aristarkh Zagorodnikov [ 27/Aug/13 ] |
|
P.S. We originally hit this problem on shard replica sets whose oplogs are measured in months (54 days at the current, actually low, write volume):
|
| Comment by Aristarkh Zagorodnikov [ 27/Aug/13 ] |
|
The oplog size is irrelevant (it may affect reproducibility because of I/O factors, but it's still not the root cause). Please pay attention to the report; let me point out the important part:
As is visible from this case, the timestamps of the last operations are one operation apart from each other.
db1:
|
| Comment by Matt Dannenberg [ 27/Aug/13 ] |
|
I tried your repro and it hung as you've described about half of the time. However, when I changed the oplog size to be two orders of magnitude larger (one may have sufficed, but I didn't try it), I no longer encountered the failure. What I believe is happening is that the higher-priority, less fresh member cannot catch up because its most recent oplog entry is no longer in the lower-priority member's oplog (despite being less than 10 minutes behind). |
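The catch-up failure described here can be illustrated with a small simulation (assumed, simplified semantics, not MongoDB source): a syncing member must find its own last applied optime in the sync source's oplog, and with a very small oplog the window rolls past that optime even though the member is only minutes behind in wall-clock time. The capacity and optime values below are illustrative.

```python
# Simplified illustration of the oplog-window catch-up failure.
# Assumed semantics: a member can only resume replication from a source
# whose oplog still contains the member's last applied optime.

from collections import deque

class Oplog:
    def __init__(self, capacity):
        # capped-collection analogue: old entries fall off the front
        self.entries = deque(maxlen=capacity)

    def append(self, optime):
        self.entries.append(optime)

    def contains(self, optime):
        return optime in self.entries

source = Oplog(capacity=5)        # tiny oplog, as in the repro
for op in range(100):             # writes keep arriving on the source
    source.append(op)

member_last_applied = 90          # the stalled member's newest entry
# The source's oplog now holds only optimes 95..99; entry 90 has rolled
# off, so the member cannot resume replication from this source.
print(source.contains(member_last_applied))  # False
```

This is why enlarging the oplog by two orders of magnitude made the hang disappear: the higher-priority member's last entry stays inside the window long enough for it to catch up and win the election.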
| Comment by Aristarkh Zagorodnikov [ 26/Aug/13 ] |
|
Attached the original case's repro scripts. |