[SERVER-25745] Ensure that config servers always ping their distributed locks before exiting drain mode and becoming primary Created: 22/Aug/16 Updated: 05/Apr/17 Resolved: 19/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2016-08-29, Sharding 2016-09-19, Sharding 2016-10-10 |
| Participants: |
| Description |
|
Currently, if there has been no config server primary for over 15 minutes and a primary is then elected, there is a race in which other processes could overtake the distributed locks owned by the config server, because those locks will appear to be abandoned. |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 19/Sep/16 ] |
|
It has become apparent that this ticket is unnecessary. This is because the code here will detect a change of primary and update its ping history, rejecting distlock overtake attempts for another 15 minutes. |
| Comment by Spencer Brody (Inactive) [ 29/Aug/16 ] |
|
Prior to 3.4, the config server didn't own any of the distributed locks, so this problem didn't exist. This change is needed to support balancer recovery: on failover, the balancer resumes any ongoing migrations where they left off, but for that to work it must be able to reacquire the collection distributed locks it held for those migrations. |
| Comment by Randolph Tan [ 25/Aug/16 ] |
|
Can't the new primary simply get them back normally? How is this different from the case when initially upgrading the cluster config servers to v3.4? |