[SERVER-25745] Ensure that config servers always ping their distributed locks before exiting drain mode and becoming primary Created: 22/Aug/16  Updated: 05/Apr/17  Resolved: 19/Sep/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2016-08-29, Sharding 2016-09-19, Sharding 2016-10-10
Participants:

 Description   

Currently, if there has been no config server primary for over 15 minutes and a primary is then elected, there is a race in which other processes could overtake distributed locks owned by the config server, because those locks will appear to be abandoned.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 19/Sep/16 ]

It has become apparent that this ticket is unnecessary. This is because the code here will detect a change of primary and update its ping history to reject distlock overtake attempts for another 15 minutes.
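
For readers unfamiliar with the mechanism, the behavior described above can be illustrated roughly as follows. This is a minimal sketch, not the server's actual code: `OvertakeChecker`, `PingObservation`, `config_term`, and `can_overtake` are all hypothetical names, and the assumption is that an overtaker keeps a per-lock record of the last ping it saw, which primary it saw it from, and when it first saw that combination.

```python
from dataclasses import dataclass

# The 15-minute window mentioned in this ticket, in seconds.
LOCK_EXPIRATION_SECS = 15 * 60


@dataclass
class PingObservation:
    """Hypothetical record of what a would-be overtaker last saw for a lock."""
    last_ping: float      # ping value read for the lock owner
    config_term: int      # assumed identifier of the current config primary
    first_seen_at: float  # time this (ping, primary) pair was first observed


class OvertakeChecker:
    """Hypothetical helper that decides whether a lock looks abandoned."""

    def __init__(self):
        self._history = {}  # lock name -> PingObservation

    def can_overtake(self, lock_name, last_ping, config_term, now):
        prev = self._history.get(lock_name)
        changed = (
            prev is None
            or prev.last_ping != last_ping
            or prev.config_term != config_term
        )
        if changed:
            # First sighting, a fresh ping, or a new primary: restart the
            # clock, so overtake is rejected for another full window.
            self._history[lock_name] = PingObservation(last_ping, config_term, now)
            return False
        # Nothing has changed since first_seen_at; the lock only looks
        # abandoned once the full window has elapsed.
        return now - prev.first_seen_at >= LOCK_EXPIRATION_SECS
```

Under these assumptions, a newly elected primary changes `config_term`, which resets the observation window even though the owner's ping value is stale, so the new primary has a further 15 minutes to ping its locks before anyone can overtake them.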

Comment by Spencer Brody (Inactive) [ 29/Aug/16 ]

Prior to 3.4 the config server didn't own any of the distributed locks, so this problem didn't exist.

In 3.4 the config server owns distributed locks in order to support balancer recovery: on failover, the balancer picks up any ongoing migrations where they left off. For that to work, the balancer must be able to get back any collection distributed locks it held for those ongoing migrations.

Comment by Randolph Tan [ 25/Aug/16 ]

Can't the new primary simply get them back normally? How is this different from the case when initially upgrading the cluster config servers to v3.4?

Generated at Thu Feb 08 04:10:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.