[SERVER-42431] New config server primary unlocks all distlocks held by previous config server on stepup Created: 25/Jul/19  Updated: 27/Oct/23  Resolved: 10/Feb/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Gone away Votes: 0
Labels: sharding-DDL-bugs
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File configsvrStepdownRaceRepro.txt     File configsvr_failover_repro.js    
Assigned Teams:
Sharding EMEA
Operating System: ALL
Sprint: Sharding 2019-07-29
Participants:

 Description   

Config servers all have the same process id of "ConfigServer."

On transition to primary, a config node unlocks all existing distlocks with the process id "ConfigServer."

This means that DDL operations which serialize on the config server via a distlock but whose business logic is executed by a shard (moveChunk, movePrimary, and shardCollection) are suspect, because the shard can keep executing the business logic outside the distlock. For example, you could drop a database concurrently with sharding a collection and end up with a config.collections entry without a corresponding config.databases entry.

Note that the track unsharded project will add two more DDL operations with the shardCollection pattern (renameCollection and convertToCapped).
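The race described above can be illustrated with a toy in-memory lock table (a hypothetical model with made-up names, not the server's real DistLockManager): every config node uses the same process id, and a new primary's step-up releases every lock held under that id, so a DDL operation whose business logic is still running on a shard silently loses its lock mid-flight.

```python
# Toy model of the config server dist lock table. All names here
# (DistLockTable, try_lock, unlock_all_for) are illustrative only.

PROCESS_ID = "ConfigServer"  # every config node reports the same process id

class DistLockTable:
    def __init__(self):
        self.locks = {}  # lock name -> process id of the holder

    def try_lock(self, name, process_id):
        if name in self.locks:
            return False
        self.locks[name] = process_id
        return True

    def unlock_all_for(self, process_id):
        # What a new config primary does on step-up: it cannot tell its
        # predecessor's stale locks from live ones, because every config
        # node shares the same process id.
        self.locks = {n: p for n, p in self.locks.items() if p != process_id}

table = DistLockTable()

# 1. The old config primary takes the distlock for shardCollection; the
#    business logic then runs on a shard, outside the config server.
assert table.try_lock("db.collection", PROCESS_ID)

# 2. The config server fails over; the new primary unlocks everything
#    held under "ConfigServer", including the lock the shard relies on.
table.unlock_all_for(PROCESS_ID)

# 3. A concurrent DDL operation (e.g. dropDatabase) can now take the
#    lock while shardCollection's business logic is still executing on
#    the shard, leaving the sharding catalog inconsistent.
assert table.try_lock("db.collection", PROCESS_ID)
```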



 Comments   
Comment by Kaloian Manassiev [ 10/Feb/22 ]

This is now Gone Away after 5.0, because under the DDL project we have local synchronisation between DDL and moveChunk, which was the main reason for having the config server dist lock.

Comment by Esha Maharishi (Inactive) [ 29/Jul/19 ]

kaloian.manassiev, hmm, I think the sharding catalog can only get corrupted by metadata commands for which the shard directly writes to the sharding catalog. So, I think it is only a problem for shardCollection, movePrimary, and moveChunk. This is because config servers use ConnectionString::forLocal and for ConnectionString::LOCAL, the ShardRegistry returns ShardLocal instances. So, if the config server is the one that writes to the sharding catalog, it will write on the same branch of history as the distlock was on.

The problem is if a shard targets the new config primary, and therefore updates the sharding catalog on the new branch of history that has released the distlock.

Note that this second issue:

    Another way this can manifest that would result in actual user data loss is if an old config primary executed dropCollection after the collection had been recreated on the new config primary.

would not be solved even if the new config primary reacquired persisted locks.

Comment by Kaloian Manassiev [ 29/Jul/19 ]

When we moved the balancer to the config server in 3.4, the collection lock acquisition done by moveChunk was the only operation taking distributed locks on the config server. At the time, we must have (knowingly or not) decided that it would be cleaner to have step-up clean up these locks, both (1) so that the migration manager recovery doesn't get stuck and (2) because the migration manager recovery will re-acquire them anyway.

I guess in 3.6 we moved more operations to the config server which don't go through the recovery process that moveChunk does, and because of this we introduced this bug.

Given that the lock manager will re-acquire locks with the same session id, we could remove this unlock-everything behaviour by giving the MigrationManager a fixed session id and, on step-up, only unlocking locks acquired by that session id. This would preserve the intra-node synchronization behaviour we rely on through the dist locks, without requiring us to build a more sophisticated dist lock manager.
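The selective unlock proposed above could look roughly like this (a sketch under assumptions; the class and the session id values are made up for illustration): locks are tagged with a session id instead of only a shared process id, and step-up releases only the MigrationManager's fixed session, leaving other DDL operations' locks intact.

```python
# Sketch of the proposed selective unlock on step-up. All names here
# (SessionedLockTable, the session id strings) are hypothetical.

MIGRATION_MANAGER_SESSION = "migration-manager"  # fixed, well-known session id

class SessionedLockTable:
    def __init__(self):
        self.locks = {}  # lock name -> session id of the holder

    def try_lock(self, name, session_id):
        if name in self.locks:
            return False
        self.locks[name] = session_id
        return True

    def on_step_up(self):
        # Only drop locks held by the MigrationManager's fixed session;
        # its recovery will re-acquire them. Locks taken by other DDL
        # operations (e.g. shardCollection) survive the failover.
        self.locks = {n: s for n, s in self.locks.items()
                      if s != MIGRATION_MANAGER_SESSION}

table = SessionedLockTable()
table.try_lock("db.coll_being_moved", MIGRATION_MANAGER_SESSION)
table.try_lock("db.coll_being_sharded", "shard-collection-op-1")

table.on_step_up()

# The moveChunk lock was released (recovery re-acquires it), but the
# shardCollection lock is still held, so concurrent DDL stays blocked.
assert "db.coll_being_moved" not in table.locks
assert "db.coll_being_sharded" in table.locks
```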

Comment by Jason Zhang [ 25/Jul/19 ]

Attached is a repro. For a consistent repro, you need to set mongos's internal retry count to 1. This is because, after the config server steps down, mongos will retry 'shardCollection' on the new config primary, which will take distlocks that block dropCollection from taking distlocks on the target database.

Comment by Esha Maharishi (Inactive) [ 25/Jul/19 ]

Assigned to jason.zhang to write a repro that demonstrates the sharding catalog becoming inconsistent if the config server fails over during shardCollection.

Generated at Thu Feb 08 05:00:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.