[SERVER-70600] dropDatabase DDL operation might complete with the old primary shard still believing that it is the primary
Created: 17/Oct/22  Updated: 30/Mar/23  Resolved: 30/Mar/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.14, 6.0.4, 6.3.0-rc0, 6.2.0-rc6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Sergi Mateo Bellido
Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Issue split
split to SERVER-73390 Mitigate database version regression ... Closed
split to SERVER-73391 Use recoverable critical section for ... Closed
Related
is related to SERVER-73391 Use recoverable critical section for ... Closed
Assigned Teams:
Sharding EMEA
Operating System: ALL
Sprint: Sharding EMEA 2022-11-14, Sharding EMEA 2022-11-28, Sharding EMEA 2022-12-12, Sharding EMEA 2023-01-23
Participants:
Linked BF Score: 6

 Description   

This is an unlikely scenario, but one we still have to fix. The problematic interleaving is the following:

S1:N0: Starts a dropDatabase.
S1:N0: Drops all sharded collections.
S1:N0: Runs dropDB on all shards and clears the db metadata on all nodes.
S1:N0: Steps down, but only after sending the CSRS the command to remove the authoritative data.
S1:N1: Steps up.
S1:N1: Some operation recovers the metadata associated with the db being dropped.
CSRS: Removes the db entry from config.databases.
S1:N1: Resumes the execution of the dropDatabase.
S1:N1: No information associated with that dbName is present on the CSRS, so we jump to the second phase of the coordinator, in which we send a flushRoutingTable to all shards except the primary shard.
S1:N1: Completes the dropDatabase, but the primary node still believes it is the primary shard for that dbName.
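The interleaving above can be replayed in a toy model. This is a hypothetical sketch, not server code: `Shard`, `db_cache`, `recover_metadata`, `flush_routing_table`, and `simulate` are illustrative names, and config.databases on the CSRS is reduced to a plain dict. It runs the steps in order and shows that skipping the flush on the primary shard leaves S1 with stale metadata once the drop completes.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; none of these names exist in the server code base.


@dataclass
class Shard:
    name: str
    # Cached database metadata: db name -> name of the primary shard.
    db_cache: dict = field(default_factory=dict)

    def recover_metadata(self, config_databases: dict, db: str) -> None:
        # Reload the db entry from the CSRS if it still exists there.
        if db in config_databases:
            self.db_cache[db] = config_databases[db]

    def flush_routing_table(self, db: str) -> None:
        # Forget any cached metadata for the db.
        self.db_cache.pop(db, None)


def simulate(db: str = "test") -> dict:
    primary = "S1"
    config_databases = {db: primary}  # stands in for config.databases
    shards = {n: Shard(n, {db: primary}) for n in ("S1", "S2", "S3")}

    # S1:N0 runs dropDB on all shards and clears the db metadata everywhere.
    # (Dropping the sharded collections is not modeled.)
    for shard in shards.values():
        shard.flush_routing_table(db)

    # S1:N0 has sent the removal to the CSRS but steps down before the CSRS
    # applies it. S1:N1 steps up and some operation recovers the metadata,
    # which is still present on the CSRS at this point.
    shards[primary].recover_metadata(config_databases, db)

    # The CSRS now applies the in-flight removal.
    del config_databases[db]

    # S1:N1 resumes: with no entry in config.databases it jumps to the second
    # phase and flushes every shard except the primary shard.
    if db not in config_databases:
        for name, shard in shards.items():
            if name != primary:
                shard.flush_routing_table(db)

    return {name: dict(shard.db_cache) for name, shard in shards.items()}


if __name__ == "__main__":
    # S1 is left believing it is still the primary shard for "test".
    print(simulate())  # prints {'S1': {'test': 'S1'}, 'S2': {}, 'S3': {}}
```

In this model, if the CSRS applies the removal before the metadata recovery on S1:N1, `recover_metadata` finds nothing and every cache ends up empty; the stale entry only survives in the narrow window where recovery happens first, which is why the scenario is unlikely but possible.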

 

 


Generated at Thu Feb 08 06:16:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.