[SERVER-73484] Serialize createIndexes and dropIndexes with movePrimary Created: 31/Jan/23  Updated: 14/Sep/23  Resolved: 14/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Antonio Fuschetto Assignee: Antonio Fuschetto
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-75675 Ensure indexes are created in all shards Open
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18
Participants:

 Description   

The cloning phase of the movePrimary command conflicts with some operations, e.g. createIndexes and dropIndexes, which must either fail or be serialized with it. For this purpose, the movePrimary command sets the in-memory MovePrimaryInProgress flag when its cloning phase starts, and potentially conflicting operations check that flag. When the cloning phase completes, the flag is unset.

The state of this flag is not persisted, which exposes the cluster to the following potential scenario:

  1. Node_1 is the primary and runs a movePrimary operation
  2. Node_1 sets the MovePrimaryInProgress flag
  3. Node_1 steps down while the flag is still set
  4. Node_2 is elected primary and recovers the operation (and therefore sets the flag locally)
  5. Node_2 completes the operation and unsets the flag (locally)
  6. Sooner or later, Node_1 steps up again ==> the MovePrimaryInProgress flag is still set on Node_1
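
The steps above can be sketched as a toy simulation. This is only an illustration of the hypothesis stated in this description, not server code: the Node class, its methods, and the attribute move_primary_in_progress are hypothetical names, and the model assumes that nothing clears the flag when a node steps down.

```python
class Node:
    """Toy replica-set member (hypothetical model, not MongoDB code)."""

    def __init__(self, name):
        self.name = name
        # In-memory only: neither persisted nor replicated to other nodes.
        self.move_primary_in_progress = False

    def start_cloning(self):
        self.move_primary_in_progress = True

    def finish_cloning(self):
        self.move_primary_in_progress = False

    def create_indexes(self):
        # Conflicting operations check the flag and fail while it is set.
        if self.move_primary_in_progress:
            raise RuntimeError("movePrimary in progress on " + self.name)
        return "ok"


def run_scenario():
    node_1, node_2 = Node("Node_1"), Node("Node_2")
    node_1.start_cloning()   # steps 1-2: Node_1 is primary and sets the flag
    # step 3: Node_1 steps down; ASSUMPTION: its flag is not reset
    node_2.start_cloning()   # step 4: Node_2 recovers the operation
    node_2.finish_cloning()  # step 5: Node_2 unsets its own local flag
    # step 6: Node_1 steps up again with its stale local flag
    return node_1, node_2
```

Under that assumption, create_indexes() on Node_1 would keep failing after step 6 even though no movePrimary is running any more.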

The strategic solution is to reimplement the createIndexes and dropIndexes commands on top of the DDL coordinator. These commands would then be serialized automatically with movePrimary operations, and the MovePrimaryInProgress flag would no longer be needed.

However, a short-term (tactical) solution might be to make these commands acquire the DDL lock. This would avoid the DDL coordinator (which is expensive to implement) while still serializing these operations with movePrimary, making the MovePrimaryInProgress flag redundant.
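
A minimal sketch of the locking idea, assuming one lock per database that every DDL operation on that database must hold. DDLLockManager, demo, and the database name test_db are hypothetical and do not correspond to the server's actual DDL lock manager; threads stand in for concurrent commands.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class DDLLockManager:
    """Hypothetical per-database DDL lock registry (not the server's API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def lock(self, db_name):
        # Look up (or create) the lock for this database under a guard.
        with self._guard:
            db_lock = self._locks[db_name]
        with db_lock:
            yield


def demo():
    manager = DDLLockManager()
    events = []
    cloning_started = threading.Event()

    def move_primary():
        with manager.lock("test_db"):
            events.append("movePrimary:start")
            cloning_started.set()
            events.append("movePrimary:end")

    def create_indexes():
        # Arrive only once movePrimary already holds the lock.
        cloning_started.wait()
        with manager.lock("test_db"):
            events.append("createIndexes")

    threads = [
        threading.Thread(target=move_primary),
        threading.Thread(target=create_indexes),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

In this sketch createIndexes is neither rejected nor interleaved with the cloning: it simply waits until movePrimary releases the lock, so no separate flag has to be checked.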



 Comments   
Comment by Antonio Fuschetto [ 14/Sep/23 ]

After double-checking the code, both the non-resilient and resilient movePrimary implementations reset the in-memory MovePrimaryInProgress flag in the event of a step-down. Specifically:

  • Version 6.0 (non-resilient version): the flag is reset in the cleanup procedure, which is invoked on any error.
  • Version 7.2 (resilient version): the flag is reset when the cloning phase is left, on both success and error.

Consequently, the described problematic scenario cannot occur.
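
The "reset when the cloning phase is left" behaviour can be pictured as a try/finally around the cloning work. This is a hedged sketch with hypothetical names (Node, run_cloning_phase, StepDownError), not the actual 6.0 or 7.2 code.

```python
class StepDownError(Exception):
    """Hypothetical stand-in for an interruption caused by a step-down."""


class Node:
    def __init__(self):
        self.move_primary_in_progress = False

    def run_cloning_phase(self, clone):
        self.move_primary_in_progress = True
        try:
            clone()
        finally:
            # Runs on success and on any error, including a step-down,
            # so the node never keeps a stale flag.
            self.move_primary_in_progress = False


def interrupted_clone():
    raise StepDownError("node stepped down during cloning")
```

With this shape, a node that steps down in the middle of cloning leaves the phase through the finally branch, which is why step 6 of the scenario in the description cannot find the flag still set.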

In the context of SERVER-75675, both the createIndexes and dropIndexes commands will be implemented on top of the DDL coordinator, which provides serialization with any other DDL operation.

Generated at Thu Feb 08 06:24:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.