[SERVER-73539] stopMigrations/resumeMigrations don't have replay protection Created: 02/Feb/23  Updated: 29/Oct/23  Resolved: 03/May/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Sergi Mateo Bellido Assignee: Marcos José Grillo Ramirez
Resolution: Fixed Votes: 0
Labels: auto-reverted, sharding-ddl-replay-protection, shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
causes SERVER-76836 setAllowMigrations is executing remot... Closed
causes SERVER-77304 stopMigrations is not idempotent anymore Closed
Related
related to SERVER-78021 Retrying setAllowMigrations command m... Closed
related to SERVER-79026 Failing to cancel the JournalFlusher ... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Sprint: Sharding EMEA 2023-04-03, Sharding EMEA 2023-04-17, Sharding EMEA 2023-05-01, Sharding EMEA 2023-05-15
Participants:
Linked BF Score: 135
Story Points: 3.33

 Description   

Currently, several DDL coordinators (such as rename, collMod, and drop collection) use the configsvrSetAllowMigrations command to stop migrations while the coordinator runs. This is necessary because the coordinator eventually changes the collection metadata, and a migration to a shard that previously had no metadata for the collection might not find out about that change.

However, the command does not have replay protection, which could cause the following scenario:

  1. A DDL coordinator sends a configsvrSetAllowMigrations command that gets held in a router due to network slowness.
  2. A stepdown occurs, and the new primary runs the DDL to completion, re-enabling migrations at the end of the coordinator.
  3. The command delayed in step 1 finally arrives and blocks migrations for the collection.
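The scenario above can be modeled as a toy simulation. This is a minimal sketch in Python, not server code: the ConfigServer class, set_allow_migrations method, and "db.coll" namespace are hypothetical, and the config server is reduced to a single per-collection allowMigrations flag that applies every request in arrival order, i.e. with no replay protection.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigServer:
    """Toy config server that only tracks a per-collection allowMigrations flag."""
    allow_migrations: dict = field(default_factory=dict)

    def set_allow_migrations(self, ns: str, allow: bool) -> None:
        # No replay protection: every request is applied, however late it arrives.
        self.allow_migrations[ns] = allow


def simulate_race() -> bool:
    """Replays the three steps and returns the final allowMigrations flag."""
    config = ConfigServer()
    ns = "db.coll"  # hypothetical namespace

    # Step 1: the old primary's coordinator sends "disallow", but the request
    # is held in the network; we model this by delivering it last.
    delayed_request = (ns, False)

    # Step 2: after the stepdown, the new primary runs the whole DDL,
    # disabling migrations and re-enabling them when it finishes.
    config.set_allow_migrations(ns, False)
    config.set_allow_migrations(ns, True)

    # Step 3: the stale request finally arrives and is applied unconditionally.
    config.set_allow_migrations(*delayed_request)

    return config.allow_migrations[ns]


print(simulate_race())  # False: migrations remain disabled after the DDL completed
```

Because the last write wins regardless of which coordinator incarnation sent it, the collection is left with migrations disabled even though no DDL is running anymore.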

We can prevent this by adding replay protection to configsvrSetAllowMigrations, as configsvrRemoveChunk already has.
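The replay protection proposed here can be sketched as a toy model in Python. It assumes, in the style of retryable writes, that each request carries a logical session id (lsid) and a monotonically increasing transaction number, and that the new primary keeps using the coordinator's session with higher transaction numbers after the stepdown. The class and function names, and the session and namespace values, are hypothetical; the exception is only named after MongoDB's TransactionTooOld error code. This is an illustration of the idea, not the server's implementation.

```python
from dataclasses import dataclass, field


class TransactionTooOld(Exception):
    """Raised for a request whose txnNumber is older than one already accepted."""


@dataclass
class ReplayProtectedConfigServer:
    allow_migrations: dict = field(default_factory=dict)
    # Highest transaction number accepted so far, keyed by logical session id.
    last_txn_number: dict = field(default_factory=dict)

    def set_allow_migrations(self, ns: str, allow: bool, lsid: str, txn_number: int) -> None:
        highest = self.last_txn_number.get(lsid, -1)
        if txn_number < highest:
            # A newer request from the same session was already applied:
            # this one is a stale replay and must not overwrite the flag.
            raise TransactionTooOld(f"txnNumber {txn_number} < {highest} for session {lsid}")
        if txn_number == highest:
            # Retry of the request that was already applied: nothing to do.
            return
        self.last_txn_number[lsid] = txn_number
        self.allow_migrations[ns] = allow


def simulate_race_with_protection() -> tuple:
    """Replays the same three steps; returns (final flag, stale request rejected?)."""
    config = ReplayProtectedConfigServer()
    ns, lsid = "db.coll", "ddl-coordinator-session"  # hypothetical values

    # Step 1: the old primary's request (txnNumber 1) is held in the network.
    delayed_request = dict(ns=ns, allow=False, lsid=lsid, txn_number=1)

    # Step 2: the new primary resumes the coordinator's session and issues
    # its requests with higher transaction numbers.
    config.set_allow_migrations(ns, False, lsid, 2)
    config.set_allow_migrations(ns, True, lsid, 3)

    # Step 3: the stale request arrives and is rejected instead of applied.
    try:
        config.set_allow_migrations(**delayed_request)
        rejected = False
    except TransactionTooOld:
        rejected = True

    return config.allow_migrations[ns], rejected


print(simulate_race_with_protection())  # (True, True)
```

The ordering decision moves from arrival order to the sender's transaction number, so a request delayed in the network can no longer undo the work of a later coordinator incarnation using the same session.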



 Comments   
Comment by Githook User [ 30/Apr/23 ]

Author:

Marcos José Grillo Ramirez (marcos.grillo@mongodb.com, username: m4nti5)

Message: SERVER-73539 Add replay protection in DDL when setting the allow migrations flag
Branch: master
https://github.com/mongodb/mongo/commit/b1cff9e72798e2533586d94c788f4ac717d559b7

Comment by xgen-buildbaron-user [ 29/Apr/23 ]

Ticket re-opened due to revert. The sharding suite began failing consistently in jstests/sharding/configsvr_set_allow_migrations.js.

Comment by Githook User [ 29/Apr/23 ]

Author:

auto-revert-processor (dev-prod-dag@mongodb.com)

Message: Revert "SERVER-73539 Add replay protection in DDL when setting the allow migrations flag"

This reverts commit 20ebf6b1ccdf7a3e4fcb99547cd4c23f6e0b746a.
Branch: master
https://github.com/mongodb/mongo/commit/d6087c6eb2130f8de42043120b64b58215540ef0

Comment by Githook User [ 28/Apr/23 ]

Author:

Marcos José Grillo Ramirez (marcos.grillo@mongodb.com, username: m4nti5)

Message: SERVER-73539 Add replay protection in DDL when setting the allow migrations flag
Branch: master
https://github.com/mongodb/mongo/commit/20ebf6b1ccdf7a3e4fcb99547cd4c23f6e0b746a

Generated at Thu Feb 08 06:24:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.