[SERVER-70321] Collmod coordinator must not resume migrations on retriable errors Created: 07/Oct/22  Updated: 29/Oct/23  Resolved: 25/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.2, 6.1.0-rc4, 6.2.0-rc0
Fix Version/s: 6.3.0-rc0, 6.0.5

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File collmod_changes.diff    
Issue Links:
Backports
Problem/Incident
is caused by SERVER-61760 The new implementation of CollMod sho... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.2, v6.0
Sprint: Sharding EMEA 2022-11-14, Sharding EMEA 2022-11-28, Sharding EMEA 2022-12-12, Sharding EMEA 2022-12-26, Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06
Participants:
Linked BF Score: 120

 Description   

The collMod coordinator may resume migrations after hitting a retriable error.

This can lead to an incorrect execution sequence such as the following:

  1. CollMod starts, stops migrations, and enters the kUpdateConfig phase
  2. It hits a retriable error and unblocks migrations
  3. It attempts to re-execute kUpdateConfig, but this time with migrations unblocked
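
The sequence above can be illustrated with a small toy model. This is only a sketch: the class, method, and exception names (ToyCollModCoordinator, update_config, RetriableError) are hypothetical and do not correspond to the server's actual code. It records whether migrations were blocked on each kUpdateConfig attempt.

```python
class RetriableError(Exception):
    """Hypothetical stand-in for a transient, retriable failure."""


class ToyCollModCoordinator:
    """Toy model of the problematic flow; not the server's implementation."""

    def __init__(self):
        self.migrations_blocked = False
        self.fail_once = True  # make the first kUpdateConfig attempt fail
        self.attempts = []     # (phase, migrations_blocked) per attempt

    def update_config(self):
        # Record the migration state seen by this attempt.
        self.attempts.append(("kUpdateConfig", self.migrations_blocked))
        if self.fail_once:
            self.fail_once = False
            raise RetriableError("transient failure")

    def run(self):
        self.migrations_blocked = True  # step 1: stop migrations
        while True:
            try:
                self.update_config()
                break
            except RetriableError:
                # Step 2 (the problem): the error path resumes migrations
                # even though the phase is about to be retried.
                self.migrations_blocked = False
        self.migrations_blocked = False  # normal resume after success


def simulate_buggy_run():
    coordinator = ToyCollModCoordinator()
    coordinator.run()
    return coordinator.attempts
```

In this model the second attempt is recorded with migrations_blocked set to False, which corresponds to step 3.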

 

Keep in mind that we can't simply resume migrations on non-retriable errors either. Even after hitting a non-retriable error, we can't guarantee that the coordinator won't be recovered and re-executed on a new primary node after a stepdown.
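
The stepdown concern can be sketched in the same toy style. This is an illustration of the constraint only, not the change made in the linked commits; ClusterState, run_coordinator, and the "kDone" marker are hypothetical names. The sketch assumes the phase is persisted and survives a stepdown, and that migrations are resumed only after kUpdateConfig completes, never in an error path.

```python
class NonRetriableError(Exception):
    """Hypothetical stand-in for a failure that is not retried locally."""


class ClusterState:
    """State assumed to survive a stepdown (hypothetical model)."""

    def __init__(self):
        self.migrations_blocked = False
        self.persisted_phase = None  # None means the coordinator never started
        self.observed = []           # migration state seen by each kUpdateConfig run


def run_coordinator(state, fail_in_update_config):
    """Start the toy coordinator, or recover it from the persisted phase."""
    if state.persisted_phase is None:
        state.migrations_blocked = True
        state.persisted_phase = "kUpdateConfig"
    if state.persisted_phase == "kUpdateConfig":
        state.observed.append(state.migrations_blocked)
        if fail_in_update_config:
            # No error path touches migrations_blocked.
            raise NonRetriableError("failed on the old primary")
        state.persisted_phase = "kDone"
    # Migrations are resumed only once kUpdateConfig has completed.
    state.migrations_blocked = False


def simulate_stepdown_recovery():
    state = ClusterState()
    try:
        run_coordinator(state, fail_in_update_config=True)   # old primary
    except NonRetriableError:
        pass  # stepdown happens; nothing is unblocked here
    run_coordinator(state, fail_in_update_config=False)      # new primary recovers
    return state.observed, state.migrations_blocked
```

Because the error path leaves migrations blocked, the re-execution on the new primary in this model still runs kUpdateConfig with migrations stopped.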



 Comments   
Comment by Githook User [ 31/Jan/23 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-70321 Collmod coordinator must not resume migrations on retriable errors
Branch: v6.0
https://github.com/mongodb/mongo/commit/f5e0d58f5f7ae5971ec7be2646cb497793a4ff0e

Comment by Githook User [ 24/Jan/23 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-70321 Collmod coordinator must not resume migrations on retriable errors
Branch: master
https://github.com/mongodb/mongo/commit/eb7ac315931dcc7f69135052b094e5253316b0a1
