[SERVER-78851] movePrimary may fail on clone phase if $out runs concurrently Created: 11/Jul/23  Updated: 29/Oct/23  Resolved: 28/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.1, 7.1.0-rc4
Fix Version/s: 7.1.1, 7.2.0-rc0, 7.0.3

Type: Bug Priority: Major - P3
Reporter: Silvia Surroca Assignee: Silvia Surroca
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File reproducibleTest_SERVER-78851.diff    
Issue Links:
Backports
Tested
tested by SERVER-78852 Test movePrimary and $out running con... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0
Sprint: Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02
Participants:

 Description   

movePrimary may fail during the clone phase with error 7118501 when $out runs at the same time.

The sequence of events that leads to this error is:

  1. $out operation starts
  2. the temporary collection for $out is created
  3. movePrimary starts
  4. movePrimary starts the clone phase
  5. $out operation fails with a MovePrimaryInProgress error
  6. the temporary collection is dropped
  7. movePrimary finishes the clone phase
  8. movePrimary fails because the cloned collections don't match the collections it expected to clone
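
The interleaving above can be sketched as a small toy model. This is not MongoDB code: the ToyDatabase class, the simulate_race function, and the collection names are hypothetical. It also assumes that movePrimary records the list of collections to clone at the start of the clone phase, which the ticket implies but does not state outright.

```python
class MovePrimaryInProgress(Exception):
    """Stand-in for the MovePrimaryInProgress error named in the ticket."""


class ToyDatabase:
    """Hypothetical in-memory model of a database's collection catalog."""

    def __init__(self, collections):
        self.collections = set(collections)
        self.move_primary_in_progress = False


def simulate_race():
    db = ToyDatabase({"orders"})

    # Steps 1-2: $out starts and creates its temporary collection.
    # The name is made up for illustration.
    tmp = "tmp_out_example"
    db.collections.add(tmp)

    # Steps 3-4: movePrimary starts the clone phase.
    # Assumption: it records which collections it expects to clone here.
    db.move_primary_in_progress = True
    expected = set(db.collections)

    # Steps 5-6: $out notices the movePrimary, fails, and drops its
    # temporary collection while the clone phase is still running.
    try:
        if db.move_primary_in_progress:
            raise MovePrimaryInProgress()
    except MovePrimaryInProgress:
        db.collections.discard(tmp)

    # Step 7: the clone phase finishes; only collections that still
    # exist could be cloned.
    cloned = set(db.collections)

    # Step 8: the caller compares the two sets; a mismatch models the failure.
    return expected, cloned


if __name__ == "__main__":
    expected, cloned = simulate_race()
    print("expected:", sorted(expected))
    print("cloned:  ", sorted(cloned))
    print("mismatch:", expected != cloned)
```

In this model the only difference between the two sets is the temporary collection, which matches the ticket's description of the failure as a mismatch between cloned and expected collections.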


 Comments   
Comment by Githook User [ 17/Oct/23 ]

Author:

{'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}

Message: SERVER-78851 movePrimary may fail on clone phase if out runs concurrently

(cherry picked from commit a95a5b2ee3108d4c02bb9fcfd6749495d4df248b)
Branch: v7.1
https://github.com/mongodb/mongo/commit/e57d0b9f1a2d6eb2b7a4c8dce9e1e43c91779266

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}

Message: SERVER-78851 movePrimary may fail on clone phase if out runs concurrently
Branch: v7.0
https://github.com/mongodb/mongo/commit/48ff06424ebe4dae4b8978ec1a41901334565c5f

Comment by Githook User [ 28/Sep/23 ]

Author:

{'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}

Message: SERVER-78851 movePrimary may fail on clone phase if out runs concurrently
Branch: master
https://github.com/mongodb/mongo/commit/a95a5b2ee3108d4c02bb9fcfd6749495d4df248b

Comment by Silvia Surroca [ 20/Jul/23 ]

I've attached a reproducible test.

The test requires changing the placement of a fail point, so you will need to recompile the server to reproduce the issue.
