[SERVER-65371] MigrationSourceManager running on secondary node may trip invariant Created: 08/Apr/22  Updated: 29/Oct/23  Resolved: 02/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.3.0, 5.0.6
Fix Version/s: 5.0.10, 6.0.0-rc9, 6.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Jordi Serra Torrens
Assignee: Paolo Polato
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-Repro-BF-24832.patch, Text File 0001-SERVER-65371-Ensure-MigraitonSourceManager-is-only-i.patch
Issue Links:
Backports
Depends
Problem/Incident
is caused by SERVER-62296 MoveChunk should recover any unfinish... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.3, v5.0
Steps To Reproduce:

0001-Repro-BF-24832.patch

./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding jstests/sharding/bf-24832-repro.js  --log=file

Sprint: Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13
Participants:
Linked BF Score: 48

 Description   

The shardsvr's 'moveChunk' command is allowed on primary nodes only. However, this check is only best effort: the member state could change at any time afterwards and the command would continue.
The command body does take some precautions to ensure a stable member state: it briefly takes the GlobalLock in mode IX to:
(1) Flag the opCtx as one that should be killed on stepdown
(2) Synchronize with the thread that kills opCtxs on stepdown
This ensures that the MigrationSourceManager will run on a single term (see BF-24411). However, it doesn't ensure that this node is primary. For instance, the following interleaving could happen:
1. The node is primary when this is evaluated
2. The node becomes secondary here
3. Here the opCtx will get flagged as killable on stepdown, but the node has already stepped down, so it won't be interrupted!

In this scenario the command will continue executing and will instantiate a MigrationSourceManager:
4. The MSM will check that there are no migrations pending recovery. Assume that there are none at this point.
5. Now the new primary starts a migration, inserts its recovery document and the old primary replicates it.
6. Now the old primary evaluates this invariant, finds the document inserted on (5) and crashes.
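
The interleaving in steps 1 to 6 can be replayed with a small single-threaded model. This is only a sketch for reasoning about the ordering: `Node`, `OpCtx`, `run_buggy_move_chunk` and the two callback hooks are hypothetical names invented for illustration and do not correspond to the server's classes or functions.

```python
class Interrupted(Exception):
    """Raised when an operation has been killed by a stepdown."""


class OpCtx:
    """Toy stand-in for an operation context (hypothetical, not the server's)."""

    def __init__(self):
        self.kill_on_stepdown = False
        self.killed = False

    def check_for_interrupt(self):
        if self.killed:
            raise Interrupted("operation killed by stepdown")


class Node:
    """Toy stand-in for a shard node."""

    def __init__(self):
        self.is_primary = True
        self.op_ctxs = []
        self.recovery_docs = []  # replicated migration recovery documents

    def step_down(self):
        self.is_primary = False
        # Only operations that are ALREADY flagged get killed.
        for op in self.op_ctxs:
            if op.kill_on_stepdown:
                op.killed = True


def run_buggy_move_chunk(node, after_primary_check, after_recovery_check):
    """Replay steps 1-6; the hooks inject events at the racy points."""
    op = OpCtx()
    node.op_ctxs.append(op)

    # Step 1: best-effort primary check.
    if not node.is_primary:
        return "rejected: not primary"

    # Step 2: something may happen here (e.g. a stepdown).
    after_primary_check(node)

    # Step 3: flag the opCtx as killable on stepdown. If the stepdown
    # already happened, nothing will ever kill this operation.
    op.kill_on_stepdown = True
    op.check_for_interrupt()

    # Step 4: the MSM checks that no migration is pending recovery.
    if node.recovery_docs:
        return "rejected: migration pending recovery"

    # Step 5: the new primary's recovery document may be replicated here.
    after_recovery_check(node)

    # Step 6: the invariant "no recovery document exists" is evaluated.
    if node.recovery_docs:
        return "invariant tripped"
    return "ok"


node = Node()
result = run_buggy_move_chunk(
    node,
    after_primary_check=lambda n: n.step_down(),
    after_recovery_check=lambda n: n.recovery_docs.append({"_id": "migration"}),
)
print(result)  # invariant tripped
```

With no stepdown injected between steps 1 and 3, the same function returns "ok", which is why the failure only shows up under this specific ordering.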



 Comments   
Comment by Githook User [ 09/Jun/22 ]

Author:

{'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'}

Message: SERVER-65371 Interrupt moveChunk when the node steps down
Branch: v5.0
https://github.com/mongodb/mongo/commit/4d0422a898663e66098d00fce5823f4b7ab48b83

Comment by Githook User [ 08/Jun/22 ]

Author:

{'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'}

Message: SERVER-65371 Interrupt shardSvrMoveRange when the node steps down
Branch: v6.0
https://github.com/mongodb/mongo/commit/daf78f2f252f09913ff0f4c716580dbfc8cdc7f3

Comment by Githook User [ 02/Jun/22 ]

Author:

{'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'}

Message: SERVER-65371 Interrupt shardSvrMoveRange when the node steps down
Branch: master
https://github.com/mongodb/mongo/commit/6b6244e58348eea52810cbb4ce1543a5943b6ee4

Comment by Githook User [ 06/May/22 ]

Author:

{'name': 'Sviatlana Zuiko', 'email': 'sviatlana.zuiko@mongodb.com', 'username': 'szuiko'}

Message: Revert "SERVER-65371 Ensure that moveRange gets interrupted when the donor steps down"

This reverts commit 417cd065b9f437f01269be04941a183b096f9db5.
Branch: master
https://github.com/mongodb/mongo/commit/1113373e84ac5510c4546ec71ec03e6b15d45749

Comment by Githook User [ 05/May/22 ]

Author:

{'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'}

Message: SERVER-65371 Ensure that moveRange gets interrupted when the donor steps down
Branch: master
https://github.com/mongodb/mongo/commit/417cd065b9f437f01269be04941a183b096f9db5

Comment by Jordi Serra Torrens [ 08/Apr/22 ]

This could be fixed by checking that the node is primary after the opCtx has been marked as interruptible here. This guarantees that the MigrationSourceManager only runs on a primary node, and should that node stop being primary (and thus possibly replicating writes done by the new primary), the MSM will first be interrupted. Attaching patch with this proposal.
0001-SERVER-65371-Ensure-MigraitonSourceManager-is-only-i.patch
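
The proposal in this comment (re-check the member state after the opCtx has been marked as killable) can be sketched in a toy model. This is an illustration of the idea only, not the attached patch or the committed change; `Node`, `NotPrimary`, `start_move_chunk` and the `before_flag` hook are hypothetical names.

```python
class NotPrimary(Exception):
    """Raised when the command finds that the node is not primary."""


class Node:
    """Toy stand-in for a shard node (hypothetical, not the server's class)."""

    def __init__(self):
        self.is_primary = True
        self.flagged_ops = []  # operations marked as killable on stepdown

    def step_down(self):
        self.is_primary = False
        for op in self.flagged_ops:
            op["killed"] = True


def start_move_chunk(node, before_flag=lambda n: None):
    """Return the operation state if it may safely start a migration."""
    op = {"killed": False}

    # Best-effort check at command entry.
    if not node.is_primary:
        raise NotPrimary("not primary at command entry")

    # Racy window: a stepdown here would go unnoticed by the flag below.
    before_flag(node)

    # Mark the operation as killable on stepdown.
    node.flagged_ops.append(op)

    # Proposed extra check: from this point on, any stepdown kills the
    # operation, so being primary here closes the window.
    if not node.is_primary:
        raise NotPrimary("stepped down before being marked killable")
    return op


# A stepdown inside the window is now detected...
try:
    start_move_chunk(Node(), before_flag=lambda n: n.step_down())
    outcome = "started"
except NotPrimary:
    outcome = "rejected"
print(outcome)  # rejected

# ...and a stepdown after the check interrupts the operation.
node = Node()
op = start_move_chunk(node)
node.step_down()
print(op["killed"])  # True
```

In the real server the flagging and the re-check would have to happen under the same synchronization with the stepdown thread described in the ticket; the single-threaded model above only captures the ordering argument.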

Generated at Thu Feb 08 06:02:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.