[SERVER-78414] Recipient shard in chunk migration can skip fetching changes to the migrated range, leading to lost writes Created: 23/Jun/23  Updated: 29/Oct/23  Resolved: 26/Jun/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0-rc5
Fix Version/s: 7.1.0-rc0, 5.0.19, 4.4.23, 7.0.0-rc6, 6.0.8

Type: Bug Priority: Critical - P2
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-78415 Avoid sending unrelated operations fr... Backlog
is related to SERVER-40791 Chunk migration clone blocks behind p... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0, v5.0, v4.4
Sprint: Sharding NYC 2023-06-26, Sharding NYC 2023-07-10
Participants:
Linked BF Score: 135

 Description   

The recipient shard fetches changes to the documents in the range being migrated by continuously running the _transferMods command against the donor shard. The donor shard buffers these changes in memory.

The recipient shard stops attempting to fetch changes from the donor shard when both of the following conditions are satisfied:

However, even when none of the changes returned by a _transferMods command are in the range being migrated, the donor shard can still have a pending batch containing changes that are in the range. In particular, transactions do not filter out changes unrelated to the range being migrated, so an entire _transferMods response can consist of changes wholly unrelated to that range (SERVER-78415).
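The failure mode can be illustrated with a small simulation. This is only a sketch, not the server's C++ implementation: `FakeDonor`, `transfer_mods`, `fetch_buggy`, `fetch_fixed`, and the integer-keyed change records are all hypothetical names invented for the example, and `fetch_buggy` is a simplification of the real termination condition. The fixed loop follows the approach named in the commit message: keep fetching until the batch returned has size 0.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical change record: only the shard key value matters here.
Change = Dict[str, int]


class FakeDonor:
    """Hypothetical stand-in for the donor shard's in-memory change buffer."""

    def __init__(self, batches: List[List[Change]]) -> None:
        self._pending: Deque[List[Change]] = deque(batches)

    def transfer_mods(self) -> List[Change]:
        # Return the next buffered batch, or an empty batch once drained.
        return self._pending.popleft() if self._pending else []


def in_range(change: Change, lo: int, hi: int) -> bool:
    # The migrated range is modelled as the half-open interval [lo, hi).
    return lo <= change["key"] < hi


def fetch_buggy(donor: FakeDonor, lo: int, hi: int) -> List[Change]:
    """Simplified buggy loop: stops when a batch has no in-range changes."""
    applied: List[Change] = []
    while True:
        relevant = [c for c in donor.transfer_mods() if in_range(c, lo, hi)]
        if not relevant:
            # A batch of only unrelated changes is mistaken for "no more changes".
            return applied
        applied.extend(relevant)


def fetch_fixed(donor: FakeDonor, lo: int, hi: int) -> List[Change]:
    """Fixed loop: stops only when the donor returns a batch of size 0."""
    applied: List[Change] = []
    while True:
        batch = donor.transfer_mods()
        if not batch:
            return applied
        applied.extend(c for c in batch if in_range(c, lo, hi))


def example_batches() -> List[List[Change]]:
    # The middle batch is wholly outside the migrated range [0, 100).
    return [[{"key": 5}], [{"key": 500}, {"key": 600}], [{"key": 7}]]


buggy_result = fetch_buggy(FakeDonor(example_batches()), 0, 100)
fixed_result = fetch_fixed(FakeDonor(example_batches()), 0, 100)
```

With these batches, the buggy loop stops at the middle batch and never applies the change to key 7, which corresponds to a lost write on the recipient. The fixed loop skips the unrelated changes but keeps fetching until the buffer is drained.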



 Comments   
Comment by Githook User [ 27/Jun/23 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-78414 Apply _transferMods batches until size returned == 0.

It is possible for the entire _transferMods command response to contain
changes which aren't relevant for the range being migrated. The
recipient shard should continue running the _transferMods command on the
donor shard primary in this case until it learns there are no further
changes.

(cherry picked from commit a8fdfb121dfc5e35e20a036c8198f2e9240d5b61)
Branch: v4.4
https://github.com/mongodb/mongo/commit/dccf28cd32cf5caddfd538ea3b698619468f74ea

Comment by Githook User [ 26/Jun/23 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-78414 Apply _transferMods batches until size returned == 0.

It is possible for the entire _transferMods command response to contain
changes which aren't relevant for the range being migrated. The
recipient shard should continue running the _transferMods command on the
donor shard primary in this case until it learns there are no further
changes.

(cherry picked from commit a8fdfb121dfc5e35e20a036c8198f2e9240d5b61)
Branch: v7.0
https://github.com/mongodb/mongo/commit/453091a36400607b1e0c57aff596cffe22a82eed

Comment by Githook User [ 26/Jun/23 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-78414 Apply _transferMods batches until size returned == 0.

It is possible for the entire _transferMods command response to contain
changes which aren't relevant for the range being migrated. The
recipient shard should continue running the _transferMods command on the
donor shard primary in this case until it learns there are no further
changes.

(cherry picked from commit a8fdfb121dfc5e35e20a036c8198f2e9240d5b61)
Branch: v6.0
https://github.com/mongodb/mongo/commit/a06a6bb1c890c8d84e94e48d0110cd67d0ef9da3

Comment by Githook User [ 26/Jun/23 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-78414 Apply _transferMods batches until size returned == 0.

It is possible for the entire _transferMods command response to contain
changes which aren't relevant for the range being migrated. The
recipient shard should continue running the _transferMods command on the
donor shard primary in this case until it learns there are no further
changes.

(cherry picked from commit a8fdfb121dfc5e35e20a036c8198f2e9240d5b61)
Branch: v5.0
https://github.com/mongodb/mongo/commit/db4db96b4b66a4fb366694dfef103847921d5299

Comment by Githook User [ 26/Jun/23 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-78414 Apply _transferMods batches until size returned == 0.

It is possible for the entire _transferMods command response to contain
changes which aren't relevant for the range being migrated. The
recipient shard should continue running the _transferMods command on the
donor shard primary in this case until it learns there are no further
changes.
Branch: master
https://github.com/mongodb/mongo/commit/a8fdfb121dfc5e35e20a036c8198f2e9240d5b61
