[SERVER-65328] ShardingCatalogManager::commitChunkMigration() should reject requests that specify chunk boundaries with outdated shard key patterns Created: 07/Apr/22  Updated: 29/Oct/23  Resolved: 14/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc1, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Paolo Polato Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: Sharding EMEA 2022-04-18
Participants:
Linked BF Score: 151

 Description   

The changes introduced with SERVER-64148 modified the query that ShardingCatalogManager::commitChunkMigration() uses to update the entries in config.chunks. As a result, requests based on stale shard key patterns are no longer rejected.

This behaviour may lead to a state of data corruption that prevents new migrations from being performed.
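
As an illustration of the kind of check the title asks for, here is a minimal Python sketch that rejects chunk bounds whose fields do not match the current shard key pattern. It is not the server's C++ implementation: the helper names (matches_shard_key, check_migration_bounds) and the dict-based representation of bounds and patterns are assumptions made for this example only.

```python
def matches_shard_key(bound, shard_key_pattern):
    """Return True if the bound has exactly the pattern's fields, in order.

    Hypothetical helper: bounds and patterns are modeled as plain dicts.
    """
    return list(bound) == list(shard_key_pattern)


def check_migration_bounds(min_bound, max_bound, shard_key_pattern):
    """Raise ValueError if either bound was built from another key pattern."""
    for label, bound in (("min", min_bound), ("max", max_bound)):
        if not matches_shard_key(bound, shard_key_pattern):
            raise ValueError(
                f"{label} bound {bound} does not match shard key "
                f"pattern {shard_key_pattern}"
            )


# Shard key after a refine from {x: 1} to {x: 1, y: 1}.
refined_pattern = {"x": 1, "y": 1}

# Bounds expressed with the refined key pass the check.
check_migration_bounds({"x": 10, "y": 0}, {"x": 20, "y": 0}, refined_pattern)

# Bounds still expressed with the unrefined key are rejected.
try:
    check_migration_bounds({"x": 10}, {"x": 20}, refined_pattern)
    stale_rejected = False
except ValueError:
    stale_rejected = True
```

With a check of this shape, a request built before the shard key was refined fails up front instead of reaching the config.chunks update.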



 Comments   
Comment by Githook User [ 15/Apr/22 ]

Author: Pierlauro Sciarelli <pierlauro.sciarelli@mongodb.com> (pierlauro)

Message: SERVER-65328 MigrationSourceManager must reject bounds with outdated shard key patterns

(cherry picked from commit 1d3a714051b9e50fe48bf6e53ed9d063ee13caed)
Branch: v6.0
https://github.com/mongodb/mongo/commit/2fd48fab86439f6e29dedd6042cab1a6e7f3a1ff

Comment by Githook User [ 14/Apr/22 ]

Author: Pierlauro Sciarelli <pierlauro.sciarelli@mongodb.com> (pierlauro)

Message: SERVER-65328 MigrationSourceManager must reject bounds with outdated shard key patterns
Branch: master
https://github.com/mongodb/mongo/commit/1d3a714051b9e50fe48bf6e53ed9d063ee13caed

Comment by Allison Easton [ 07/Apr/22 ]

To add to the description, the big difference introduced in SERVER-64148 is how the new chunk is created. Prior to these changes, the chunk bounds were set to the values read from the config server. After the changes, they are set to the bounds of the chunk that is passed in with the migration request. So even though the query could previously have found a chunk when it possibly shouldn't have, this didn't cause issues because the bounds were still being set correctly.

Edit: having finally reproduced the issue with some logging, we have a better understanding of what is happening. First, a refine shard key completes successfully. Then, a merge happens, which updates the keys correctly. Then, defragmentation issues a migration with the old bounds and the old shard key. This finds the merged chunk because the query is now using less than or equal to and greater than or equal to. The commit then takes the move range path, splitting the chunk and creating chunks on either side of the migrated range, also bounded by the unrefined shard key. This is how we end up with three chunks: one with the wrong max, one with both bounds wrong, and one with the wrong min.
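
The sequence above can be reproduced in a toy Python model. This is only a sketch under stated assumptions, not the server code: chunk bounds are modeled as tuples (so a shorter, unrefined key sorts like a prefix), MIN_KEY stands in for BSON MinKey, the lookup is assumed to be a containment comparison (min less than or equal to, max greater than or equal to), and the function names are hypothetical.

```python
MIN_KEY = float("-inf")  # stand-in for BSON MinKey in this toy model


def find_containing_chunk(chunks, req_min, req_max):
    """Assumed lookup: the first chunk whose range covers the requested range."""
    for chunk in chunks:
        if chunk["min"] <= req_min and chunk["max"] >= req_max:
            return chunk
    return None


def commit_move_range(chunks, req_min, req_max, to_shard):
    """Split the covering chunk around the requested range and move the middle."""
    owner = find_containing_chunk(chunks, req_min, req_max)
    if owner is None:
        raise LookupError("no chunk covers the requested range")
    result = [c for c in chunks if c is not owner]
    if owner["min"] < req_min:  # leftover range on the left stays on the donor
        result.append({"min": owner["min"], "max": req_min, "shard": owner["shard"]})
    result.append({"min": req_min, "max": req_max, "shard": to_shard})
    if req_max < owner["max"]:  # leftover range on the right stays on the donor
        result.append({"min": req_max, "max": owner["max"], "shard": owner["shard"]})
    return sorted(result, key=lambda c: c["min"])


# After refining {x} to {x, y} and merging, one chunk spans x in [0, 30).
merged = [{"min": (0, MIN_KEY), "max": (30, MIN_KEY), "shard": "shardA"}]

# Stale request: bounds still use the unrefined key {x}.
after = commit_move_range(merged, (10,), (20,), "shardB")
bounds = [(c["min"], c["max"]) for c in after]
```

In this model, bounds ends up holding three ranges: the first keeps a refined min but gets an unrefined max, the second has both bounds unrefined, and the third has an unrefined min with a refined max, which mirrors the three chunks described above.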
