[SERVER-48566] Shard loader in primary nodes blindly read the version of config.cache.collections Created: 03/Jun/20  Updated: 29/Oct/23  Resolved: 29/Jun/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js    
Issue Links:
Backports
Depends
Related
is related to SERVER-44105 Perform ShardServerCatalogLoader writ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-06-15, Sharding 2020-06-29
Participants:
Linked BF Score: 24

 Description   

When the catalog cache loader does a refresh from scratch, it first reads the maximum version it has in config.cache.collections and uses that version in the query for the chunk diffs. It then combines the diff result with the contents of config.cache.chunks to form the complete collection metadata. The problem is that the writes to config.cache.collections and config.cache.chunks are not atomic, so the version in config.cache.collections can be higher than what config.cache.chunks actually contains. The loader can then falsely believe it is at a certain version while the contents of the chunk map do not agree with it.
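
The refresh sequence described above can be sketched with a toy model. This is only an illustration, not the server's implementation: `ShardCache`, `persist`, `refresh_from_scratch`, and `stale_bounds` are hypothetical names, versions are plain integers, and the `>=` filter used for the diff is an assumption about how the diff query selects chunks.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCache:
    """Toy stand-in for a shard's persisted routing cache (hypothetical)."""
    collection_version: int = 0                 # models config.cache.collections
    chunks: dict = field(default_factory=dict)  # models config.cache.chunks: bounds -> version


def persist(cache, new_version, new_chunks, fail_between_writes=False):
    """Two independent writes; nothing makes them atomic."""
    cache.collection_version = new_version
    if fail_between_writes:
        return  # e.g. a step down happens before the chunk write
    cache.chunks.update(new_chunks)


def refresh_from_scratch(cache, authoritative_chunks):
    """Trust the persisted version, fetch only newer chunks, merge with the cache."""
    since = cache.collection_version
    diff = {b: v for b, v in authoritative_chunks.items() if v >= since}
    merged = {**cache.chunks, **diff}
    return since, merged


def stale_bounds(merged, authoritative_chunks):
    """Bounds whose cached version disagrees with the authoritative one."""
    return sorted(b for b, v in merged.items() if authoritative_chunks.get(b) != v)


authoritative = {(0, 10): 3, (10, 20): 4, (20, 30): 5}
cache = ShardCache(collection_version=2,
                   chunks={(0, 10): 1, (10, 20): 2, (20, 30): 2})

# The version write lands, the chunk write does not.
persist(cache, 5, authoritative, fail_between_writes=True)
version, merged = refresh_from_scratch(cache, authoritative)
```

In this model the loader reports version 5, yet the chunks for `(0, 10)` and `(10, 20)` are still the old ones, because the diff only asked for chunks at or above the version it blindly trusted.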

One example manifestation involves a shard key refine: if the primary steps down after it has modified config.cache.collections but before it has modified config.cache.chunks (or that modification is rolled back), the new primary will not be able to find chunks with matching exact bounds when trying to migrate, because the in-memory metadata still contains the older shard key despite carrying the post-refine version.
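
A minimal sketch of why the exact-bounds lookup fails in that scenario follows. It is a simplified model rather than the migration code: `refine_bound` and `find_exact_chunk` are hypothetical helpers, `MIN_KEY` is a placeholder string, and the assumption that a refine extends each existing bound with the new fields set to the minimum key is part of the model.

```python
MIN_KEY = "MinKey"  # placeholder for the minimum key value, not a real BSON type


def refine_bound(bound, new_fields):
    """Extend a chunk bound with the added shard key fields (model assumption)."""
    return {**bound, **{f: MIN_KEY for f in new_fields if f not in bound}}


def find_exact_chunk(chunk_bounds, min_bound, max_bound):
    """Return the chunk whose bounds match exactly, or None if there is none."""
    for lo, hi in chunk_bounds:
        if lo == min_bound and hi == max_bound:
            return (lo, hi)
    return None


# In-memory metadata on the new primary: pre-refine bounds on shard key {x}.
stale_metadata = [({"x": 0}, {"x": 10})]

# The migration request uses post-refine bounds on shard key {x, y}.
requested_min = refine_bound({"x": 0}, ["y"])
requested_max = refine_bound({"x": 10}, ["y"])

missing = find_exact_chunk(stale_metadata, requested_min, requested_max)

# Once the chunk cache reflects the refine, the same lookup succeeds.
refreshed_metadata = [(refine_bound(lo, ["y"]), refine_bound(hi, ["y"]))
                      for lo, hi in stale_metadata]
found = find_exact_chunk(refreshed_metadata, requested_min, requested_max)
```

Here `missing` is `None` while `found` is the refined chunk, which mirrors the mismatch between the stale chunk map and the post-refine version.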



 Comments   
Comment by Githook User [ 09/Sep/20 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-48566 Shard loader in primary nodes blindly read the version of config.cache.collections

(cherry picked from commit 3c9e077b966150d21c3459eeecf4765bee59b2d8)
Branch: v4.4
https://github.com/mongodb/mongo/commit/590341c7785a78ba6e824564d971cc72ae880d65

Comment by Githook User [ 29/Jun/20 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-48566 Shard loader in primary nodes blindly read the version of config.cache.collections
Branch: master
https://github.com/mongodb/mongo/commit/3c9e077b966150d21c3459eeecf4765bee59b2d8

Comment by Randolph Tan [ 03/Jun/20 ]

Attached test.js that demonstrates this bug.
