[SERVER-42960] config.cache chunks can get into a bad state Created: 21/Aug/19  Updated: 06/Dec/22  Resolved: 18/Feb/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding EMEA
Operating System: ALL
Participants:
Linked BF Score: 15

 Description   

The config.cache chunks can get into a bad state when a shard is interrupted while persisting new chunk changes, or when those changes get partially rolled back.

When a shard persists new updates to config.cache, it sets the refreshing flag, deletes the current chunk document, and then inserts the newer version. Finally, it unsets the refreshing flag once it has finished processing the updated chunks. However, if the mongod crashes partway through, or some of these writes get partially rolled back (such that the insert was rolled back but not the delete), config.cache is left in an inconsistent state.
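
The sequence above can be sketched with a small in-memory simulation. This is only an illustration of the failure mode, not the server's code: the dict layout, the integer shard keys, the `persist_chunk_update` helper and the `Interrupted` exception are all hypothetical stand-ins.

```python
class Interrupted(Exception):
    """Hypothetical stand-in for a crash or rollback that cuts the sequence short."""


def persist_chunk_update(cache, old_chunk, new_chunk, fail_after_delete=False):
    """Replay the four steps from the description against an in-memory cache.

    cache is {"refreshing": bool, "chunks": {min_key: (max_key, version)}}.
    Chunks are (min_key, max_key, version) tuples. All of this is assumed
    structure for illustration only.
    """
    cache["refreshing"] = True                  # step 1: set the refreshing flag
    cache["chunks"].pop(old_chunk[0], None)     # step 2: delete the current chunk document
    if fail_after_delete:
        # Simulate the interruption: the delete is kept, the insert never lands.
        raise Interrupted("stopped between delete and insert")
    cache["chunks"][new_chunk[0]] = (new_chunk[1], new_chunk[2])  # step 3: insert the newer version
    cache["refreshing"] = False                 # step 4: unset the refreshing flag


cache = {"refreshing": False, "chunks": {0: (10, 1), 10: (20, 1)}}
try:
    persist_chunk_update(cache, (10, 20, 1), (10, 20, 2), fail_after_delete=True)
except Interrupted:
    pass
```

After the simulated interruption, the chunk covering the range starting at 10 is missing and the flag is still set, which is the inconsistent state this ticket describes.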



 Comments   
Comment by Kaloian Manassiev [ 18/Feb/22 ]

We are going to a world where we will throw out the local cache collections.

Comment by Randolph Tan [ 22/Aug/19 ]

Note: The config.cache entries will become consistent again when a refresh happens on the primary.

Comment by Randolph Tan [ 21/Aug/19 ]

Need to double-check whether a refresh on the primary shard will make it consistent again. It does look like the secondaries won't be able to tell the primary to refresh, because they can hit the "Gap exists in the routing table" error before they try to make the primary force a refresh.
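
As a rough illustration of why a missing chunk document surfaces as a gap, the check below scans chunk ranges sorted by their lower bound and reports any place where one range's upper bound does not meet the next range's lower bound. It is a simplified sketch, not the server's routing table validation: `find_gaps` is a hypothetical helper, and integer keys stand in for real shard key bounds.

```python
def find_gaps(chunks):
    """Return (prev_max, next_min) pairs where consecutive ranges do not meet.

    chunks is a list of (min_key, max_key) tuples; this layout is an
    assumption made for the example.
    """
    gaps = []
    ordered = sorted(chunks)
    # Compare each range with the one that follows it in key order.
    for (_, prev_max), (next_min, _) in zip(ordered, ordered[1:]):
        if prev_max != next_min:
            gaps.append((prev_max, next_min))
    return gaps


# The chunk [10, 20) was deleted but its replacement was never inserted.
damaged = [(0, 10), (20, 30)]
gaps = find_gaps(damaged)
```

A contiguous set of ranges yields an empty list, while the damaged set above yields one gap between 10 and 20.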
