[SERVER-42960] config.cache chunks can get into a bad state Created: 21/Aug/19 Updated: 06/Dec/22 Resolved: 18/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Sharding EMEA |
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 15 | ||||
| Description |
|
The config.cache chunks can get into a bad state when the shard is interrupted while persisting new chunk changes, or when the changes get partially rolled back. When a shard tries to persist new updates to config.cache, it sets the refreshing flag, deletes the current chunk document, and then inserts the newer version. Finally, it unsets the refreshing flag after it finishes processing the updated chunks. However, if the mongod crashes in the middle, or some of these writes get partially rolled back (such that the insert was rolled back, but not the delete), the config.cache will be left in an inconsistent state. |
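
The write sequence described above can be illustrated with a minimal in-memory sketch. This is not the server's code: `ChunkCache`, `persist_update`, `crash_after`, and `SimulatedCrash` are hypothetical names, a dict stands in for the cached chunk documents, and integer ranges stand in for shard key bounds. Only the crash case is modeled; a partial rollback would leave a similar hole in the cached chunks.

```python
# Hypothetical model of a shard's cached chunk metadata (illustration only).

class SimulatedCrash(Exception):
    """Stands in for the mongod dying partway through the sequence."""


class ChunkCache:
    def __init__(self, chunks):
        # chunks: dict mapping (min_key, max_key) -> chunk version
        self.chunks = dict(chunks)
        self.refreshing = False

    def persist_update(self, old_range, new_chunks, crash_after=None):
        # Step 1: set the refreshing flag.
        self.refreshing = True
        if crash_after == "set_flag":
            raise SimulatedCrash
        # Step 2: delete the current chunk document.
        del self.chunks[old_range]
        if crash_after == "delete":
            raise SimulatedCrash
        # Step 3: insert the newer version.
        self.chunks.update(new_chunks)
        if crash_after == "insert":
            raise SimulatedCrash
        # Step 4: unset the refreshing flag.
        self.refreshing = False


cache = ChunkCache({(0, 50): 1, (50, 100): 1})
try:
    # Crash after the delete but before the insert.
    cache.persist_update((0, 50), {(0, 50): 2}, crash_after="delete")
except SimulatedCrash:
    pass

print(sorted(cache.chunks))  # [(50, 100)]
print(cache.refreshing)      # True
```

After the simulated crash, the range (0, 50) has no chunk at all and the flag is still set, which is the inconsistent state this ticket describes.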
| Comments |
| Comment by Kaloian Manassiev [ 18/Feb/22 ] |
|
We are going to a world where we will throw out the local cache collections. |
| Comment by Randolph Tan [ 22/Aug/19 ] |
|
Note: The config.cache entries will become consistent again when a refresh happens on the primary. |
| Comment by Randolph Tan [ 21/Aug/19 ] |
|
We need to double-check whether a refresh on the primary shard will make it consistent again. It does look like the secondaries won't be able to tell the primary to refresh, because they can hit the "Gap exists in the routing table" error before they try to make the primary force a refresh.
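
To show what a gap in the cached chunks looks like to a reader of the routing table, here is a rough sketch of a gap check. It is an assumption-laden illustration, not the server's actual validation: `find_gaps` is a hypothetical helper, and integer bounds stand in for the shard key range.

```python
def find_gaps(ranges, lo, hi):
    """Return the uncovered [start, end) intervals within [lo, hi)."""
    gaps = []
    cursor = lo
    for start, end in sorted(ranges):
        if start > cursor:
            # Nothing covers [cursor, start): that is a gap.
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


# A complete table has no gaps; one with a deleted chunk does.
print(find_gaps([(0, 50), (50, 100)], 0, 100))  # []
print(find_gaps([(50, 100)], 0, 100))           # [(0, 50)]
```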