[SERVER-37051] ShardServerCatalogCacheLoader does not check the internal term after reading from the task queue Created: 07/Sep/18 Updated: 29/Oct/23 Resolved: 11/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.7, 4.0.2 |
| Fix Version/s: | 3.6.10, 4.0.5, 4.1.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.0, v3.6 | ||||||||
| Sprint: | Sharding 2018-09-24 | ||||||||
| Linked BF Score: | 57 | ||||||||
| Description |
|
There is a race condition in ShardServerCatalogCacheLoader: if a shard node is running as a primary and a step-down happens, it may read an in-memory task queue and a persisted cache state that are not consistent with each other.

Specifically, consider a node which is a primary and has found some new data while reading from the config server. It schedules this data to be persisted to the cache collections and then merges the task queue with what is already persisted in order to produce the list of changed chunks. In a step-down-free case, this works fine. However, if the node has stepped down by the time it reads what it persisted and what is on the queue, the write to the cache collections may not have happened and nothing may remain on the task queue, because of the change in term. The load could then come back with incomplete data (which would be data loss), or with an empty list, which will trip an invariant.

To fix this, after reading from the task queue and the persisted cache, we should check whether the term has changed and, if it has, throw a ConflictingOperationInProgress error so that the load can be retried as a secondary. |
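The race and the proposed check can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: `ToyCatalogCacheLoader`, its fields, and the `racing_event` hook are hypothetical names invented for illustration, and the exception class merely stands in for the server error of the same name. It assumes that a step-down bumps the term and discards queued tasks before they are persisted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class ConflictingOperationInProgress(Exception):
    """Stand-in for the server error of the same name (illustrative only)."""


@dataclass
class ToyCatalogCacheLoader:
    # Hypothetical, simplified state: a term counter, the persisted cache,
    # and the in-memory queue of not-yet-persisted tasks.
    term: int = 0
    persisted: List[str] = field(default_factory=list)
    task_queue: List[List[str]] = field(default_factory=list)

    def step_down(self) -> None:
        # Assumption: a term change drops queued tasks before they are written.
        self.term += 1
        self.task_queue.clear()

    def load_as_primary(
        self,
        fetched: List[str],
        racing_event: Optional[Callable[[], None]] = None,
    ) -> List[str]:
        term_at_start = self.term

        # Schedule the freshly fetched chunks to be persisted.
        self.task_queue.append(list(fetched))

        # Test hook: lets a step-down race in between scheduling and reading.
        if racing_event is not None:
            racing_event()

        # Merge what is persisted with what is still on the task queue.
        merged = list(self.persisted)
        for task in self.task_queue:
            merged.extend(task)

        # The check proposed in the ticket: if the term changed, the merged
        # view cannot be trusted, so fail and let the caller retry.
        if self.term != term_at_start:
            raise ConflictingOperationInProgress("term changed during load")
        return merged
```

Without the final term check, the racing case would return an empty list instead of raising, which corresponds to the empty-list outcome described in the ticket.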
| Comments |
| Comment by Githook User [ 21/Nov/18 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: (cherry picked from commit fe8f517a59d694b7577da564d19e4415e13831e8) |
| Comment by Githook User [ 16/Nov/18 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: (cherry picked from commit fe8f517a59d694b7577da564d19e4415e13831e8) |
| Comment by Githook User [ 11/Sep/18 ] |
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: |
| Comment by Kaloian Manassiev [ 07/Sep/18 ] |
|
Yes - my mistake, forgot to add it. |
| Comment by Gregory McKeon (Inactive) [ 07/Sep/18 ] |
|
kaloian.manassiev, is this required for 4.1 as well? |