[SERVER-37339] Sharding state is set to initialized on Grid before sharding components are fully initialized
Created: 27/Sep/18 | Updated: 29/Oct/23 | Resolved: 05/Dec/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.8, 4.0.2, 4.1.3 |
| Fix Version/s: | 3.6.11, 4.0.5, 4.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Sharding 2018-10-22, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2018-12-17 |
| Linked BF Score: | 51 |
| Description |
|
Accessing the sharding components before they are fully initialized can trigger a crash or incorrect behavior. For example, the ShardServerCatalogCacheLoader is created in initializeGlobalShardingStateForMongoD, before Grid::setShardingInitialized is called, but its ReplicaSetRole member variable is only initialized later, in initializeShardingEnvironmentOnShardServer, after Grid::setShardingInitialized has already been called. This can lead to a crash, because various methods in the ShardServerCatalogCacheLoader invariant that the ReplicaSetRole is not ReplicaSetRole::None. An example is shown here (and the corresponding Evergreen task). |
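The ordering problem described above can be sketched in a few lines of C++. This is an illustrative model only, not the server code: GridSketch, LoaderSketch, canServeRequests, and the two helper functions are hypothetical names, and canServeRequests returns false where the real loader would hit an invariant. The second helper shows one possible safe ordering (publish the flag last); it is not necessarily what the committed fix does.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the real classes; shapes are illustrative only.
enum class ReplicaSetRole { None, Primary, Secondary };

class LoaderSketch {
public:
    // Models the late initialization step of the loader's role member.
    void initializeReplicaSetRole(bool isPrimary) {
        _role = isPrimary ? ReplicaSetRole::Primary : ReplicaSetRole::Secondary;
    }

    // Returns false in the state where the real loader would fail an invariant.
    bool canServeRequests() const {
        return _role != ReplicaSetRole::None;
    }

private:
    ReplicaSetRole _role = ReplicaSetRole::None;
};

class GridSketch {
public:
    void setShardingInitialized() { _initialized.store(true); }
    bool isShardingInitialized() const { return _initialized.load(); }

    LoaderSketch loader;

private:
    std::atomic<bool> _initialized{false};
};

// A reader is "safe" if it either sees sharding as uninitialized (and backs off)
// or sees it as initialized with a usable loader.
static bool readerIsSafe(const GridSketch& grid) {
    return !grid.isShardingInitialized() || grid.loader.canServeRequests();
}

// Ordering from the ticket: the flag is published before the role is set.
// Returns what a reader running between the two steps would observe.
bool buggyOrderObservedSafe() {
    GridSketch grid;
    grid.setShardingInitialized();
    bool safe = readerIsSafe(grid);  // role is still None here
    grid.loader.initializeReplicaSetRole(true);
    return safe;
}

// Alternative ordering (an assumption): finish initializing, then publish the flag.
bool fixedOrderObservedSafe() {
    GridSketch grid;
    grid.loader.initializeReplicaSetRole(true);
    bool safeBefore = readerIsSafe(grid);  // flag not yet set, reader backs off
    grid.setShardingInitialized();
    bool safeAfter = readerIsSafe(grid);   // flag set and role is valid
    return safeBefore && safeAfter;
}
```

In this model, a background reader such as a periodic cache refresh that gates on isShardingInitialized() is only protected when the flag is the last thing to be set.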
| Comments |
| Comment by Githook User [ 11/Feb/19 ] |
|
Author: {'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'} Message: |
| Comment by Githook User [ 10/Dec/18 ] |
|
Author: {'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'} Message: (cherry picked from commit 26bad42994d1536f4d22aad47b0b537c3c5359b2) |
| Comment by Githook User [ 05/Dec/18 ] |
|
Author: {'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'} Message: |
| Comment by Esha Maharishi (Inactive) [ 03/Dec/18 ] |
|
I think the real issue is that the sharding state is set to "initialized" on the Grid before the sharding components have finished initializing. I can reproduce the crash on base commit b12875f9e31321c247cc70515d669dd71ffbf700 by applying the following diff, which sets the logical session cache refresh interval to 1 second and adds a 10-second sleep after the call to Grid::setShardingInitialized but before CatalogCacheLoader::initializeReplicaSetRole:
and running this repro script:
|