[SERVER-33356] Ensure shards' persisted collection cache picks up collection UUIDs after setFCV=4.0 Created: 15/Feb/18 Updated: 29/Oct/23 Resolved: 16/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.7.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6 |
| Sprint: | Sharding 2018-04-09, Sharding 2018-04-23 |
| Participants: | |
| Description |
|
FCV 3.4->3.6 has no schema upgrade process for config.cache.collections. config.cache.collections was added to shards as part of the safe secondary reads project, which didn't require an upgrade process and safely started persisting routing metadata as soon as the v3.6 binary cluster started up. However, UUIDs were later added to config.cache.collections to address the needs of replication change streams. I believe that a schema upgrade process was added for the config server's config.collections, but no one thought to add an upgrade process for shards.

So, ShardServerCatalogCacheLoader will happily persist config.cache.collections entries (without UUIDs) from the start. Then setFCV(3.6) happens, and config.cache.collections won't get the UUID until the shard refreshes NEW chunks for that collection. The ShardServerCatalogCacheLoader won't schedule a persisted update unless new chunks are received from the config server, per this bit of code. That was added to improve secondary availability, because every primary persistence task sets flags that block metadata reads, and might even prompt forcing the secondary to refresh if we got around to that (I forget).

We should also consider the in-memory UUID values in the upgrade process. ChunkManager is probably fine because it gets recreated/updated on every CatalogCache refresh. I am unsure whether we have any other in-memory UUID fields that must get set on upgrade. |
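The gap described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `CollectionMetadata` and `ShardCacheLoaderModel` are hypothetical names invented for illustration, and the logic is a simplification of the described behavior (persist only when new chunks arrive), not the actual ShardServerCatalogCacheLoader code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CollectionMetadata:
    """Hypothetical routing metadata for one collection (not the server schema)."""
    uuid: Optional[str]
    chunk_versions: Dict[str, int] = field(default_factory=dict)  # chunk id -> version


class ShardCacheLoaderModel:
    """Toy stand-in for a shard's loader and its persisted collection cache."""

    def __init__(self) -> None:
        self.persisted: Dict[str, CollectionMetadata] = {}  # namespace -> metadata

    def refresh(self, ns: str, from_config: CollectionMetadata) -> bool:
        """Persist metadata only if the config server returned new chunks.

        Returns True when a persisted update was scheduled (assumed rule).
        """
        cached = self.persisted.get(ns)
        known = cached.chunk_versions if cached else {}
        # A chunk counts as "new" if it is unknown or has a higher version.
        new_chunks = {
            chunk: version
            for chunk, version in from_config.chunk_versions.items()
            if known.get(chunk, -1) < version
        }
        if not new_chunks:
            # No new chunks: nothing is persisted, so a newly added UUID is lost.
            return False
        merged = dict(known)
        merged.update(new_chunks)
        self.persisted[ns] = CollectionMetadata(from_config.uuid, merged)
        return True
```

In this model, a refresh that returns the same chunks plus a newly assigned UUID leaves the persisted entry without the UUID; only a refresh that carries a bumped chunk version writes it.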
| Comments |
| Comment by Githook User [ 05/Feb/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: Instead, bump only one chunk per shard to satisfy the requirements |
| Comment by Esha Maharishi (Inactive) [ 18/Sep/18 ] |
|
Note: this patch committed a test demonstrating that shards would pick up collection UUIDs the next time they refreshed after a setFCV=4.0. However, change streams on sharded collections (which, as of 4.0, are the only consumer of UUIDs in the shards' routing table caches) are unversioned, so they do not cause the shard to refresh.

Further, this later commit updated the test to use a findOne against the shard, rather than a forced refresh, to make the shard refresh (since findOne is versioned). However, by the time the later commit went in:

1) setFCV was updating the chunks collection on the config server with bumped chunk versions.
2) The router was refreshing and picking up the bumped chunk versions before sending the versioned request to the shard. This is because the router was invalidating its in-memory cache on the shardCollection calls, so the router would pick up the bumped chunk versions on the findOne calls before sending the findOne calls to the shard.

Since the router's requests carried the bumped chunk versions, the shard would of course detect a version mismatch and refresh. The test therefore does not demonstrate that a stale router would cause the shard to refresh.
|
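The three request paths discussed in the comment above can be compared in a toy model. This is a minimal sketch, assuming a simplified rule in which a shard refreshes only when a versioned request carries a version newer than its cached one; `RoutingInfo`, `ShardModel`, and `RouterModel` are hypothetical names, not server classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingInfo:
    """Hypothetical routing table state: one collection version plus its UUID."""
    version: int
    uuid: Optional[str]


class ShardModel:
    def __init__(self, cached: RoutingInfo) -> None:
        self.cached = cached
        self.refreshes = 0

    def on_unversioned_request(self) -> None:
        # Unversioned requests (e.g. change streams) carry no version to compare,
        # so nothing here can trigger a refresh.
        pass

    def on_versioned_request(self, received_version: int, config: RoutingInfo) -> None:
        # Assumed rule: refresh only if the sender knows a newer version.
        if received_version > self.cached.version:
            self.cached = RoutingInfo(config.version, config.uuid)
            self.refreshes += 1


class RouterModel:
    def __init__(self, cached_version: int) -> None:
        self.cached_version = cached_version

    def find_one(self, shard: ShardModel, config: RoutingInfo, cache_invalidated: bool) -> None:
        # A router whose cache was invalidated refreshes from the config
        # server before sending the versioned request.
        if cache_invalidated:
            self.cached_version = config.version
        shard.on_versioned_request(self.cached_version, config)
```

Under these assumptions, only the router with an invalidated cache causes the shard to refresh and pick up the UUID; the unversioned request and the stale router both leave the shard's cache unchanged, which is the case the test did not cover.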
| Comment by Githook User [ 16/Apr/18 ] |
|
Author: {'name': 'Esha Maharishi', 'email': 'esha.maharishi@mongodb.com', 'username': 'EshaMaharishi'}Message: |
| Comment by Kaloian Manassiev [ 21/Feb/18 ] |
|
spencer, as part of the investigation for this bug, it was discovered that there are possible situations in 3.6 where the shard filtering metadata collections will be missing collection UUIDs. We will fix it for FCV 4.0, but backporting it for existing 3.6 deployments is problematic since we don't have the FCV upgrade action to hook it into. I believe the only user of the collection UUIDs in 3.6 is change streams. Do you know what problems can arise if the UUIDs are missing? I would like to gauge how bad it is so we don't have to figure out a 3.6 solution. |