[SERVER-33356] Ensure shards' persisted collection cache picks up collection UUIDs after setFCV=4.0
Created: 15/Feb/18  Updated: 29/Oct/23  Resolved: 16/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.7.4

Type: Bug
Priority: Major - P3
Reporter: Dianna Hohensee (Inactive)
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Backports
Depends
depends on SERVER-33783 Make shards and mongos do full routin... Closed
Related
related to SERVER-33401 Refuse to start up v4.0 shards if con... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6
Sprint: Sharding 2018-04-09, Sharding 2018-04-23

 Description   

FCV 3.4->3.6 has no schema upgrade process for config.cache.collections.

config.cache.collections was added to shards as part of the safe secondary reads project. That project didn't require an upgrade process: shards safely started persisting routing metadata as soon as the cluster came up on v3.6 binaries. However, UUIDs were added to config.cache.collections to support change streams. I believe a schema upgrade process was added for config.collections on the config server, but no one thought to add an upgrade process for the shards.

So ShardServerCatalogCacheLoader will happily persist config.cache.collections entries, without UUIDs, from the start. Then setFCV(3.6) happens, and config.cache.collections won't get the UUID until the shard refreshes and receives NEW chunks for that collection.

ShardServerCatalogCacheLoader won't schedule a persisted update unless it receives new chunks from the config server, per this bit of code. That check was added to improve secondary availability: every persistence task on the primary sets flags that block metadata reads, and might even force a secondary to refresh, if we ever got around to implementing that (I forget).
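The scheduling rule above, and why it strands the UUID, can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not the ShardServerCatalogCacheLoader code: `RefreshResult`, `PersistedCollectionEntry`, and `on_refresh` are hypothetical names, chunks are plain strings, and a real refresh carries collection versions, epochs, and more.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RefreshResult:
    """Hypothetical, simplified result of a shard refresh from the config server."""
    uuid: Optional[str]                      # None while the collection has no UUID
    new_chunks: List[str] = field(default_factory=list)


class PersistedCollectionEntry:
    """Hypothetical model of one document in a shard's config.cache.collections."""

    def __init__(self) -> None:
        self.uuid: Optional[str] = None
        self.tasks_scheduled = 0

    def on_refresh(self, result: RefreshResult) -> None:
        # The rule described above: no new chunks means no persistence task,
        # even if another collection field (such as the UUID) has changed.
        if not result.new_chunks:
            return
        self.tasks_scheduled += 1
        self.uuid = result.uuid


entry = PersistedCollectionEntry()

# Before the upgrade: chunks are persisted, but there is no UUID yet.
entry.on_refresh(RefreshResult(uuid=None, new_chunks=["chunk v1|0"]))

# After setFCV: the config server now reports a UUID, but no chunk changed,
# so the shard never writes it.
entry.on_refresh(RefreshResult(uuid="uuid-A", new_chunks=[]))
print(entry.uuid)  # None

# Only once a chunk version moves does the UUID get persisted.
entry.on_refresh(RefreshResult(uuid="uuid-A", new_chunks=["chunk v2|0"]))
print(entry.uuid)  # uuid-A
```

In this model the only way to get the UUID written is to make the shard see new chunks, which is consistent with the SERVER-53274 commit message in the comments, where chunk versions are bumped on FCV upgrade to satisfy this ticket.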

We should also consider in-memory UUID values during the upgrade process. ChunkManager is probably fine, because it gets recreated or updated on every CatalogCache refresh. I'm unsure whether there are any other in-memory UUID fields that must be set on upgrade.



 Comments   
Comment by Githook User [ 05/Feb/21 ]

Author:

Jordi Serra Torrens (jordi.serra-torrens@mongodb.com, username: jordist)

Message: SERVER-53274 Avoid bumping all chunk versions after writing 'history' field on FCV upgrade to 4.0.

Instead, bump only one chunk per shard to satisfy the requirements
imposed by SERVER-33356.
Branch: v4.0
https://github.com/mongodb/mongo/commit/52d6f11c459b8d3666379431a6accf7fef4e852f

Comment by Esha Maharishi (Inactive) [ 18/Sep/18 ]

Note: this patch committed a test demonstrating that shards would pick up collection UUIDs the next time they refreshed after setFCV=4.0.

However, change streams (which, as of 4.0, are the only consumer of UUIDs in the shards' routing table caches) on sharded collections are unversioned, so they do not cause the shard to refresh.

-------

Further, this later commit updated the test to use a findOne, rather than a forced refresh, against the shard to make the shard refresh (since findOne is versioned).

However,

1) By the time the later commit went in, setFCV was updating the chunks collection on the config server with bumped chunk versions.

2) The router was refreshing and picking up the bumped chunk versions before sending the versioned request to the shard. Because the router invalidated its in-memory cache on the shardCollection calls, it picked up the bumped chunk versions on the findOne calls before forwarding them to the shard. Since the router's requests carried the bumped chunk versions, the shard naturally detected a version mismatch and refreshed.

So the test does not demonstrate that a stale router would cause the shard to refresh.
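The distinction between a stale router, a refreshed router, and an unversioned request can be sketched as a toy model. This is a minimal Python sketch, not MongoDB's shard versioning code: `ShardSketch` and `handle` are hypothetical names, versions are simplified to `(major, minor)` tuples with epochs omitted, and the rule that a shard refreshes only when a versioned request carries a newer version than its cache is an assumption consistent with the reasoning in this comment.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]  # (major, minor); epochs omitted for brevity


class ShardSketch:
    """Hypothetical shard that caches a collection version."""

    def __init__(self, cached: Version, config_server: Version) -> None:
        self.cached = cached                # what the shard's cache holds
        self.config_server = config_server  # authoritative version after setFCV
        self.refreshes = 0

    def handle(self, received: Optional[Version]) -> None:
        # Unversioned requests (received is None), such as change streams on
        # sharded collections, never trigger a shard refresh.
        if received is None:
            return
        # A router that already knows a newer version exposes the shard as
        # stale, so the shard reloads from the config server.
        if received > self.cached:
            self.refreshes += 1
            self.cached = self.config_server


# setFCV has bumped chunk versions on the config server; the shard has not refreshed.
shard = ShardSketch(cached=(1, 0), config_server=(2, 0))

shard.handle(None)      # change stream: unversioned
shard.handle((1, 0))    # stale router: versions match, nothing happens
print(shard.refreshes)  # 0

shard.handle((2, 0))    # router that refreshed after shardCollection
print(shard.refreshes)  # 1
```

Under these assumptions, only the already-refreshed router makes the shard refresh, which is why the test's findOne calls passed without showing anything about stale routers or change streams.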

 

Comment by Githook User [ 16/Apr/18 ]

Author:

Esha Maharishi (esha.maharishi@mongodb.com, username: EshaMaharishi)

Message: SERVER-33356 Ensure shards' persisted collection cache picks up collection UUIDs after setFCV=4.0
Branch: master
https://github.com/mongodb/mongo/commit/c02574298a711b6de8a3d89cedcfe98040a6f55b

Comment by Kaloian Manassiev [ 21/Feb/18 ]

spencer, as part of investigation for this bug it was discovered that there are possible situations in 3.6 where the shard filtering metadata collections will be missing collection UUIDs. We will fix it for FCV 4.0, but backporting it for existing 3.6 deployments is problematic since we don't have the FCV upgrade action to hook it into.

I believe the only user of the collection UUIDs in 3.6 is change streams. Do you know what problems can arise if the UUIDs are missing? I would like to gauge how bad it is so we don't have to figure out a 3.6 solution.

Generated at Thu Feb 08 04:33:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.