[SERVER-36154] Shard's in-memory CSS is not refreshed after upgrading from 3.4 to 3.6, causing a UUID mismatch on $changeStream operations Created: 16/Jul/18  Updated: 06/Dec/22  Resolved: 06/Feb/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Sharding
Affects Version/s: 3.6.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nicholas Zolnierz Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: ShardingRoughEdges
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-35999 A $changeStream can trigger assertion... Closed
Assigned Teams:
Sharding
Operating System: ALL
Steps To Reproduce:

This is a slight modification of the uuid_propagated_to_shards_on_setFCV_3_6.js test, which reproduces the issue:

(function() {
    let st = new ShardingTest({shards: {rs0: {nodes: 1}}, other: {config: 3}});

    load('jstests/libs/uuid_util.js');

    // Start in fcv=3.4 so the collection is created without a UUID.
    assert.commandWorked(st.s.adminCommand({setFeatureCompatibilityVersion: "3.4"}));

    let db1 = "test1";
    assert.commandWorked(st.s.adminCommand({enableSharding: db1}));
    st.ensurePrimaryShard(db1, st.shard0.shardName);

    assert.commandWorked(st.s.adminCommand({shardCollection: db1 + ".foo0", key: {_id: 1}}));

    jsTest.log("upgrade the cluster to fcv=3.6");
    assert.commandWorked(st.s.adminCommand({setFeatureCompatibilityVersion: "3.6"}));

    // The config server has generated UUIDs, but the shard's in-memory CSS is stale;
    // even an explicit routing table refresh does not repopulate it with the UUID.
    st.checkUUIDsConsistentAcrossCluster();
    assert.commandWorked(st.shard0.getDB('admin').runCommand({forceRoutingTableRefresh: "test1.foo0"}));

    // Opening a change stream now hits the UUID mismatch on the shard.
    let db = st.s.getDB(db1);
    let cs = db.foo0.watch();

    assert.writeOK(db.foo0.insert({_id: 0}));
    assert.writeOK(db.foo0.insert({_id: 1}));

    assert.soon(() => cs.hasNext());
    assert.eq(cs.next().operationType, "insert");
})();

Sprint: Sharding 2018-10-08, Sharding 2018-11-05

Description
Issue Status as of Mar 11, 2019

ISSUE DESCRIPTION AND IMPACT
After upgrading to MongoDB 3.6 and attempting to initialize a change stream against a sharded collection, users may encounter the following error:

errmsg: 'Collection foo.bar UUID differs from UUID on change stream operations',

This error occurs when at least one shard that owns data for the collection received an operation for that collection before the upgrade to Feature Compatibility Version 3.6. In that case, the shard persists sharding cache entries without a UUID. Once a shard's cache reaches this state, any subsequent refreshes of the cache will not add a UUID, regardless of the Feature Compatibility Version.
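For example, with a hypothetical sharded namespace foo.bar in this state, an attempt to read from the stream through mongos fails along these lines:

// Through mongos, against an affected collection (names hypothetical):
var cs = db.getSiblingDB("foo").bar.watch();
cs.next();  // throws: "Collection foo.bar UUID differs from UUID on change stream operations"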

DIAGNOSIS AND AFFECTED VERSIONS
This can occur after upgrading a MongoDB sharded cluster to 3.6.x.

The situation can be confirmed by running the following query directly against the Primary of the shard that encountered the error, and checking whether the cached entry has a UUID associated with it.

db.getSiblingDB("config").cache.collections.find({_id:<namespace>})
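For illustration, using a hypothetical namespace of test1.foo0 (field values abridged), a healthy cache entry carries a uuid field while an affected entry is missing it:

// Healthy entry (UUID present after a refresh under fcv=3.6):
{ "_id" : "test1.foo0", "epoch" : ObjectId("..."), "key" : { "_id" : 1 }, "unique" : false, "uuid" : UUID("...") }

// Affected entry (persisted without a UUID):
{ "_id" : "test1.foo0", "epoch" : ObjectId("..."), "key" : { "_id" : 1 }, "unique" : false }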

REMEDIATION AND WORKAROUNDS
In order to resolve this issue, please perform the following steps (a consolidated sketch of steps 2 and 3 follows the list):
1. Connect to the shard Primary directly, not through the mongos.

mongo --port <shardport>

2. Remove the document in config.cache.collections that matches the problem namespace.

db.getSiblingDB("config").cache.collections.remove({_id:<namespace>}, {writeConcern: {w:"majority"}})

3. Drop the config.cache.chunks collection that matches your namespace. If you are on 4.0, you can pass a write concern of "majority" to the drop command to ensure the drop becomes majority-committed before proceeding. If you are on 3.6, run the command without the write concern and check the Secondaries to confirm that the drop has become majority-committed.

db.getSiblingDB("config").getCollection("cache.chunks.<namespace>").drop({writeConcern: {w:"majority"}})

4. Restart the affected shards by performing a rolling restart.
5. Perform a query that touches all of the shards that contain the problem collection.

db.getSiblingDB("<database>").<collection>.find().toArray().length
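
Taken together, steps 2 and 3 can be scripted in a shell connected directly to each affected shard Primary; a minimal sketch, assuming a 3.6 shard and a hypothetical problem namespace of test1.foo0:

// Run while connected directly to the shard Primary, not through mongos.
var ns = "test1.foo0";  // hypothetical; substitute the problem namespace

// Step 2: remove the stale cache entry for the namespace.
db.getSiblingDB("config").cache.collections.remove(
    {_id: ns}, {writeConcern: {w: "majority"}});

// Step 3: drop the matching cached chunks collection. On 3.6, drop() takes no
// options, so verify on the Secondaries that the drop has replicated.
db.getSiblingDB("config").getCollection("cache.chunks." + ns).drop();

After the rolling restart in step 4, run the step 5 query through mongos so that every shard owning the collection refreshes its cache.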

Original description

This was found as part of the investigation for SERVER-35999, where a user tried to open a change stream against a sharded collection just after upgrading to 3.6. Currently, the setFCV command does attempt to propagate the newly generated UUIDs for existing collections; however, the in-memory cache will still be stale.

Change streams verify that the UUID from the oplog matches the UUID in the CSS, failing if there is a mismatch or if the UUID does not exist. While bouncing the shards is a valid workaround, it would be nice from a usability standpoint if the setFCV flow also forced a refresh of the in-memory CSS.



Comments
Comment by Sheeri Cabral (Inactive) [ 06/Feb/20 ]

3.6 is the minimum supported version, so bugs in upgrading from 3.4 shouldn't be an issue.

Comment by Esha Maharishi (Inactive) [ 23/Apr/19 ]

Hey alex.komyagin, I thought about your suggestion.

I don't think just 1) dropping the persisted cache and 2) moving a chunk off the problem shard would cause the problem shard's filtering cache to pick up the UUID on the next cache refresh, because if the shard's routing cache previously had info for the collection, a refresh will only pass the chunks changed since the last refresh (not the UUID) to the shard's caches. (Otherwise, the refresh passes the full new set of metadata returned by the refresh, including the UUID, to the caches.)

If we could clear the shard's routing cache after the moveChunk but before the shard's next routing cache refresh, we could theoretically trick the refresh into passing the full new set of metadata to the caches. However, given that the shard will refresh its routing cache at the beginning and end of moveChunk, I'm not sure how we could practically achieve this. Therefore, I recommend we just follow the tested solution. Sorry about that. :/

(Edited to fix a link)

Comment by Alexander Komyagin [ 19/Apr/19 ]

Thanks esha.maharishi, your comment above does seem to indicate that a shardVersion command (e.g. after a moveChunk) would trigger the cache refresh, though...


Comment by Esha Maharishi (Inactive) [ 19/Apr/19 ]

alex.komyagin, no, unfortunately restarting a node is currently the only way to clear a shard's "filtering" routing table cache. (The flushRouterConfig command can be used to clear the shard's "routing" routing table cache.)
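
For reference, a minimal invocation of flushRouterConfig against the shard (per the above, it clears only the "routing" routing table cache, not the filtering one):

db.adminCommand({flushRouterConfig: 1})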

Comment by Alexander Komyagin [ 19/Apr/19 ]

Is there a workaround that doesn't require restarting nodes?

Comment by Esha Maharishi (Inactive) [ 24/Jan/19 ]

We might be able to do something like having the shard refresh its cache for every collection on setFCV... that is potentially a lot of calls to the config server (O(numCollections * numShards)), though.

Or we could make the config server "push" the UUIDs to the shards... the entire 3.4 -> 3.6 UUID upgrade was designed to avoid that, though.

A third option is to make change streams attach some kind of dummy shardVersion that doesn't actually trigger a full shardVersion check, but just checks that the shard has the UUID cached (and makes the shard refresh if the shard doesn't).

Comment by Esha Maharishi (Inactive) [ 24/Jan/19 ]

kaloian.manassiev, based on comments like this one on SERVER-35379, I think this bug makes it difficult for people who want to use change streams to upgrade from 3.4 to 3.6.

I think calling markNotShardedAtStepdown wouldn't work because, as nicholas.zolnierz mentioned on SERVER-35999, change streams do not attach shardVersion, so would not trigger the shard to refresh even if the shard's cache had been cleared.

Comment by Kaloian Manassiev [ 24/Jan/19 ]

esha.maharishi, this came up during 4.1 Required tickets triage and I couldn't gauge the customer-visible effect of this bug, so I am assigning it to you.

Can you please write a brief explanation of the customer-facing side-effect of this bug, whether the fix would be to just call markNotShardedAtStepdown on FCV upgrade/downgrade, and how easy that would be in 3.6? (I remember there were problems with doing that in 4.0, because of the locks which need to be held.) Perhaps we can do it under the CollectionShardingStateMap's mutex for all collections?
