- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: 3.6.6
- Component/s: Aggregation Framework, Sharding
- Sharding
- ALL
- Sharding 2018-10-08, Sharding 2018-11-05
- (copied to CRM)
ISSUE DESCRIPTION AND IMPACT
After upgrading to MongoDB 3.6 and attempting to initialize a change stream against a sharded collection, users may encounter the following error:
errmsg: 'Collection foo.bar UUID differs from UUID on change stream operations',
This error occurs when at least one shard that owns data for the collection received an operation for that collection before the cluster was upgraded to Feature Compatibility Version 3.6. In that case the shard's sharding cache entries are persisted without a UUID. Once a shard's cache reaches this state, any subsequent refresh of the cache will not add a UUID, regardless of the Feature Compatibility Version.
DIAGNOSIS AND AFFECTED VERSIONS
This can occur after upgrading a MongoDB sharded cluster to 3.6.x.
The situation can be confirmed by running the following query directly against the Primary of the shard that encountered the error and checking whether the returned document has a UUID associated with it.
db.getSiblingDB("config").cache.collections.find({_id:<namespace>})
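When several namespaces need checking, the returned documents can be inspected programmatically. The following is a minimal sketch rather than an official tool: `namespacesMissingUuid` is a hypothetical helper, and it assumes the cached UUID is stored in a field named `uuid` on each document.

```javascript
// Hypothetical helper: returns the namespaces (_id values) of cache entries
// that carry no UUID. Assumes the UUID is stored in a field named "uuid".
function namespacesMissingUuid(entries) {
  return entries
    .filter(function (entry) {
      // Treat an absent or null uuid field as "no UUID".
      return entry.uuid === undefined || entry.uuid === null;
    })
    .map(function (entry) {
      return entry._id;
    });
}
```

On the shard Primary, it could be fed the result of `db.getSiblingDB("config").cache.collections.find().toArray()`; any namespace it returns is a candidate for the remediation described in this ticket.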
REMEDIATION AND WORKAROUNDS
In order to resolve this issue please perform the following steps:
1. Connect directly to the shard Primary, not through the mongos.
mongo --port <shardport>
2. Remove the document in config.cache.collections that matches the problem namespace.
db.getSiblingDB("config").cache.collections.remove({_id:<namespace>}, {writeConcern: {w:"majority"}})
3. Drop the config.cache.chunks collection that matches your namespace. If you are on 4.0, you can pass a write concern of "majority" to the drop statement to ensure it becomes majority-committed before proceeding. If you are on 3.6, run the command without the write concern and check the Secondaries to confirm that the drop has become majority-committed.
db.getSiblingDB("config").cache.chunks.<namespace>.drop({writeConcern: {w:"majority"}})
4. Restart the affected shards by performing a rolling restart.
5. Perform a query that touches all of the shards that contain the problem collection.
db.getSiblingDB("<database>").<collection>.find().toArray().length
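Steps 2 and 3 above can be wrapped in a small shell function so that they are applied consistently on each affected shard. This is a sketch under stated assumptions, not a supported script: `clearCachedMetadata` and its `supportsDropWriteConcern` flag are hypothetical names, and `shardDb` is assumed to be a database handle from a direct connection to the shard Primary.

```javascript
// Hypothetical helper covering steps 2 and 3 of the remediation.
// shardDb: shell database handle from a direct connection to the shard Primary.
// supportsDropWriteConcern: pass true on 4.0, false on 3.6.
function clearCachedMetadata(shardDb, namespace, supportsDropWriteConcern) {
  var majority = { writeConcern: { w: "majority" } };
  var configDb = shardDb.getSiblingDB("config");

  // Step 2: remove the cached collection entry for the namespace.
  configDb.getCollection("cache.collections").remove({ _id: namespace }, majority);

  // Step 3: drop the cached chunks collection for the namespace.
  var chunks = configDb.getCollection("cache.chunks." + namespace);
  if (supportsDropWriteConcern) {
    return chunks.drop(majority);
  }
  // On 3.6, drop without a write concern; confirm on the Secondaries
  // afterwards that the drop has become majority-committed.
  return chunks.drop();
}
```

For example, `clearCachedMetadata(db, "foo.bar", false)` would apply to a 3.6 shard. The rolling restart (step 4) and the query that touches all shards (step 5) still have to be performed afterwards.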
Original description
This was found as part of the investigation for SERVER-35999, where a user tries to open a change stream against a sharded collection just after upgrading to 3.6. Currently, the setFCV command does attempt to propagate the newly generated UUIDs for existing collections; however, the in-memory cache will still be stale.
Change streams will verify that the UUID from the oplog matches the UUID in the CSS, failing if there's a mismatch or if the UUID does not exist. While bouncing the shards is a valid workaround, it would be nice from a usability standpoint if the setFCV flow also forced a refresh of the in-memory CSS.
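The verification described above can be modeled roughly as follows. This is purely an illustrative model and not the server's implementation: `checkChangeStreamUuid` is a hypothetical function, and UUIDs are represented as plain strings for simplicity.

```javascript
// Illustrative model only; not the server's actual code.
// Fails when the cached UUID is missing or differs from the oplog UUID.
function checkChangeStreamUuid(namespace, oplogUuid, cachedUuid) {
  var missing = cachedUuid === undefined || cachedUuid === null;
  if (missing || cachedUuid !== oplogUuid) {
    throw new Error(
      "Collection " + namespace +
      " UUID differs from UUID on change stream operations"
    );
  }
  return true;
}
```

Under this model, a shard whose cache entry was persisted without a UUID fails the check even though the oplog entries carry the correct UUID, which matches the symptom reported here.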
- is duplicated by: SERVER-35999 A $changeStream can trigger assertion on UUID check when the collection cache is stale (Closed)