Type: Bug
Resolution: Gone away
Priority: Major - P3
None
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
Query Optimization
ALL
When generating a resume token (either to return to the client or to compare against a given token), the change stream stage consults the CollectionShardingState to determine which document key fields to extract. In most cases this works: the document key is '_id' for an unsharded collection, or the shard key plus '_id' for a sharded collection. However, this assumption breaks down in any of the following situations:
1. The node has not yet received any versioned commands, so it has no sharding state for the collection. This is especially evident on secondaries.
2. A node is restarted, which clears its in-memory sharding state (this is very common during Atlas maintenance, for instance). The node then assumes that the collection is unsharded. This is tracked by SERVER-32198.
3. The collection becomes sharded between the time the token is generated and the time the resume is attempted. The document key in the token contains only _id, but the resume logic sees the shard key and incorrectly fails.
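The dependence on the node's local view of sharding can be illustrated with a minimal Python sketch. It is a hypothetical model, not the server's C++ code: `document_key_fields` and `extract_document_key` are illustrative names, and the node's view of the shard key is assumed to be a plain list of field names, with `None` standing for "this node believes the collection is unsharded" (cases 1 and 2). The same document yields a different document key depending on that view, which is exactly what case 3 runs into.

```python
from typing import Optional


def document_key_fields(shard_key: Optional[list[str]]) -> list[str]:
    """Fields a node extracts into a resume token's document key.

    shard_key is this node's (possibly stale or empty) view of the
    collection's shard key; None means the node believes it is unsharded.
    """
    if shard_key is None:
        # Unsharded, or no sharding state loaded: the key is just _id.
        return ["_id"]
    # Sharded: shard key fields first, then _id unless it is already included.
    return shard_key + ([] if "_id" in shard_key else ["_id"])


def extract_document_key(doc: dict, shard_key: Optional[list[str]]) -> dict:
    """Build the document key; missing fields are recorded as None in this sketch."""
    return {field: doc.get(field) for field in document_key_fields(shard_key)}


doc = {"_id": 1, "region": "eu", "total": 42}

# Token generated by a node that thinks the collection is unsharded:
print(extract_document_key(doc, None))  # {'_id': 1}

# Same document on a node that knows the collection is sharded on {region: 1}:
print(extract_document_key(doc, ["region"]))  # {'region': 'eu', '_id': 1}
```

Nothing in the extraction step is wrong in isolation; the problem is that its input is whatever sharding state the node happens to hold at that moment.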
For the second case above, there are at least two ways a resume can fail if the collection is sharded: either 1) we attempt to resume against a node that incorrectly thinks the collection is unsharded, or 2) we attempt to resume on a fresh node, but the token was generated on a node with stale metadata. In both situations we assert, because the document key fields do not match.
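The two failure modes can be sketched by comparing the field names in a token's document key against the fields the resuming node would extract. This is a hypothetical Python model, not the server's implementation: `check_resume`, `expected_document_key_fields`, and `DocumentKeyMismatch` are illustrative names, the shard key is assumed to be a list of field names (or `None` for a node that believes the collection is unsharded), and only field names are compared.

```python
from typing import Optional


class DocumentKeyMismatch(Exception):
    """Hypothetical stand-in for the assertion raised on resume."""


def expected_document_key_fields(shard_key: Optional[list[str]]) -> list[str]:
    # Same rule as token generation: _id alone, or shard key fields plus _id.
    if shard_key is None:
        return ["_id"]
    return shard_key + ([] if "_id" in shard_key else ["_id"])


def check_resume(token_doc_key: dict, node_shard_key: Optional[list[str]]) -> None:
    """Fail if the token's document key fields differ from what this node would extract."""
    expected = expected_document_key_fields(node_shard_key)
    actual = list(token_doc_key)
    if actual != expected:
        raise DocumentKeyMismatch(f"token has {actual}, node expects {expected}")


scenarios = [
    # 1) Token includes the shard key, but the resuming node thinks the collection is unsharded.
    ({"region": "eu", "_id": 1}, None),
    # 2) Token was generated on a node with stale (unsharded) metadata; the resuming node is fresh.
    ({"_id": 1}, ["region"]),
]

for token_doc_key, node_view in scenarios:
    try:
        check_resume(token_doc_key, node_view)
    except DocumentKeyMismatch as err:
        print("resume fails:", err)
```

Both scenarios fail for the same reason: the expected fields come from whatever sharding state the resuming node holds, not from the state that produced the token.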