Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Gone away
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
- qopt-team

Assigned Teams:

Query Optimization
Operating System:
ALL

Steps To Reproduce:

Hide

(function() {
    "use strict";

    var st = new ShardingTest({
        shards: 2,
        rs: {nodes: 2, setParameter: {periodicNoopIntervalSecs: 1, writePeriodicNoops: true}}
    });

    var testDB = st.s.getDB('test');
    testDB.dropDatabase();

    assert.commandWorked(testDB.adminCommand({enableSharding: 'test'}));
    st.ensurePrimaryShard('test', st.shard0.shardName);
    st.shardColl(testDB.foo, {x: 1}, {x: 10}, {x: 10});

    let stream = testDB.foo.aggregate([{$changeStream: {}}], {$readPreference: {mode: "secondary"}});
    assert.writeOK(testDB.foo.insert({x: 0}, {writeConcern: {w: "majority"}}));

    // This will trigger a refresh on the secondaries and they will now be aware that the collection
    // is sharded.
    assert.commandWorked(testDB.foo.runCommand({
        count: "foo",
        query: {},
        $readPreference: {mode: "secondary"},
        readConcern: {"level": "local"}
    }));

    assert.soon(()=> stream.hasNext());

    let token = stream.next()._id;

    // Restart the secondary, resetting it's sharding state to be unsharded.
    st.rs0.restart(st.rs0.getSecondary());

    let resumeStream = testDB.foo.aggregate([{$changeStream: {resumeAfter: token}}], 
                             {$readPreference: {mode: "secondary"}});

    assert.writeOK(testDB.foo.insert({x: 1}));    // THIS WILL FAIL TO RESUME
    assert.soon(()=> resumeStream.hasNext());

    st.stop();
}());

Show

Case:
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When generating a resume token (either to return to the client or to compare against a given token), the change stream stage will consult the CollectionShardingState to determine the document key fields to extract. In most cases, this is perfectly fine and the document key will either be '_id' for an unsharded collection or the shard key + '_id' for sharded collections. However, this assumption breaks down for any of the following reasons:

1. The node has not received any versioned commands, this is especially obvious for secondaries.
2. A node is restarted, clearing its internal sharding state (this is very common in atlas maintenance, for instance). The node now assumes that the collection is unsharded. This is tracked by ~~SERVER-32198~~.
3. The collection becomes sharded in between the time that the token is generated and the resume is attempted. The document key in the token will only have _id but the resume logic will see the shard key and incorrectly fail.

For the second case above, there are (at least) two possibilities for resume failures if the collection is sharded. Either 1) we're attempting to resume against a node which incorrectly thinks the collection is unsharded or 2) we're attempting to resume on a fresh node but the token was generated from a node with stale metadata. In both situations we will assert since the document key fields do not match.

Assignee:: [DO NOT USE] Backlog - Query Optimization
Reporter:: Nicholas Zolnierz
Participants:: [DO NOT USE] Backlog - Query Optimization, James Wahlin, Nicholas Zolnierz
Votes:: 0 Vote for this issue
Watchers:: 4 Start watching this issue

Created:: Nov 22 2019 09:06:08 PM UTC
Updated:: Oct 27 2023 08:42:31 PM UTC
Resolved:: Jul 21 2022 02:04:48 PM UTC

Details

Description

Attachments

Activity

People

Dates