[SERVER-76037] Write commands using updateOne without shard key path fails spuriously when predicate refers to 'let' parameter Created: 12/Apr/23  Updated: 29/Oct/23  Resolved: 22/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Jason Zhang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-71636 Explain executionStats does not work ... Closed
is related to SERVER-75356 explain command for a find with $expr... Closed
is related to SERVER-81057 Writes without shard key doesn't work... Closed
Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Repro script:

(function() {
"use strict";
 
const coll = db.test_coll;
coll.drop();
 
const testDoc = {
    _id: 4,
    Species: "Song Thrush (Turdus philomelos)",
};
assert.commandWorked(coll.insert(testDoc));
 
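// The predicate's $expr refers to the 'let' variable 'target_species'. In the
// affected configuration, this command fails with
// "Use of undefined variable: target_species" (code 17276).
assert.commandWorked(db.runCommand({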
    delete: coll.getName(),
    let : {target_species: "Song Thrush (Turdus philomelos)"},
    deletes: [{q: {$and: [{_id: 4}, {$expr: {$eq: ["$Species", "$$target_species"]}}]}, limit: 1}]
}));
}());

This script reproduces the problem only when run in the sharded collections passthrough suite with the feature flag enabled. I'm running it with the following resmoke.py invocation:

python3 buildscripts/resmoke.py run --runAllFeatureFlagTests --suites=sharded_collections_jscore_passthrough repro.js

Sprint: Sharding NYC 2023-05-01
Participants:

 Description   

Recently we've seen a few problems where query parsing in sharded contexts can fail because we are not plumbing through the let parameters and runtimeConstants: see SERVER-75356 and SERVER-71636. This is a similar bug, but for a delete command it happens only when all of the following conditions hold:

  • featureFlagUpdateOneWithoutShardKey is enabled
  • The collection is sharded
  • The delete command does not specify the shard key
  • The delete command's predicate refers to a let variable
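As a rough illustration of the last condition, the sketch below walks a query filter and collects the names of user variables it references as `$$name` strings, then combines that with the other three conditions. Both `referencedLetVariables` and `takesFlaggedPath` are hypothetical helpers, not part of the server or the shell; the system-variable list is illustrative rather than exhaustive, and whether the predicate specifies the shard key is passed in rather than computed.

```javascript
// Hypothetical helpers for illustration only; not part of the server or shell.

// Variables the server supplies itself rather than through 'let'.
// Illustrative, not an exhaustive list.
const SYSTEM_VARIABLES = new Set([
    "NOW", "CLUSTER_TIME", "ROOT", "CURRENT", "REMOVE",
    "DESCEND", "PRUNE", "KEEP", "SEARCH_META", "USER_ROLES",
]);

// Returns the sorted, de-duplicated names of user variables referenced in a
// query filter, e.g. "$$target_species" -> "target_species". Conservative: it
// treats any "$$" string as a reference, even a literal value outside $expr.
function referencedLetVariables(filter) {
    const names = new Set();
    const visit = (value) => {
        if (typeof value === "string") {
            if (value.startsWith("$$")) {
                // "$$var.path" refers to the variable "var".
                const name = value.slice(2).split(".")[0];
                if (!SYSTEM_VARIABLES.has(name)) {
                    names.add(name);
                }
            }
        } else if (Array.isArray(value)) {
            value.forEach(visit);
        } else if (value !== null && typeof value === "object") {
            Object.values(value).forEach(visit);
        }
    };
    visit(filter);
    return [...names].sort();
}

// True when all four conditions listed above hold. Whether the predicate
// specifies the shard key is supplied by the caller, not computed here.
function takesFlaggedPath({featureFlagEnabled, isSharded, specifiesShardKey, filter}) {
    return featureFlagEnabled && isSharded && !specifiesShardKey &&
        referencedLetVariables(filter).length > 0;
}

const q = {$and: [{_id: 4}, {$expr: {$eq: ["$Species", "$$target_species"]}}]};
console.log(referencedLetVariables(q));  // ["target_species"]
```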

In this case, we take a code path guarded by featureFlagUpdateOneWithoutShardKey. On that path we call extractShardKeyFromBasicQuery(), passing the predicate but not the let parameters or runtime constants. That function then parses the predicate with an ExpressionContext that lacks the let parameters and runtime constants, so parsing fails with an error like the following:

2023-04-12T21:42:50.853Z assert: command failed: {
    "ok" : 0,
    "errmsg" : "Use of undefined variable: target_species",
    "code" : 17276,
    "codeName" : "Location17276",
    "$clusterTime" : {
            "clusterTime" : Timestamp(1681335770, 82),
            "signature" : {
                    "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                    "keyId" : NumberLong(0)
            }
    },
    "operationTime" : Timestamp(1681335770, 82)
} with original command request: {
    "delete" : "test_coll",
    "let" : {
            "target_species" : "Song Thrush (Turdus philomelos)"
    },
    "deletes" : [
            {
                    "q" : {
                            "$and" : [
                                    {
                                            "_id" : 4
                                    },
                                    {
                                            "$expr" : {
                                                    "$eq" : [
                                                            "$Species",
                                                            "$$target_species"
                                                    ]
                                            }
                                    }
                            ]
                    },
                    "limit" : 1
            }
    ],
    "lsid" : {
            "id" : UUID("00eb8ffa-8e7b-45a6-980c-a43e5ec06874")
    },
    "$clusterTime" : {
            "clusterTime" : Timestamp(1681335770, 82),
            "signature" : {
                    "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                    "keyId" : NumberLong(0)
            }
    }
} on connection: connection to localhost:20003
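The failure can be reproduced in miniature. The following is a toy JavaScript model, not the server's C++ parser: `parseEqExpr` (a hypothetical name) handles only `{$eq: [lhs, rhs]}`, rejects references to variables it was not given, and otherwise returns a matcher over documents. Passing an empty variables map stands in for parsing the predicate without the command's `let` parameters, and yields the same message and code as the log above.

```javascript
// Toy model, not server code: "parsing" an {$eq: [lhs, rhs]} expression that
// may reference user variables as "$$name". System variables such as $$NOW and
// dotted paths like "$$v.x" are out of scope for this sketch.
function parseEqExpr(expr, letVariables) {
    const [lhs, rhs] = expr.$eq;
    // Undefined variables are rejected at parse time, before any document
    // is examined.
    for (const operand of [lhs, rhs]) {
        if (typeof operand === "string" && operand.startsWith("$$")) {
            const name = operand.slice(2);
            if (!Object.prototype.hasOwnProperty.call(letVariables, name)) {
                const err = new Error(`Use of undefined variable: ${name}`);
                err.code = 17276;
                throw err;
            }
        }
    }
    // Resolves "$$name" from the variables and "$field" from the document.
    const resolve = (operand, doc) => {
        if (typeof operand !== "string") return operand;
        if (operand.startsWith("$$")) return letVariables[operand.slice(2)];
        if (operand.startsWith("$")) return doc[operand.slice(1)];
        return operand;
    };
    // The parsed result is a matcher over documents.
    return (doc) => resolve(lhs, doc) === resolve(rhs, doc);
}

const expr = {$eq: ["$Species", "$$target_species"]};

// Mirrors the buggy path: the predicate is parsed with no 'let' variables.
try {
    parseEqExpr(expr, {});
} catch (e) {
    console.log(e.code, e.message);  // 17276 Use of undefined variable: target_species
}
```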

I've provided complete repro instructions in the "Steps to Reproduce" section. This bug does not exist in the release configuration of the server, so it seems like we should fix it as part of the "updateOne without shard key" project, which I believe covers deletes as well.

The solution should be similar to that from SERVER-75356. Namely, we should thread the let parameters and runtime constants through to the ExpressionContext used to parse the query.
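The shape of that fix can be sketched as a toy model in JavaScript; this is not the committed C++ change. `makeExpressionContext`, `parseWithContext`, and `extractShardKeySketch` are hypothetical names, and runtime constants are modeled as plain variables such as `NOW`. What the sketch illustrates is that the caller builds the parsing context from the write command's `let` document and runtime constants instead of parsing with an empty one, after which shard key extraction can proceed.

```javascript
// Toy model of the fix, not the actual C++ change. The "expression context"
// here is just the set of variables visible while parsing.
function makeExpressionContext({letParams = {}, runtimeConstants = {}} = {}) {
    return {variables: {...runtimeConstants, ...letParams}};
}

// Collects every "$$name" reference in a filter (dotted paths use the head).
function variableRefs(value, out = new Set()) {
    if (typeof value === "string" && value.startsWith("$$")) {
        out.add(value.slice(2).split(".")[0]);
    } else if (Array.isArray(value)) {
        value.forEach((v) => variableRefs(v, out));
    } else if (value !== null && typeof value === "object") {
        Object.values(value).forEach((v) => variableRefs(v, out));
    }
    return out;
}

// Toy "parse": fails if the filter uses a variable the context lacks.
function parseWithContext(filter, expCtx) {
    for (const name of variableRefs(filter)) {
        if (!Object.prototype.hasOwnProperty.call(expCtx.variables, name)) {
            throw new Error(`Use of undefined variable: ${name}`);
        }
    }
    return filter;
}

// Hypothetical stand-in for the shard key extraction step. The key change is
// that the caller threads the command's 'let' and runtime constants through.
function extractShardKeySketch(filter, shardKeyField, {letParams, runtimeConstants} = {}) {
    const parsed = parseWithContext(filter, makeExpressionContext({letParams, runtimeConstants}));
    // Look for a scalar equality on the shard key field, at the top level or
    // directly under a top-level $and.
    for (const clause of [parsed, ...(parsed.$and ?? [])]) {
        const value = clause[shardKeyField];
        if (value !== undefined && (value === null || typeof value !== "object")) {
            return {[shardKeyField]: value};
        }
    }
    return {};
}

const q = {$and: [{_id: 4}, {$expr: {$eq: ["$Species", "$$target_species"]}}]};
const letParams = {target_species: "Song Thrush (Turdus philomelos)"};
console.log(extractShardKeySketch(q, "_id", {letParams}));  // { _id: 4 }
```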



 Comments   
Comment by Githook User [ 22/Apr/23 ]

Author:

{'name': 'Jason Zhang', 'email': 'jason.zhang@mongodb.com', 'username': 'jz1242'}

Message: SERVER-76037 Include 'let' and 'legacyRuntimeConstants' when parsing an updateOne/deleteOne/findAndModify without shard key query
Branch: master
https://github.com/mongodb/mongo/commit/5a018a8b6964fa4cf16260d7a940ed7fcc82e654

Comment by David Storch [ 13/Apr/23 ]

jason.zhang@mongodb.com I left a couple of TODOs that need to be handled as part of this ticket:

Comment by David Storch [ 13/Apr/23 ]

I'm generalizing the title, since I think the bug applies to all write commands, not just deletes.
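Since the bug is believed to affect all write commands, the repro can be extended to `update` and `findAndModify`, both of which also accept a top-level `let` document. The sketch below only builds the command documents, reusing the original repro's predicate; `buildLetWriteCommands` and the `seen` field it sets are hypothetical. In the passthrough suite, each document would be passed to `db.runCommand()` as in the original script.

```javascript
// Hypothetical helper for extending the repro: builds write command documents
// whose predicates reference a 'let' variable.
function buildLetWriteCommands(collName) {
    const letParams = {target_species: "Song Thrush (Turdus philomelos)"};
    // Same predicate as the original repro.
    const q = {$and: [{_id: 4}, {$expr: {$eq: ["$Species", "$$target_species"]}}]};
    return {
        delete: {delete: collName, let: letParams, deletes: [{q, limit: 1}]},
        update: {
            update: collName,
            let: letParams,
            updates: [{q, u: {$set: {seen: true}}, multi: false}],
        },
        findAndModify: {
            findAndModify: collName,
            let: letParams,
            query: q,
            update: {$set: {seen: true}},
        },
    };
}

// In the mongo shell, for example:
//   const cmds = buildLetWriteCommands(coll.getName());
//   assert.commandWorked(db.runCommand(cmds.update));
//   assert.commandWorked(db.runCommand(cmds.findAndModify));
```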

Generated at Thu Feb 08 06:31:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.