[SERVER-74794] PeriodicShardedIndexConsistencyChecker fails with error "unrecognized field 'let'" Created: 13/Mar/23  Updated: 27/Jul/23  Resolved: 25/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.15
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ian Springer Assignee: Backlog - Query Optimization
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-46341 unrecognized field 'useNewUpsert' Closed
Assigned Teams:
Query Optimization
Operating System: ALL
Steps To Reproduce:

In a sharded 4.4 cluster that has sharded collections with indexes defined, upgrade the config servers to 5.0.
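The checker runs on the config server primary, so that is where the error will be logged on the next scheduled run. A minimal sketch for confirming the checker's configuration, assuming the documented config server parameters enableShardedIndexConsistencyCheck and shardedIndexConsistencyCheckIntervalMS:

// On the config server replica set primary:
db.adminCommand({ getParameter: 1, enableShardedIndexConsistencyCheck: 1 })      // default: true
db.adminCommand({ getParameter: 1, shardedIndexConsistencyCheckIntervalMS: 1 })  // default: 600000 (10 minutes)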

Participants:

 Description   

We have begun upgrading one of our 4.4.16 clusters to 5.0.15. So far, only the config servers and 2 shard replicas have been upgraded. Since upgrading those nodes to 5.0.15, we have seen the following logged every 10 minutes:

{"t":{"$date":"2023-03-13T17:33:50.005+00:00"},"s":"I",  "c":"SHARDING", "id":22049,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Checking consistency of sharded collection indexes across the cluster"}
{"t":{"$date":"2023-03-13T17:33:50.162+00:00"},"s":"I",  "c":"SHARDING", "id":22052,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Error while checking sharded index consistency","attr":{"error":{"code":9,"codeName":"FailedToParse","errmsg":"unrecognized field 'let'"}}} 

I enabled debug logging for the "sharding" and "query" components and was able to obtain additional details about the 'let' field the error refers to:

 

{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"I",  "c":"SHARDING", "id":22049,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Checking consistency of sharded collection indexes across the cluster"}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":20967,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Beginning planning","attr":{"options":"INDEX_INTERSECTION SPLIT_LIMITED_SORT ","query":"ns=config.collectionsTree: $and\nSort: {}\nProj: {}\n"}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":20968,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Index number and details","attr":{"indexNumber":0,"index":"kp: { _id: 1 } unique name: '(_id_, )' io: { v: 2, key: { _id: 1 }, name: \"_id_\" }"}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":20972,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Rated tree","attr":{"tree":"$and\n"}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":20979,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Planner: outputted indexed solutions","attr":{"numSolutions":0}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":20984,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Planner: outputting a collection scan","attr":{"collectionScan":"COLLSCAN\n---ns = config.collections\n---filter = $and\n---nodeId = 1\n---fetched = 1\n---sortedByDiskLoc = 0\n---providedSorts = {baseSortPattern: {}, ignoredFields: []}\n"}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D5", "c":"QUERY",    "id":5400802, "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"collection scan bounds","attr":{"min":"none","max":"none"}}
{"t":{"$date":"2023-03-13T17:43:50.005+00:00"},"s":"D2", "c":"QUERY",    "id":20926,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Only one plan is available; it will be run but will not be cached","attr":{"query":"ns: config.collections query: {} sort: {} projection: {}","planSummary":"COLLSCAN"}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D5", "c":"QUERY",    "id":20725,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Expression prior to rewrite","attr":{"expression":"{$or: [{$gt: [{$size: [\"$missingFromShards\"]}, {$const: 0}]}, {$gt: [{$size: [\"$inconsistentProperties\"]}, {$const: 0}]}]}"}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D5", "c":"QUERY",    "id":20967,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Beginning planning","attr":{"options":"NO_TABLE_SCAN ","query":"ns=AAA.ClusterSettingsTree: $and\nSort: {}\nProj: {}\nCollation: { locale: \"simple\" }\n"}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D5", "c":"QUERY",    "id":20968,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Index number and details","attr":{"indexNumber":0,"index":"kp: { _id: 1 } name: '(shardkey, )'"}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D5", "c":"QUERY",    "id":20972,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Rated tree","attr":{"tree":"$and\n"}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D5", "c":"QUERY",    "id":20979,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Planner: outputted indexed solutions","attr":{"numSolutions":0}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D1", "c":"QUERY",    "id":20904,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Dispatching command {cmdObj} to establish cursors on shards","attr":{"cmdObj":{"aggregate":"ClusterSettings","pipeline":[{"$indexStats":{}},{"$group":{"_id":{"$const":null},"indexDoc":{"$push":"$$ROOT"},"allShards":{"$addToSet":"$shard"}}},{"$unwind":{"path":"$indexDoc"}},{"$group":{"_id":"$indexDoc.name","shards":{"$push":"$indexDoc.shard"},"specs":{"$push":{"$objectToArray":[{"$ifNull":["$indexDoc.spec",{"$const":{}}]}]}},"allShards":{"$first":"$allShards"}}},{"$project":{"_id":true,"missingFromShards":{"$setDifference":["$allShards","$shards"]},"inconsistentProperties":{"$setDifference":[{"$reduce":{"input":"$specs","initialValue":{"$arrayElemAt":["$specs",{"$const":0}]},"in":{"$setUnion":["$$value","$$this"]}}},{"$reduce":{"input":"$specs","initialValue":{"$arrayElemAt":["$specs",{"$const":0}]},"in":{"$setIntersection":["$$value","$$this"]}}}]}}},{"$match":{"$expr":{"$or":[{"$gt":[{"$size":"$missingFromShards"},0]},{"$gt":[{"$size":"$inconsistentProperties"},0]}]}}},{"$project":{"inconsistentProperties":true,"missingFromShards":true,"indexName":"$$ROOT._id","_id":false}},{"$limit":1}],"cursor":{"batchSize":101},"let":{"NOW":{"$date":"2023-03-13T17:43:50.147Z"},"CLUSTER_TIME":{"$timestamp":{"t":1678729430,"i":159}}},"fromMongos":true,"collation":{"locale":"simple"},"readConcern":{"provenance":"implicitDefault"},"writeConcern":{"w":1,"wtimeout":0}}}}
{"t":{"$date":"2023-03-13T17:43:50.147+00:00"},"s":"D3", "c":"QUERY",    "id":4625502, "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Establishing cursors on remotes","attr":{"opId":21823852,"numRemotes":1,"opKey":{"uuid":{"$uuid":"7f913566-8991-477c-8711-3e8182a2d278"}}}}
{"t":{"$date":"2023-03-13T17:43:50.149+00:00"},"s":"D3", "c":"QUERY",    "id":4674000, "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Experienced a failure while establishing cursors","attr":{"error":{"code":9,"codeName":"FailedToParse","errmsg":"unrecognized field 'let'"}}}
{"t":{"$date":"2023-03-13T17:43:50.149+00:00"},"s":"I",  "c":"QUERY",    "id":4625501, "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Unable to establish remote cursors","attr":{"error":{"code":9,"codeName":"FailedToParse","errmsg":"unrecognized field 'let'"},"nRemotes":0}}
{"t":{"$date":"2023-03-13T17:43:50.163+00:00"},"s":"I",  "c":"SHARDING", "id":22052,   "ctx":"PeriodicShardedIndexConsistencyChecker","msg":"Error while checking sharded index consistency","attr":{"error":{"code":9,"codeName":"FailedToParse","errmsg":"unrecognized field 'let'"}}} 

 

I'm thinking 'let' is perhaps a new option that was added in 5.0. If so, the command should not attempt to use it until all nodes in the cluster are on 5.0 (or perhaps it should check whether the fCV is 5.0).
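For context, a minimal sketch of the same parse failure outside the checker, assuming a direct mongosh/mongo connection to one of the not-yet-upgraded 4.4 shard mongods (collection name taken from the log above):

// Connected directly to a 4.4 shard mongod (not via mongos):
db.runCommand({
  aggregate: "ClusterSettings",       // any collection works; parsing fails before it is looked up
  pipeline: [ { $indexStats: {} } ],
  cursor: { batchSize: 101 },
  let: { NOW: new Date() }            // top-level 'let' was introduced for aggregate in 5.0
})
// Expected on 4.4 (roughly): { ok: 0, code: 9, codeName: "FailedToParse", errmsg: "unrecognized field 'let'" }
// The same command without the 'let' field should succeed.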

 



 Comments   
Comment by Katya Kamenieva [ 25/Jul/23 ]

We decided not to introduce any changes to the current behavior, since the error should only be observed temporarily, 4.4 is the only affected version, and fixing this would require many backports. Feel free to reopen the ticket if we misinterpreted the impact and it causes significant issues.

Comment by Cris Insignares Cuello [ 21/Jul/23 ]

We believe this was introduced by SERVER-51649. We will reassign to the Query Optimization team to figure out how to fix this.

Comment by Eric Sedor [ 08/Jun/23 ]

ian.springer@salesforce.com, as you suspect, this is very likely due to the version mismatch between shards. Typically it would only be a concern if the error persisted after a version upgrade was complete, and we don't want to see shards running mixed versions for any longer than the documented upgrade process requires.

I'm going to pass this ticket to our Sharding team with a specific question: should this ticket be de-duplicated against any future work around sharded cluster node version management?
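As a follow-up to the above, a minimal sketch for confirming that the mixed-version window is actually closed once the upgrade is believed complete (standard shell helpers and admin commands; adjust hosts as needed):

// Connect directly to each shard member and each config server member:
db.version()                                                          // should report 5.0.x everywhere
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })

// Then, on a mongos, complete the upgrade by bumping the FCV:
db.adminCommand({ setFeatureCompatibilityVersion: "5.0" })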
