[SERVER-75304] Fix tassert() failure caused by bad interaction between explodeForSort and IntervalEvaluationTrees Created: 27/Mar/23  Updated: 29/Oct/23  Resolved: 31/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Rui Liu Assignee: Rui Liu
Resolution: Fixed Votes: 0
Labels: pm2697-m3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QE 2023-04-03
Participants:
Linked BF Score: 134
Story Points: 2

 Comments   
Comment by Githook User [ 31/Mar/23 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-75304 Fix tassert() failure caused by bad interaction between explodeForSort and IntervalEvaluationTrees
Branch: master
https://github.com/mongodb/mongo/commit/8a0ab094055d03b4bdbe528c5faf5fdfe72c8aee

Comment by David Storch [ 29/Mar/23 ]

I wanted to add a writeup of my understanding of the issue given the lack of detail on the ticket. I started with a repro script provided by rui.liu@mongodb.com which looks something like this:

(function() {
"use strict";
 
db.coll.drop();
assert.commandWorked(db.coll.createIndex({array: -1, num: 1}));
assert.commandWorked(db.coll.insert({array: []}));
let explain = db.coll.find({array: {$all: [1, [2]]}}).sort({num: 1}).explain();
printjson(explain);
}());

The symptom of the bug is that the query unexpectedly fails with the following message:

MongoDB shell version v7.0.0-alpha
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("d786c5d6-298e-4bbf-b1b2-265d988203d3") }
MongoDB server version: 7.0.0-alpha
uncaught exception: Error: explain failed: {
        "ok" : 0,
        "errmsg" : "'constNodePtr' must have the same point interval as the one in 'bounds'",
        "code" : 7426202,
        "codeName" : "Location7426202"
} :
_getErrorWithCode@src/mongo/shell/utils.js:24:13
throwOrReturn@src/mongo/shell/explainable.js:25:19
constructor/this.finish@src/mongo/shell/explain_query.js:111:32
DBQuery.prototype.explain@src/mongo/shell/query.js:489:25
@repro.js:7:70
@repro.js:9:2
failed to load: repro.js

It has triggered tassert() 7426202. This is happening specifically for explodeForSort() plans, so I instrumented the server to print the plan before and after explosion. The logging output I generated for this query looks like this:

!!! storchprint solution before explosion:
FETCH
---filter:
        $and || Selected Index #0 pos 0 combine 1
            array $eq [ 2.0 ] || Selected Index #0 pos 0 combine 1
            array $eq 1.0
---nodeId = 0
---fetched = 1
---sortedByDiskLoc = 0
---providedSorts = {baseSortPattern: {}, ignoredFields: []}
---Child:
------IXSCAN
---------indexName = array_-1_num_1
---------keyPattern = { array: -1.0, num: 1.0 }
---------direction = 1
---------bounds = field #0['array']: [[ 2.0 ], [ 2.0 ]], [2.0, 2.0], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [2.0, 2.0] [[ 2.0 ], [ 2.0 ]])) (num: 1.0 (const [MinKey, MaxKey])))
---------nodeId = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------providedSorts = {baseSortPattern: {}, ignoredFields: []}
 
!!! storchprint solution after explosion:
MERGE_SORT
---nodeId = 0
---fetched = 0
---sortedByDiskLoc = 0
---providedSorts = {baseSortPattern: {}, ignoredFields: []}
---Child 0:
------IXSCAN
---------indexName = array_-1_num_1
---------keyPattern = { array: -1.0, num: 1.0 }
---------direction = 1
---------bounds = field #0['array']: [[ 2.0 ], [ 2.0 ]], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [2.0, 2.0])) (num: 1.0 (const [MinKey, MaxKey])))
---------nodeId = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------providedSorts = {baseSortPattern: {}, ignoredFields: []}
---Child 1:
------IXSCAN
---------indexName = array_-1_num_1
---------keyPattern = { array: -1.0, num: 1.0 }
---------direction = 1
---------bounds = field #0['array']: [2.0, 2.0], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [[ 2.0 ], [ 2.0 ]])) (num: 1.0 (const [MinKey, MaxKey])))
---------nodeId = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------providedSorts = {baseSortPattern: {}, ignoredFields: []}

Let's start by examining the QuerySolution prior to explosion. The interesting observation is that the ordering of the intervals differs between the IETs and the bounds themselves:

---------bounds = field #0['array']: [[ 2.0 ], [ 2.0 ]], [2.0, 2.0], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [2.0, 2.0] [[ 2.0 ], [ 2.0 ]])) (num: 1.0 (const [MinKey, MaxKey])))

This happens because the bounds themselves have been "aligned" (with a call to IndexBoundsBuilder::alignBounds()) such that the ordering and direction of the intervals is consistent with whether the index is ascending or descending. In particular, the "array" field of the index is descending, so the order of the intervals has been reversed. For the IETs, however, the bounds get aligned only after the IET is evaluated: see here. Therefore, it is completely correct that the intervals inside the "const" node of the IET are unaligned and therefore have a different order from the index bounds themselves.

The problem occurs when the plan goes through the explodeForSort() path. Since the IETs and bounds themselves order the intervals in opposite directions, explosion leads to two IXSCAN nodes whose IETs and bounds do not consist of the same intervals:

---------bounds = field #0['array']: [2.0, 2.0], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [[ 2.0 ], [ 2.0 ]])) (num: 1.0 (const [MinKey, MaxKey])))
...
---------bounds = field #0['array']: [[ 2.0 ], [ 2.0 ]], field #1['num']: [MinKey, MaxKey]
---------iets = (iets { array: -1.0, num: 1.0 } (array: -1.0 (const [2.0, 2.0])) (num: 1.0 (const [MinKey, MaxKey])))

This is a bit subtle – the extra set of square brackets is significant in the examples above, since [[2.0], [2.0]] is a different interval than [2.0, 2.0]. The assertion that is triggered is essentially checking that a constant interval inside the IET is identical to the index bounds themselves, and as observed above this is not the case after explode for sort.

The problem is due to how the current implementation of explodeForSort() manipulates the IETs of the IXSCAN nodes being exploded. rui.liu@mongodb.com is proposing a fix that changes how the IETs are constructed in the case of explosion to avoid this inconsistency between the IETs and the actual bounds.

Generated at Thu Feb 08 06:29:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.