[SERVER-56865] explodeForSort logic potentially suboptimal when multikey field(s) follow sort fields in the index Created: 11/May/21  Updated: 31/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: 4.2.14, 4.4.6
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Chris Harris Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Consider the query:

db.foo.find({e:{$in:[1,2]}, r:{$gt:0}}).sort({s:1})

If following the ESR Guidance, one would create the following index:

db.foo.createIndex({e:1, s:1, r:1}) 

This works fine and produces a SORT_MERGE plan when the index is not multikey:

> db.foo.find({e:{$in:[1,2]}, r:{$gt:0}}).sort({s:1}).explain().queryPlanner.winningPlan.queryPlan
{
	"stage" : "FETCH",
	"planNodeId" : 4,
	"inputStage" : {
		"stage" : "SORT_MERGE",
		"planNodeId" : 3,
		"sortPattern" : {
			"s" : 1
		},
		"inputStages" : [
			{
				"stage" : "IXSCAN",
				"planNodeId" : 1,
				"keyPattern" : {
					"e" : 1,
					"s" : 1,
					"r" : 1
				},
				"indexName" : "e_1_s_1_r_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"e" : [ ],
					"s" : [ ],
					"r" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"e" : [
						"[1.0, 1.0]"
					],
					"s" : [
						"[MinKey, MaxKey]"
					],
					"r" : [
						"(0.0, inf.0]"
					]
				}
			},
			{
				"stage" : "IXSCAN",
				"planNodeId" : 2,
				"keyPattern" : {
					"e" : 1,
					"s" : 1,
					"r" : 1
				},
				"indexName" : "e_1_s_1_r_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"e" : [ ],
					"s" : [ ],
					"r" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"e" : [
						"[2.0, 2.0]"
					],
					"s" : [
						"[MinKey, MaxKey]"
					],
					"r" : [
						"(0.0, inf.0]"
					]
				}
			}
		]
	}
} 

Making the index multikey on the trailing range predicate converts the plan into one that does NOT utilize the SORT_MERGE stage and instead performs a blocking SORT:

> db.foo.insert({r:[1,2,3]})
WriteResult({ "nInserted" : 1 })
> db.foo.find({e:{$in:[1,2]}, r:{$gt:0}}).sort({s:1}).explain().queryPlanner.winningPlan.queryPlan
{
	"stage" : "FETCH",
	"planNodeId" : 3,
	"inputStage" : {
		"stage" : "SORT",
		"planNodeId" : 2,
		"sortPattern" : {
			"s" : 1
		},
		"memLimit" : 104857600,
		"type" : "default",
		"inputStage" : {
			"stage" : "IXSCAN",
			"planNodeId" : 1,
			"keyPattern" : {
				"e" : 1,
				"s" : 1,
				"r" : 1
			},
			"indexName" : "e_1_s_1_r_1",
			"isMultiKey" : true,
			"multiKeyPaths" : {
				"e" : [ ],
				"s" : [ ],
				"r" : [
					"r"
				]
			},
			"isUnique" : false,
			"isSparse" : false,
			"isPartial" : false,
			"indexVersion" : 2,
			"direction" : "forward",
			"indexBounds" : {
				"e" : [
					"[1.0, 1.0]",
					"[2.0, 2.0]"
				],
				"s" : [
					"[MinKey, MaxKey]"
				],
				"r" : [
					"(0.0, inf.0]"
				]
			}
		}
	}
} 

This behavior does not occur when the array/multikey field is one of the prefixed equality conditions:

> db.foo.remove({})
WriteResult({ "nRemoved" : 1 })
> db.foo.reIndex()
{
	...
	"ok" : 1
} 
> db.foo.insert({e:[1,2,3]})
WriteResult({ "nInserted" : 1 })
> db.foo.find({e:{$in:[1,2]}, r:{$gt:0}}).sort({s:1}).explain().queryPlanner.winningPlan.queryPlan
{
	"stage" : "FETCH",
	"planNodeId" : 4,
	"inputStage" : {
		"stage" : "SORT_MERGE",
		"planNodeId" : 3,
		"sortPattern" : {
			"s" : 1
		},
		"inputStages" : [
			{
				"stage" : "IXSCAN",
				"planNodeId" : 1,
				"keyPattern" : {
					"e" : 1,
					"s" : 1,
					"r" : 1
				},
				"indexName" : "e_1_s_1_r_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"e" : [
						"e"
					],
					"s" : [ ],
					"r" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"e" : [
						"[1.0, 1.0]"
					],
					"s" : [
						"[MinKey, MaxKey]"
					],
					"r" : [
						"(0.0, inf.0]"
					]
				}
			},
			{
				"stage" : "IXSCAN",
				"planNodeId" : 2,
				"keyPattern" : {
					"e" : 1,
					"s" : 1,
					"r" : 1
				},
				"indexName" : "e_1_s_1_r_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"e" : [
						"e"
					],
					"s" : [ ],
					"r" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"e" : [
						"[2.0, 2.0]"
					],
					"s" : [
						"[MinKey, MaxKey]"
					],
					"r" : [
						"(0.0, inf.0]"
					]
				}
			}
		]
	}
} 

The problem appears to be related to how we examine the index key patterns to determine if they provide the desired functionality.  More specifically, it looks like there is an assumption made when identifying portion of the index that provides the sort and checking it for multikeyness.  The associated comments in the code say:

// The rest of the fields define the sort order we could obtain by exploding
// the bounds.
 
    // One of the indexed fields providing the sort is multikey. It is not correct for a
    // field with multikey components to provide a sort, so bail out.

The assumption that all of the remaining fields are relevant to the sort is not necessarily correct.  In the example above, it is only the s field in the index that is associated with the sort.  The r field after that is causing the code to exit as it is a multikey field, but since it is NOT part of the sort the fact that it is an array should not violate any constraints on sorting.

Investigated using "version" : "4.5.0-7162-g7581854".



 Comments   
Comment by Diego Rodriguez (Inactive) [ 01/Jul/21 ]

I have been able to reproduce this in the latest v4.2 and v4.4 builds: 4.2.14 and 4.4.6 respectively.

Sample Data:

db.getSiblingDB("bruce-server").getCollection("_User").insertMany([
    {
        "masterStatus": "lead",
        "role": "client",
        "_updated_at": 1,
        "_rperm": [1, 2, 3]
    },
    {
        "masterStatus": "qualified",
        "role": "client",
        "_updated_at": 3,
        "_rperm": [1, 2, 3]
    },
    {
        "masterStatus": "unqualified",
        "role": "client",
        "_updated_at": 2,
        "_rperm": [1, 2, 3]
    },
    {
        "masterStatus": "other",
        "role": "client",
        "_updated_at": 1,
        "_rperm": [1, 2, 3]
    },
    {
        "masterStatus": "lead",
        "role": "server",
        "_updated_at": 1,
        "_rperm": [1, 2, 3]
    }
])

Query:

db.getSiblingDB("bruce-server").getCollection("_User").find(
    {
        "masterStatus": {
            "$in": [
                "lead",
                "unqualified",
                "qualified"
            ]
        },
        "role": "client"
    },
    {
        "_p_accounting": 1,
        "firstName": 1,
        "lastName": 1,
        "masterStatus": 1,
        "_id": 1,
        "_created_at": 1,
        "_updated_at": 1,
        "_rperm": 1,
        "_wperm": 1
    }).sort({"_updated_at": -1}).limit(1000)

Tested Indexes:

db.getSiblingDB("bruce-server").getCollection("_User").createIndex({role: 1, masterStatus: 1, _updated_at: 1, _rperm: 1})
db.getSiblingDB("bruce-server").getCollection("_User").createIndex({role: 1, masterStatus: 1, _updated_at: 1, otherKey: 1})
db.getSiblingDB("bruce-server").getCollection("_User").createIndex({role: 1, masterStatus: 1, _updated_at: 1})

Out of the three indexes only the last two produce a SORT_MERGE plan. The first one, which includes a multi key path on _rperm, produces a blocking sort.

Regards
Diego

Comment by Chris Harris [ 12/May/21 ]

It occurred to me after writing the description text that the problem is independent of the query actually using the trailing key(s) from the index:

> db.foo.find({e:{$in:[1,2]}}).sort({s:1}).explain().queryPlanner.winningPlan
{
	"stage" : "FETCH",
	"inputStage" : {
		"stage" : "SORT",
		"sortPattern" : {
			"s" : 1
		},
		"inputStage" : {
			"stage" : "SORT_KEY_GENERATOR",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"e" : 1,
					"s" : 1,
					"r" : 1
				},
				"indexName" : "e_1_s_1_r_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"e" : [ ],
					"s" : [ ],
					"r" : [
						"r"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"e" : [
						"[1.0, 1.0]",
						"[2.0, 2.0]"
					],
					"s" : [
						"[MinKey, MaxKey]"
					],
					"r" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		}
	}
} 

The only requirement is that there are fields with multikey paths present after the (full) sort definition.  I'll update the title of the ticket accordingly, but leaving the description of the case as-is for now since it describes a particular example of when the problem might be encountered.  

Generated at Thu Feb 08 05:40:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.