[SERVER-29347] Why is MongoDB treating queries differently when served from same index? Created: 24/May/17  Updated: 12/Jul/17  Resolved: 09/Jun/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.4.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Michal Bigos Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-29346 Why is MongoDB not using index correc... Closed
Participants:

 Description   

I have a collection where my documents looks like:

    {
      "_id" : ObjectId("591dbe4a77d4ede22d765250"),
      "name" : [
        {
          "de" : true,
          "text" : "Brunhilde"
        },
        {
          "sk" : true,
          "text" : "Šimon"
        }
      ]
    }

I have defined an index as:

> db.names.createIndex({ 'name.de': 1, 'name.text': 1 }, {  name: 'name_de', partialFilterExpression: { 'name.de': { $exists: true } }, collation: { locale: 'de' } });

When I do a query like:

> db.names.find({ 'name.de': true, 'name.text': 'Rüdi' }).collation({ locale: 'de' });    

The explain plan looks like:

"winningPlan" : {
      "stage" : "FETCH",
      "filter" : {
        "name.text" : {
          "$eq" : "Rüdi"
        }
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "name.de" : 1,
          "name.text" : 1
        },
        "indexName" : "name_de",
        "collation" : {
          "locale" : "de",
          "caseLevel" : false,
          "caseFirst" : "off",
          "strength" : 3,
          "numericOrdering" : false,
          "alternate" : "non-ignorable",
          "maxVariable" : "punct",
          "normalization" : false,
          "backwards" : false,
          "version" : "57.1"
        },
        "isMultiKey" : true,
        "multiKeyPaths" : {
          "name.de" : [
            "name"
          ],
          "name.text" : [
            "name"
          ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : true,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "name.de" : [
            "[true, true]"
          ],
          "name.text" : [
            "[MinKey, MaxKey]"
          ]
        }
      }
    }

It does IXSCAN followed by FETCH stage with filter. I've already created an question about the filter here.
The more interesting is what will happen when I just change the matching part of the query to:

> db.names.find({ 'name.de': { $exists: true }, 'name.text': 'Rüdi' }).collation({ locale: 'de' });

i.e. expression `'name.de': { $exists: true }` should be still subset of `partialFilterExpression`. As stated in documentation:

> To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.

But the explain plan looks like this:

    ...
    "winningPlan" : {
      "stage" : "FETCH",
      "filter" : {
        "$and" : [
          {
            "name.de" : {
              "$exists" : true
            }
          },
          {
            "name.text" : {
              "$eq" : "Rüdi"
            }
          }
        ]
      },
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "name.de" : 1,
          "name.text" : 1
        },
        "indexName" : "name_de",
        "collation" : {
          "locale" : "de",
          "caseLevel" : false,
          "caseFirst" : "off",
          "strength" : 3,
          "numericOrdering" : false,
          "alternate" : "non-ignorable",
          "maxVariable" : "punct",
          "normalization" : false,
          "backwards" : false,
          "version" : "57.1"
        },
        "isMultiKey" : true,
        "multiKeyPaths" : {
          "name.de" : [
            "name"
          ],
          "name.text" : [
            "name"
          ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : true,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
          "name.de" : [
            "[MinKey, MaxKey]"
          ],
          "name.text" : [
            "[MinKey, MaxKey]"
          ]
        }
      }
    }  
    ...

As you can see index is used, but the whole filtering is happening in FETCH stage.

Question is: *why the filtering is done in FETCH stage and what is so different between these 2 queries that MongoDB them differently?*

Additionaly, sort query with `$exists` as:

> db.names.find({ 'name.de': { $exists: true } }).sort({ 'name.text': 1 }).collation({ locale: "de" })

Behaves the same, whole filtering and sorting is done after IXSCAN stage:

    ...
    "winningPlan" : {
          "stage" : "SORT",
          "sortPattern" : {
            "name.text" : 1
          },
          "inputStage" : {
            "stage" : "SORT_KEY_GENERATOR",
            "inputStage" : {
              "stage" : "FETCH",
              "filter" : {
                "name.de" : {
                  "$exists" : true
                }
              },
              "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                  "name.de" : 1,
                  "name.text" : 1
                }
    ...

It even produces the incorrect results, while index is not used for sorting.



 Comments   
Comment by Mark Agarunov [ 09/Jun/17 ]

Hello michal bigos

Thank you for the response. The behavior you are seeing is likely due to the issues detailed in SERVER-23808 and SERVER-26580. Please watch those tickets for updates on this issue.

Thanks,
Mark

Comment by Michal Bigos [ 29/May/17 ]

In relation to, already closed issue, with the same meaning, but different model (see https://jira.mongodb.org/browse/SERVER-29346). I'm tending to use the model:

    > db.names.findOne();
    {
      "_id" : ObjectId("591dbcd877d4ede22d765247"),
      "name" : [
        {
          "locale" : "de",
          "text" : "Brunhilde"
        },
        {
          "locale" : "sk",
          "text" : "Šimon"
        }
      ]
    }

Here I can define index like this:

    > db.names.createIndex({ 'name.locale': 1, 'name.text': 1 }, { name: 'name_de', partialFilterExpression: { 'name.locale': 'de' }, collation: { locale: 'de' } });

The query defined as:

    > db.names.find({ name: { $elemMatch: { locale: 'de', text: 'Rüdi' } } }).collation({ locale: 'de' }).explain()

makes COLLSCAN, which is caused by missing 'name.locale': 'de' in filter, thus not matching the partialFilterExpression.
*But again why is the query planner not treating part of { $elemMatch: { locale: 'de', ... }} as equivalent to 'name.locale': 'de' ?*

Comment by Michal Bigos [ 29/May/17 ]

First of all thanks, yes you're right `$elemMatch` is what I want. I also undestand the `$exists` constraint as legit.
But still why is not `$exists` condition eliminated by query optimizer when the MongoDB qalready knows that the results will be served from index, by matching the `partialFilterExpression`. What I mean is that when I write queryl like:

    > db.names.explain().find({name: {$elemMatch: {de: true, text: 'Rüdi'}}, "name.de": {$exists: true}}).collation({locale: 'de'});

MongoDB already knows when searching for matching index that `"name.de": {$exists: true}` part of the filter matches `partialFilterExpression` and thus everything which comes from this matching index already fullfils the condition, thus condition can be eliminated and query served as:

    > db.names.explain().find({name: {$elemMatch: {de: true, text: 'Rüdi'}}}).collation({locale: 'de'});

i.e. from selected index and without filter for `$exists`.

I'm not sure how `$elemMatch` is served internally, but it seems to remain in FETCH stage filter. *What is the reason for that?*
As I understood there is currently no possibility to use `$elemMatch` in `partialFilterExpression`. I'm getting error, when trying to define index as:

    > db.names.createIndex({ 'name.de': 1, 'name.text': 1 }, {  name: 'name_de', partialFilterExpression: { 'name': { $elemMatch: { 'de': true } }}, collation: { locale: 'de' } });
    {
      "ok" : 0,
      "errmsg" : "unsupported expression in partial index: name $elemMatch (obj)\n    de == true\n",
      "code" : 67,
      "codeName" : "CannotCreateIndex"
    }

*Is there any way how to work around this?*

Comment by David Storch [ 24/May/17 ]

You may get the behavior you want if you use $elemMatch:

> db.names.explain().find({name: {$elemMatch: {de: true, text: 'Rüdi'}}, "name.de": {$exists: true}}).collation({locale: 'de'});
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.names",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$elemMatch" : {
							"$and" : [
								{
									"de" : {
										"$eq" : true
									}
								},
								{
									"text" : {
										"$eq" : "Rüdi"
									}
								}
							]
						}
					}
				},
				{
					"name.de" : {
						"$exists" : true
					}
				}
			]
		},
		"collation" : {
			"locale" : "de",
			"caseLevel" : false,
			"caseFirst" : "off",
			"strength" : 3,
			"numericOrdering" : false,
			"alternate" : "non-ignorable",
			"maxVariable" : "punct",
			"normalization" : false,
			"backwards" : false,
			"version" : "57.1"
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"name" : {
							"$elemMatch" : {
								"$and" : [
									{
										"de" : {
											"$eq" : true
										}
									},
									{
										"text" : {
											"$eq" : "Rüdi"
										}
									}
								]
							}
						}
					},
					{
						"name.de" : {
							"$exists" : true
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"name.de" : 1,
					"name.text" : 1
				},
				"indexName" : "name_de",
				"collation" : {
					"locale" : "de",
					"caseLevel" : false,
					"caseFirst" : "off",
					"strength" : 3,
					"numericOrdering" : false,
					"alternate" : "non-ignorable",
					"maxVariable" : "punct",
					"normalization" : false,
					"backwards" : false,
					"version" : "57.1"
				},
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"name.de" : [
						"name"
					],
					"name.text" : [
						"name"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"name.de" : [
						"[true, true]"
					],
					"name.text" : [
						"[\"KQ/9\u0001D�\u0006\u0001�\b\", \"KQ/9\u0001D�\u0006\u0001�\b\"]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"$and" : [
						{
							"name.de" : {
								"$exists" : true
							}
						},
						{
							"name" : {
								"$elemMatch" : {
									"$and" : [
										{
											"de" : {
												"$eq" : true
											}
										},
										{
											"text" : {
												"$eq" : "Rüdi"
											}
										}
									]
								}
							}
						}
					]
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"name.de" : 1,
						"name.text" : 1
					},
					"indexName" : "name_de",
					"collation" : {
						"locale" : "de",
						"caseLevel" : false,
						"caseFirst" : "off",
						"strength" : 3,
						"numericOrdering" : false,
						"alternate" : "non-ignorable",
						"maxVariable" : "punct",
						"normalization" : false,
						"backwards" : false,
						"version" : "57.1"
					},
					"isMultiKey" : true,
					"multiKeyPaths" : {
						"name.de" : [
							"name"
						],
						"name.text" : [
							"name"
						]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : true,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"name.de" : [
							"[MinKey, MaxKey]"
						],
						"name.text" : [
							"[MinKey, MaxKey]"
						]
					}
				}
			}
		]
	},
	"serverInfo" : {
		"host" : "dstorch",
		"port" : 27017,
		"version" : "3.4.4",
		"gitVersion" : "888390515874a9debd1b6c5d36559ca86b44babd"
	},
	"ok" : 1
}

Note that this changes the semantics of the query such that the same subdocument in the names array must simultaneously match both the conditions on name.de and name.text. Looking at your schema, it looks like this might be the intended meaning of the query?

Also note that the bounds associated with an {$exists: true} predicate will always be [MinKey, MaxKey]. Furthermore, we always must fetch the documents and apply the {$exists: true}. This is because literal null values and missing values both generate the same index keys, so we must fetch the document in order to distinguish between these two cases. (Literal nulls match the {$exists: true} but missing values do not.)

Comment by Michal Bigos [ 24/May/17 ]

Using the index without collation as:

db.names.createIndex(

{ 'name.de': 1, 'name.text': 1 }

, { name: 'name_de', partialFilterExpression: { 'name.de':

{ $exists: true }

} });

yields the same behaviour.

Generated at Thu Feb 08 04:20:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.