[SERVER-60010] Count queries matching all of null, missing, and [] should be fully covered by a multikey index Created: 16/Sep/21  Updated: 29/Oct/23  Resolved: 12/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0

Type: Improvement Priority: Major - P3
Reporter: Alya Berciu Assignee: Hana Pearlman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Sprint: QO 2021-11-01, QO 2021-11-15
Participants:

 Description   

Based on discussion in SERVER-18861, it may be possible to avoid adding a fetch + filter stage for a count when the query predicate on an indexed field matches all of these values: null, missing, undefined, and [], since a multikey index can fully cover that case.

A multikey index cannot differentiate between these values, so we would still need a FETCH stage if we are returning the value of the indexed field (but not if we're just counting how many documents actually match this predicate).

In the case of a compound index, the query could still be covered as long as we are not returning the value indexed by the multikey index. Some examples of queries we may be able to optimize:

1. If we have a multikey index on {a: 1}, the following queries can be fully covered by the index:

db.coll.count({a: {$in: [null, []]}})
db.coll.aggregate([
   {$match: {$expr: {$or: [{$eq: ["$a", null]}, {$eq: ["$a", []]}]}}},
   {$count: "c"}
])

 

2. If we have a compound index {a: 1, b: 1} such that the index is multikey along the “a” path, we should be able to cover the above queries and the following:

db.coll.count({a: {$in: [null, []]}, b: 1})
db.coll.aggregate([
   {$match: {$expr: {$or: [{$eq: ["$a", null]}, {$eq: ["$a",[]]}]}}},
   {$project: {b: 1}},
])

 



 Comments   
Comment by Hana Pearlman [ 18/Nov/22 ]

601290552@qq.com This is expected behavior. You can see the same thing for $eq: 

> db.foo.find()
{ "_id" : 0, "a" : null }
{ "_id" : 1 }
> db.foo.aggregate({$match: {a: {$eq: null}}})
{ "_id" : 0, "a" : null }
{ "_id" : 1 }
> db.foo.aggregate({$match: {$expr: {$eq: ["$a", null]}}})
{ "_id" : 0, "a" : null }

Notice that the first aggregate matches the document where "a" is missing, but the second aggregate does not. This is because the "$eq" operator under "$expr" has different behavior. It will not match documents where "a" is missing.

Comment by jing xu [ 18/Nov/22 ]

hello:
this two sql is not the same. aggregate missing one record that not exists filed a.
db.coll.count({a: {$in: [null, []]}})

db.coll.aggregate([
{$match: {$expr: {$or: [{$eq: ["$a", null]}, {$eq: ["$a", []]}]}}},
{$count: "c"}
])

POCDB> db.coll.count({a: {$in: [null, []]}})
3
POCDB> db.coll.aggregate([
... {$match: {$expr: {$or: [{$eq: ["$a", null]}, {$eq: ["$a", []]}]}}},
... {$count: "c"}
... ])
[

{ c: 2 }

]
POCDB> db.coll.find({a: {$in: [null, []]}})
[

{ _id: ObjectId("6377178fc252fad22331c7fe"), a: [] }

,

{ _id: ObjectId("6377178fc252fad22331c7ff"), a: null }

,

{ _id: ObjectId("6377178fc252fad22331c800") }

]
POCDB> db.coll.aggregate([{$match: {$expr: {$or: [{$eq: ["$a", null]}, {$eq: ["$a", []]}]}}}])
[

{ _id: ObjectId("6377178fc252fad22331c7fe"), a: [] }

,

{ _id: ObjectId("6377178fc252fad22331c7ff"), a: null }

]
POCDB>

Comment by Githook User [ 12/Nov/21 ]

Author:

{'name': 'Hana Pearlman', 'email': 'hana.pearlman@mongodb.com', 'username': 'HanaPearlman'}

Message: SERVER-60010: Cover null and empty array queries with multikey index where possible
Branch: master
https://github.com/mongodb/mongo/commit/0f0831e128283ccb09770613c8ef0cf430599f91

Generated at Thu Feb 08 05:48:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.