Loading...

XML

Word

Printable

JSON

Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.16
Component/s: Querying
Labels:
- qexec-team

Assigned Teams:

Query Execution
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Imagine a collection where the field "a" can store either an integer or an array of integers:

> db.c.find();
{_id: 1, a: 3}
{_id: 2, a: [4, 11]}
{_id: 3, a: [-1, 10]}

Now, suppose that a user wishes to find documents where either "a" is a number in the interval [0, 10], or "a" is an array containing at least one number in the interval [0, 10]. This query should match documents _id:1 and _id:2, but not _id:3. A naive way to write this query would be like so:

> db.c.find({a: {$gte: 0, $lte: 10}});
{_id: 1, a: 3}
{_id: 2, a: [4, 11]}
{_id: 3, a: [-1, 10]}

As you can see, the semantics of this query are to match all three documents, which was not our intention. This is because there is no requirement that an individual array element satisfy both the $gte and the $lte predicate simultaneously. In order to express this simultaneity for array elements, the match language offers $elemMatch:

> db.c.find({a: {$elemMatch: {$gte: 0, $lte: 10}}});
{_id: 2, a: [4, 11]}

Unfortunately, as you can see above, this query is not quite right either. _id:3 no longer matches, which is good, but _id:1 also no longer matches. Recall that the query which we want to express should match _id:1 and _id:2. However, the $elemMatch predicate filters out documents where "a" is not an array. This scenario motivates a feature request. We should have an operator in the match language which is akin to $elemMatch, but also matches scalars:

> db.c.find({a: {$inventedNewKindOfElemMatch: {$gte: 0, $lte: 10}}});
{_id: 1, a: 3}
{_id: 2, a: [4, 11]}

This hypothetical new operator would also allow the query planner to use tight index bounds over multikey indexes, which should avoid any slow plans with loose index bounds such as those discussed in the original issue report below.

Original description

We have a collection of 28 million records that have a IP field that is sometimes a NumberInt (for IPv4), sometimes BinData (for IPv6), and sometimes an array of two elements of either type (we sometimes need to track two addresses).

Historically this was a regular index – IP was just a scalar – and way back in those days we had scripts that would run range queries to find people associated with a CIDR block or an ASN routing block. Before we went multi-key these range queries were super fast however today I found that now a range query is so slow I'm not sure it's actually doing anything:

db.accounts.count({IP:{$gte:NumberInt(-656117760),$lte:NumberInt(-656116737)}})

I waited for over 5 minutes before aborting it.

If I convert this into a dumb loop of 1023 scalar queries it's super fast:

var count = 0; for(var i = -656117760; i <= -656116737; ++i) { count += db.accounts.count(

{IP:NumberInt(i)}

); } print(count);

This took about 2 seconds, if that.

I double checked with query-explain and it looks like it's still doing an IXSCAN so I can't understand why the smart query should take so long (see attached explain.txt). The only difference seems to be that when it's scanning a multi-key index the index bounds are infinite on one side.

I did a quick skim of the 4.2 release notes and didn't see any mention of this being improved - sorry if I missed it.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

diagnostic.data.zip
17 kB
May 05 2020 03:17:04 PM UTC
explain.txt
5 kB
May 04 2020 03:14:17 PM UTC
mongod.zip
2 kB
May 05 2020 03:17:04 PM UTC

is related to

SERVER-6050 Consider allowing $elemMatch applied to non arrays

Closed

Assignee:: [DO NOT USE] Backlog - Query Execution
Reporter:: Glen Miner
Participants:: [DO NOT USE] Backlog - Query Execution, Carl Champain, David Storch, Glen Miner
Votes:: 1 Vote for this issue
Watchers:: 12 Start watching this issue

Created:: May 04 2020 03:25:37 PM UTC
Updated:: Apr 07 2023 03:45:50 PM UTC

Details

Description

Original description

Attachments

Attachments

Issue Links

Forms

Activity

People

Dates