[SERVER-67072] Incorrect optimization of $match with $exists over the metaField in timeseries discards valid results Created: 07/Jun/22  Updated: 29/Oct/23  Resolved: 13/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.3.1, 6.0.0-rc8
Fix Version/s: 6.0.0-rc10, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Milena Ivanova Assignee: David Percy
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Steps To Reproduce:

db.createCollection("yyy", { timeseries: {timeField: "time", metaField: "tag"}});
 
db.yyy.insertMany([
{_id: 20, time: new Date("2019-07-29T07:46:38.746Z"), tag: {scientist: 2, assistant: 2, office: 0, }, measurement0: 585.0363180488305},
 
{_id: 449, time: new Date("2019-11-29T12:20:34.821Z"), tag: {scientist: 2, assistant: 1, office: 3, }, measurement0: 708.2660627314729},
 
{_id: 454, time: new Date("2019-03-09T07:29:34.201Z"), tag: {scientist: 2, assistant: 0, office: 3, }, measurement0: 220.7121256827354,measurement1: 762.5075842068677}
]);
 
db.yyy.aggregate([ {$match: {$or: [{"tag.office": {$exists: true}}, {"time": {$eq: new Date("2019-02-08T06:41:54.182Z")}}]}}, {$sort: {_id: 1}}]); 

Sprint: QO 2022-06-13, QO 2022-06-27
Participants:
Linked BF Score: 160

 Description   

When the following $match expression is pushed down before $_internalUnpackBucket, the metaField of the time-series collection is not recognized correctly.

{$match: {$or: [{"tag.office": {$exists: true}}, {"time": {$eq: new Date("2019-02-08T06:41:54.182Z")}}]}}

The $exists predicate is expanded into 

 

'$and': [ { 'control.max.tag.office': { '$exists': true } },
          { 'control.min.tag.office': { '$exists': true } } ]

 

which results in discarding valid documents from the result set.
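A minimal sketch of why this expansion filters everything out, assuming (this is not stated in the ticket) that a bucket stores the metaField value under a top-level meta field and that control.min / control.max only summarize measurement fields. The bucket shape below is simplified and pathExists is a hypothetical helper, not a server function:

```javascript
// Simplified, assumed shape of one bucket document; real buckets carry
// more fields (for example the packed measurement data).
const bucket = {
  meta: { scientist: 2, assistant: 2, office: 0 },
  control: {
    min: { _id: 20, time: new Date("2019-07-29T07:46:00.000Z"), measurement0: 585.0363180488305 },
    max: { _id: 20, time: new Date("2019-07-29T07:46:38.746Z"), measurement0: 585.0363180488305 },
  },
};

// Hypothetical helper: true when every component of a dotted path resolves
// to a present field (plain nested objects only, no array traversal).
function pathExists(doc, path) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return false;
    }
    current = current[key];
  }
  return true;
}

// The expansion reported above looks for the metadata under control.min/max.
const wrongExpansion =
  pathExists(bucket, "control.max.tag.office") &&
  pathExists(bucket, "control.min.tag.office");

// Under the assumed layout, the metadata is reachable through meta instead.
const metaLookup = pathExists(bucket, "meta.office");

console.log({ wrongExpansion, metaLookup });
```

Under these assumptions the $exists branch is false for every bucket, and since the time branch of the $or matches none of the inserted documents, the pushed-down filter rejects all buckets.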

The correct expansion of the predicate over the metaField "tag" is:

{ 'meta.office': { '$exists': true } }
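The intended path mapping can be sketched as follows. This is only an illustration, not the server's implementation: toBucketMetaPath and rewriteMetaPredicates are hypothetical names, and predicates on non-meta fields (such as time) are simply left untouched here, whereas a real optimizer would translate them separately.

```javascript
// Hypothetical helper: map a user-level path to the bucket-level path when it
// refers to the metaField (or a subfield of it); return null otherwise.
function toBucketMetaPath(path, metaField) {
  if (path === metaField) return "meta";
  if (path.startsWith(metaField + ".")) return "meta" + path.slice(metaField.length);
  return null;
}

// Hypothetical helper: rewrite metaField predicates in a $match filter,
// recursing into $and / $or / $nor so nested predicates are handled the same
// way as top-level ones. Other predicates are copied unchanged.
function rewriteMetaPredicates(filter, metaField) {
  const out = {};
  for (const [key, value] of Object.entries(filter)) {
    if (key === "$and" || key === "$or" || key === "$nor") {
      out[key] = value.map((branch) => rewriteMetaPredicates(branch, metaField));
    } else {
      const metaPath = toBucketMetaPath(key, metaField);
      out[metaPath ?? key] = value;
    }
  }
  return out;
}

// Usage with the filter from this ticket and metaField "tag".
const rewritten = rewriteMetaPredicates(
  {
    $or: [
      { "tag.office": { $exists: true } },
      { time: { $eq: new Date("2019-02-08T06:41:54.182Z") } },
    ],
  },
  "tag"
);
console.log(Object.keys(rewritten.$or[0])[0]); // meta.office
```

The key point of the sketch is that the $exists predicate keeps its operator and only its path changes, whether or not it sits inside an $or.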

The problem doesn't appear for other predicates, such as $eq, or if the $exists predicate is the only one in $match, i.e., the following aggregations produce correct results:

 

{$match: {$or: [{"tag.office": {$eq: 3}}, {"time": {$eq: new Date("2019-02-08T06:41:54.182Z")}}]}}
{$match: {"tag.office": {$exists: true}}}

 

 



 Comments   
Comment by Githook User [ 14/Jun/22 ]

Author:

{'name': 'David Percy', 'email': 'david.percy@mongodb.com', 'username': 'dpercy'}

Message: SERVER-67072 Fix pushdown of time-series metadata predicates in $or

(cherry picked from commit b8f0fb561f5f6401dfc9a773777963f2f4bcb725)
Branch: v6.0
https://github.com/mongodb/mongo/commit/b56ef4dbd8adc83a72e5e5c0f61b377ee954cea1

Comment by Githook User [ 13/Jun/22 ]

Author:

{'name': 'David Percy', 'email': 'david.percy@mongodb.com', 'username': 'dpercy'}

Message: SERVER-67072 Fix pushdown of time-series metadata predicates in $or
Branch: master
https://github.com/mongodb/mongo/commit/b8f0fb561f5f6401dfc9a773777963f2f4bcb725

Comment by Milena Ivanova [ 08/Jun/22 ]

steve.la@mongodb.com, james.wahlin@mongodb.com The optimization of $exists that causes this issue was introduced by

commit 610898a1ccec0afc3d38f7c29f2553d5b6102d30

SERVER-59163 Allow creating partial indexes on time-series collection

 

As far as I can see, the code is present in v5.3, but not in earlier 5.x versions.

I am working on a solution, but if this is a release blocker under tight time pressure, it might be better for someone from the timeseries team to take over, since I am working reduced hours.

Generated at Thu Feb 08 06:07:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.