[SERVER-59505] Time-series query on mixed, nested measurements can miss some events Created: 23/Aug/21  Updated: 29/Oct/23  Resolved: 02/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc3

Type: Bug Priority: Blocker - P1
Reporter: David Percy Assignee: Sam Mercier
Resolution: Fixed Votes: 0
Labels: query-director-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-60672 Simpler pushdown when timeseries coll... Closed
Problem/Incident
Related
related to SERVER-60017 Don't pushdown predicates on dotted p... Closed
related to SERVER-59305 Reject timeseries measurements with a... Closed
related to SERVER-69408 Add randomized testing for the mixed-... Closed
is related to SERVER-60445 $_internalBucketGeoWithin on mixed ty... Closed
is related to SERVER-59163 Enable partial indexes on time-series... Closed
is related to SERVER-59740 Add more end-to-end tests for time-se... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Sprint: QO 2021-09-06, QO 2021-10-04, QO 2021-10-18, QO 2021-11-01, QO 2021-11-15
Participants:
Linked BF Score: 120

 Description   

Normally when a measurement field 'x' contains an array, the 'control.min.x' field on the bucket contains the min for each position in the array.

> db.events.find()
{ x: [0, 2] }
{ x: [3, 1] }
 
> db.system.buckets.events.find()
{
  control: {
    min: {x: [0, 1]},
    max: {x: [3, 2]},
  },
  ...
}
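The element-wise rule can be modelled in a few lines of plain JavaScript. This is a simplified sketch. It assumes every measurement's 'x' is an array of numbers of the same length. `elementwiseMinMax` is a hypothetical helper, not a server function.

```javascript
// Hypothetical helper: element-wise min/max over array-valued measurements.
// Simplified model: every value is a numeric array of the same length.
function elementwiseMinMax(values) {
  const min = values[0].slice();
  const max = values[0].slice();
  for (const arr of values.slice(1)) {
    arr.forEach((v, i) => {
      min[i] = Math.min(min[i], v); // smallest value seen at position i
      max[i] = Math.max(max[i], v); // largest value seen at position i
    });
  }
  return { min, max };
}

const control = elementwiseMinMax([[0, 2], [3, 1]]);
console.log(JSON.stringify(control)); // {"min":[0,1],"max":[3,2]}
```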

But if some measurements have a non-array value for 'x', then at least one of 'control.min.x' and 'control.max.x' will be a non-array:

> db.events.find()
{ x: [0, 2] }
{ x: [3, 1] }
{ x: 10 }
 
> db.system.buckets.events.find()
{
  control: {
    min: {x: 10},
    max: {x: [3, 2]},
  },
  ...
}
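The next sketch shows how a single non-array value changes the control fields. It assumes a simplified model with only numbers and numeric arrays of equal length:

- When both values are arrays, they are combined element-wise.
- Otherwise, the value that sorts first (for min) or last (for max) wins.
- Every number sorts before every array, as in BSON's canonical order.

`compare`, `combine`, and `controlFor` are hypothetical helpers, not the server's bucketing code.

```javascript
// Simplified model: only numbers and numeric arrays. In BSON's canonical
// sort order every number sorts before every array.
function compare(a, b) {
  const rank = (v) => (Array.isArray(v) ? 1 : 0);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a - b; // both are numbers here; array pairs are handled by combine()
}

// Arrays merge element-wise; otherwise keep whichever value sorts first
// (wantMin) or last (!wantMin).
function combine(a, b, wantMin) {
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.map((v, i) => combine(v, b[i], wantMin));
  }
  const c = compare(a, b);
  return (wantMin ? c <= 0 : c >= 0) ? a : b;
}

function controlFor(values) {
  return {
    min: values.reduce((acc, v) => combine(acc, v, true)),
    max: values.reduce((acc, v) => combine(acc, v, false)),
  };
}

console.log(JSON.stringify(controlFor([[0, 2], [3, 1], 10])));
// {"min":10,"max":[3,2]}
```

Adding the scalar 10 replaces the element-wise min [0, 1] with 10, because the number sorts before any array.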

The predicate pushdowns don't account for this, so "multikey" queries can incorrectly exclude this bucket:

> db.events.find({ x: {$lt: 5} })
(no results)
 
// because internally the bucket-level filter becomes:
> db.system.buckets.events.find({$expr: {$lt: ["$control.min.x", 5]}})
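The mismatch can be reproduced without a server by evaluating two predicates side by side: the event-level predicate and the naive bucket-level rewrite. This sketch handles numbers and numeric arrays only. `matchesLt` and `naiveKeepsBucket` are hypothetical names.

```javascript
// Bucket control values and events copied from the example above.
const bucket = { control: { min: { x: 10 }, max: { x: [3, 2] } } };
const events = [{ x: [0, 2] }, { x: [3, 1] }, { x: 10 }];

// Event-level semantics of {x: {$lt: bound}}: a scalar must be < bound,
// an array matches if any element is < bound.
const matchesLt = (x, bound) =>
  Array.isArray(x) ? x.some((e) => e < bound) : x < bound;

// Naive bucket-level rewrite: keep the bucket only if control.min.x < bound.
// In BSON order an array never sorts below a number, so only a numeric min can pass.
const naiveKeepsBucket = (b, bound) =>
  typeof b.control.min.x === "number" && b.control.min.x < bound;

console.log(events.filter((e) => matchesLt(e.x, 5)).length); // 2
console.log(naiveKeepsBucket(bucket, 5));                      // false
```

Two events match, yet the whole bucket is discarded, so the query returns nothing.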

A similar thing can happen if 'x' is a mixture of objects and non-objects:

> db.events.find()
{ "time" : ..., "x" : ISODate("1970-01-01T00:00:00Z"), "_id" : ... }
{ "time" : ..., "x" : { "y" : ISODate("2021-01-02T11:59:59Z") }, "_id" : ... }
 
> db.events.find({ 'x.y': {$gt: ISODate('2000-01-01')} })
(no results)

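This happens because although 'control.max.x' is the max of 'x', 'control.max.x.y' is not the max of 'x.y'. In BSON sort order dates sort after embedded objects, so 'control.max.x' is the 1970 ISODate. 'control.max.x.y' is therefore 'missing', and 'missing' < ISODate, so the bucket is excluded.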
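The date/object case can be modelled the same way in plain JavaScript. The sketch assumes only two types matter: plain objects and `Date`. They are ranked as in BSON's canonical sort order, with objects before dates. The variable names are illustrative and do not correspond to server code.

```javascript
// Simplified subset of BSON canonical type order: plain objects sort before dates.
const rank = (v) => (v instanceof Date ? 2 : 1); // 1 = plain object, 2 = Date

const values = [
  new Date("1970-01-01T00:00:00Z"),
  { y: new Date("2021-01-02T11:59:59Z") },
];

// control.max.x: the value with the highest type rank (types differ, so no ties here).
const maxX = values.reduce((a, b) => (rank(b) > rank(a) ? b : a));

// Dotted-path lookup 'control.max.x.y': a Date has no field 'y', so it is missing.
const maxXY = maxX instanceof Date ? undefined : maxX.y;

const bound = new Date("2000-01-01");
// Naive rewrite: keep the bucket only if control.max.x.y > bound;
// a missing value never satisfies $gt against a date.
const keepsBucket = maxXY !== undefined && maxXY > bound;
// Event-level truth: the second event has x.y = 2021-01-02, which is > bound.
const eventMatches = values.some((v) => !(v instanceof Date) && v.y > bound);

console.log(keepsBucket, eventMatches); // false true
```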



 Comments   
Comment by Githook User [ 02/Nov/21 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-59505 Fix TS pushdown predicate to capture variable type measurements
Branch: master
https://github.com/mongodb/mongo/commit/d47a25210252140172b9f8aa99f78662d7c1fcaf

Comment by Githook User [ 02/Nov/21 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-59505 Fix TS pushdown predicate to capture variable type measurments
Branch: server59505-3
https://github.com/mongodb/mongo/commit/5a5343ea0fa80834dc7b27f00bda011c0a1d648c

Comment by Githook User [ 02/Nov/21 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-59505 Fix TS pushdown predicate to capture variable type measurements
Branch: v5.0
https://github.com/mongodb/mongo/commit/e35f12a0cfe69e9ec85f6d2cad8c1be2f2898e2a

Comment by Githook User [ 29/Oct/21 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-59505 Fix tenant migration test suite
Branch: v5.1
https://github.com/mongodb/mongo/commit/f67e98440c7785331f3d4007d4a4864ab611613b

Comment by Githook User [ 27/Oct/21 ]

Author:

{'name': 'samontea', 'email': 'merciers.merciers@gmail.com', 'username': 'samontea'}

Message: SERVER-59505 Fix TS pushdown predicate to capture variable type measurements
Branch: v5.1
https://github.com/mongodb/mongo/commit/c99bd28a555001cfe61cf0164d10b4ec187962f7

Comment by David Percy [ 23/Aug/21 ]

Ideally the bucket format would include more information in the 'control' fields, but given the current format we need to fix the query rewrites.

Instead of converting 'x < 10' to 'control.min.x < 10', we can generate something like 'control.min.x < 10 or (control.min.x < any array value < control.max.x)'. But we also need to consider:

  • How well this more complex predicate can be indexed.
  • How this interacts with partial indexes, which currently don't support $or.
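The proposed widening of 'x < 10' can be sketched as a bucket filter. This follows one reading of the proposal:

- Keep the bucket if 'control.min.x' is below the bound.
- Also keep it if the range from 'control.min.x' to 'control.max.x' spans the array type, since such a bucket may hold arrays with qualifying elements.

It is not the committed fix. The model covers numbers and arrays only, and all names are hypothetical.

```javascript
// Hypothetical ranks for a simplified BSON order: numbers sort before arrays.
const NUMBER = 0, ARRAY = 1;
const rank = (v) => (Array.isArray(v) ? ARRAY : NUMBER);

// Naive rewrite of x < bound: keep the bucket if control.min.x < bound.
// An array never sorts below a number, so only a numeric min can pass.
const minBelow = (min, bound) => rank(min) === NUMBER && min < bound;

// Keep the bucket whenever [control.min.x, control.max.x] spans the array type.
// rank(min) <= ARRAY always holds in this two-type model, but would not once
// types that sort after arrays (e.g. dates) are included.
const mayHoldArrays = (min, max) => rank(min) <= ARRAY && rank(max) >= ARRAY;

function keepBucketForLt(control, bound) {
  return minBelow(control.min.x, bound) ||
    mayHoldArrays(control.min.x, control.max.x);
}

const mixed = { min: { x: 10 }, max: { x: [3, 2] } };
const scalarsOnly = { min: { x: 7 }, max: { x: 9 } };
console.log(keepBucketForLt(mixed, 5));       // true
console.log(keepBucketForLt(scalarsOnly, 5)); // false
```

The check is deliberately conservative. Every bucket whose range spans the array type is kept, which restores correctness at the cost of selectivity. That trade-off is what the indexing question above is about.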
Generated at Thu Feb 08 05:47:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.