[SERVER-82836] UNPACK_TS_BUCKET stage includes fields it doesn't need
Created: 06/Nov/23  Updated: 13/Nov/23  Resolved: 13/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Parker Felix
Assignee: Charlie Swanson
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13703 Presence of extraneous $project cause... Backlog
Assigned Teams:
Query Integration
Participants:

 Description   

While experimenting with the tsbs dataset, I believe I encountered a bug in the UNPACK_TS_BUCKET stage. If a $project is followed by a $count, the UNPACK_TS_BUCKET stage includes every field in the projection, even though these fields aren't needed to produce the final result.

Query 1:

{
    "$match" : {
        "time" : {
            "$lt" : ISODate("2016-01-18T01:49:23Z")
        }
    }
},
{
    "$project" : {
        "_id" : 1,
        "tags" : 1,
        "usage_softirq" : 1,
        "usage_steal" : 1,
        "usage_guest" : 1
    }
},
{
    "$count" : "count"
} 

Query 2:

{
    "$match" : {
        "time" : {
            "$lt" : ISODate("2016-01-18T01:49:23Z")
        }
    }
},
{
    "$count" : "count"
} 
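
For reference, the two pipelines above can be wrapped into a small mongosh helper that requests an explain for each. This is only a sketch: the ticket does not name the collection, so `coll` is a hypothetical handle supplied by the caller, and `new Date(...)` is used in place of `ISODate(...)` so the snippet also parses outside mongosh.

```javascript
// Cutoff taken from the $match stage in both queries.
const cutoff = new Date("2016-01-18T01:49:23Z");

// Query 1: $match, then $project, then $count.
const query1 = [
  { $match: { time: { $lt: cutoff } } },
  { $project: { _id: 1, tags: 1, usage_softirq: 1, usage_steal: 1, usage_guest: 1 } },
  { $count: "count" },
];

// Query 2: the same pipeline without the $project.
const query2 = [
  { $match: { time: { $lt: cutoff } } },
  { $count: "count" },
];

// `coll` is an assumption: any mongosh collection handle for the
// time-series collection under test (its name is not given in the ticket).
function explainBoth(coll) {
  return {
    query1: coll.explain("executionStats").aggregate(query1),
    query2: coll.explain("executionStats").aggregate(query2),
  };
}
```

In mongosh this would be called as `explainBoth(db.getCollection("<collection>"))`, with the collection name filled in by whoever reproduces the issue.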

Partial explain diff:

Query 1:

"inputStage" : {
    "stage" : "UNPACK_TS_BUCKET",
    "planNodeId" : 2,
    "include" : [
        "_id",
        "time",
        "usage_guest",
        "usage_softirq",
        "usage_steal",
        "tags"
    ],
    "computedMetaProjFields" : [ ],
    "includeMeta" : true, 

Query 2:

"inputStage" : {
    "stage" : "UNPACK_TS_BUCKET",
    "planNodeId" : 2,
    "include" : [
        "time"
    ],
    "computedMetaProjFields" : [ ],
    "includeMeta" : false, 

Query 1 executes in ~5.5s and Query 2 in ~2s, both with featureFlagTsInSbeFull.
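
The difference between the two explain fragments can also be extracted programmatically. The sketch below assumes nothing about the full explain layout beyond what is shown above: it walks any nested object looking for a node whose `stage` is `UNPACK_TS_BUCKET` and compares the `include` lists. The helper names `findUnpackStage` and `extraIncludes` are made up for illustration.

```javascript
// Recursively search an explain document for the UNPACK_TS_BUCKET node.
function findUnpackStage(node) {
  if (node === null || typeof node !== "object") return null;
  if (node.stage === "UNPACK_TS_BUCKET") return node;
  for (const value of Object.values(node)) {
    const found = findUnpackStage(value);
    if (found) return found;
  }
  return null;
}

// Fields listed in a.include that are absent from b.include.
function extraIncludes(a, b) {
  const base = new Set(b.include);
  return a.include.filter((field) => !base.has(field));
}

// Fragments copied from the partial explain output in this ticket.
const explain1 = {
  inputStage: {
    stage: "UNPACK_TS_BUCKET",
    include: ["_id", "time", "usage_guest", "usage_softirq", "usage_steal", "tags"],
    includeMeta: true,
  },
};
const explain2 = {
  inputStage: { stage: "UNPACK_TS_BUCKET", include: ["time"], includeMeta: false },
};

// Lists the fields only Query 1 unpacks:
// _id, usage_guest, usage_softirq, usage_steal, tags
console.log(extraIncludes(findUnpackStage(explain1), findUnpackStage(explain2)));
```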



 Comments   
Comment by Charlie Swanson [ 13/Nov/23 ]

I think this is a duplicate of SERVER-13703. You can see in this pipeline that we have a projection right before a $group (the $count here is equivalent to a $group followed by a $project), and when we track the dependencies of the pipeline, we work left to right (front to back?) to see which fields are needed. When we hit the first stage that has an 'exhaustive' list of fields needed ($project in this example), we stop looking and presume it is accurate.
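
The behaviour described in this comment can be illustrated with a toy model. This is not the server's implementation: the `{ name, fields, exhaustive }` stage records and the `requiredFields` function are hypothetical, and they only mimic the rule of walking front to back and stopping at the first stage that reports an exhaustive field list.

```javascript
// fields:     input fields the stage reads
// exhaustive: true if the stage claims its list covers everything that it
//             and all later stages could need
function requiredFields(stages) {
  const needed = new Set();
  for (const stage of stages) {
    stage.fields.forEach((field) => needed.add(field));
    if (stage.exhaustive) break; // stop at the first exhaustive stage
  }
  return [...needed];
}

const match = { name: "$match", fields: ["time"], exhaustive: false };
const project = {
  name: "$project",
  fields: ["_id", "tags", "usage_softirq", "usage_steal", "usage_guest"],
  exhaustive: true,
};
// Modelled as a $group that reads no input fields.
const count = { name: "$count", fields: [], exhaustive: true };

// Query 1: the walk stops at $project, so all projected fields are kept.
console.log(requiredFields([match, project, count]));
// Query 2: the walk reaches $count, so only "time" is needed.
console.log(requiredFields([match, count]));
```

Under this model Query 1 yields the same six fields that appear in its UNPACK_TS_BUCKET `include` list, while Query 2 yields only `time`, which matches the explain output in the description.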

Generated at Thu Feb 08 06:50:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.