[SERVER-40089] Distinct scan in aggregation doesn't fetch document when group uses $$ROOT
Created: 12/Mar/19  Updated: 29/Oct/23  Resolved: 28/Mar/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.1.8
Fix Version/s: 4.1.10

Type: Bug
Priority: Critical - P2
Reporter: Asya Kamsky
Assignee: Justin Seyster
Resolution: Fixed
Votes: 0
Labels: None

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2019-03-25, Query 2019-04-08

 Description   

It looks like the DISTINCT_SCAN optimization added in SERVER-9507 doesn't request the full document when $group uses $$ROOT, even though it correctly requests individual fields.

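// Assumes an index on { valueDate: 1 } so the planner can pick DISTINCT_SCAN.
// correct: $first on a single field returns that field's value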
db.newTest.aggregate([
   {$sort:{valueDate:1}},
   {$group:{_id: "$valueDate", first: {$first: "$farmId"}}}
])
{ "_id" : ISODate("2019-01-01T00:00:00Z"), "first" : 3 }
{ "_id" : ISODate("2019-02-01T00:00:00Z"), "first" : 1 }
// incorrect: $first on $$ROOT returns only valueDate, not the whole document
db.newTest.aggregate([
   {$sort:{valueDate:1}},
   {$group:{_id: "$valueDate", first: {$first: "$$ROOT"}}}
])
{ "_id" : ISODate("2019-01-01T00:00:00Z"), "first" : { "valueDate" : ISODate("2019-01-01T00:00:00Z") } }
{ "_id" : ISODate("2019-02-01T00:00:00Z"), "first" : { "valueDate" : ISODate("2019-02-01T00:00:00Z") } }
 



 Comments   
Comment by Githook User [ 28/Mar/19 ]

Author: Justin Seyster (jseyster) <justin.seyster@mongodb.com>

Message: SERVER-40089 $group optimized with DISTINCT_SCAN cannot use $$ROOT

The getExecutorDistinct() function is responsible for both creating an
executor for the distinct command and creating an executor for a
$group that has been optimized with a DISTINCT_SCAN (see commit
da63195). These two scenarios have different requirements for their
projection, and getExecutorDistinct() distinguished the two by
assuming any caller with an empty ({}) projection wanted the distinct
command projection.
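A minimal JavaScript sketch of the ambiguity described above. It assumes the distinct command's projection has the shape `{ _id: 0, <field>: 1 }`, which is inferred from the output in the description. The names `distinctFieldProjection` and `chooseProjectionBeforeFix` are hypothetical; the real getExecutorDistinct() is C++ server code and is not reproduced here.

```javascript
// Hypothetical model of the pre-fix projection choice; not MongoDB source.

// Assumed shape of the projection the distinct command wants:
// only the distinct field, with _id excluded.
function distinctFieldProjection(field) {
  return { _id: 0, [field]: 1 };
}

// Pre-fix heuristic: an empty ({}) projection is read as
// "the caller is the distinct command".
function chooseProjectionBeforeFix(requested, distinctField) {
  const isEmpty = Object.keys(requested).length === 0;
  return isEmpty ? distinctFieldProjection(distinctField) : requested;
}

// The distinct command and a $group using $$ROOT both pass {},
// so the heuristic cannot tell them apart.
const fromDistinctCommand = chooseProjectionBeforeFix({}, "valueDate");
const fromGroupWithRoot = chooseProjectionBeforeFix({}, "valueDate");
console.log(fromDistinctCommand, fromGroupWithRoot);
```

Both calls return the same narrow projection, which is exactly the case the $$ROOT caller cannot afford.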

However, a $first accumulator with $$ROOT requires the entire
document, so the logic that builds an optimized $group executor
generates an empty projection for this case as well. When that
happens, getExecutorDistinct() mistakenly chooses the projection that
the distinct command wants, and when the pipeline evaluates $$ROOT, it
only gets to see a small subset of fields in the document.
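The symptom can be reproduced with a small in-memory simulation. When the narrow projection is applied before $first evaluates $$ROOT, each group's `first` document keeps only the projected field, which matches the output in the description. The helpers `applyProjection` and `groupFirstRoot` are hypothetical. They handle only simple inclusion projections and assume the group key is a primitive value; the sample documents are made up, and strings stand in for the ISODate values.

```javascript
// Hypothetical simulation of the SERVER-40089 symptom; not MongoDB source.

// Apply a simple inclusion projection such as { _id: 0, valueDate: 1 }.
// An empty projection {} keeps the whole document.
function applyProjection(doc, projection) {
  const keys = Object.keys(projection);
  if (keys.length === 0) return { ...doc };
  const out = {};
  // Inclusion projections keep _id unless it is explicitly excluded.
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const k of keys) {
    if (k !== "_id" && projection[k] === 1 && k in doc) out[k] = doc[k];
  }
  return out;
}

// Emulate {$sort: {<field>: 1}} followed by
// {$group: {_id: "$<field>", first: {$first: "$$ROOT"}}},
// where each document is projected before $$ROOT is evaluated.
function groupFirstRoot(docs, field, projection) {
  const sorted = [...docs].sort((a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0
  );
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc[field])) {
      groups.set(doc[field], applyProjection(doc, projection));
    }
  }
  return [...groups].map(([id, first]) => ({ _id: id, first }));
}

// Made-up sample documents; string dates stand in for ISODate values.
const docs = [
  { _id: 1, valueDate: "2019-02-01", farmId: 1 },
  { _id: 2, valueDate: "2019-01-01", farmId: 3 },
];

// Buggy: the distinct command's projection leaks into the $group executor.
console.log(groupFirstRoot(docs, "valueDate", { _id: 0, valueDate: 1 }));
// Expected: $$ROOT sees the full document.
console.log(groupFirstRoot(docs, "valueDate", {}));
```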

This patch modifies getExecutorDistinct() so that the caller must
explicitly state what projection it wants. That means that the
distinct command no longer passes an empty projection to indicate that
it wants to project on just the distinct field. Instead, the distinct
command computes the projection for the distinct field on its own and
includes that projection in the ParsedDistinct object that it passes
to getExecutorDistinct().
Branch: master
https://github.com/mongodb/mongo/commit/e73da48e26048cb5ca2120acadac2d9c2c8ee403
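A minimal sketch of the post-fix contract described in the commit message, in which every caller states its projection explicitly and the executor no longer guesses. Modeling ParsedDistinct as a plain `{ key, projection }` object, the helper names, and the distinct projection shape `{ _id: 0, <field>: 1 }` are all assumptions for illustration, not the server's actual C++ types.

```javascript
// Hypothetical model of the post-fix contract; not MongoDB source.

// Stand-in for ParsedDistinct: the distinct key plus an explicit projection.
function makeParsedDistinct(key, projection) {
  return { key, projection };
}

// The distinct command now computes its own projection for the
// distinct field (shape assumed here).
function parsedForDistinctCommand(key) {
  return makeParsedDistinct(key, { _id: 0, [key]: 1 });
}

// A $group whose $first accumulator uses $$ROOT asks for the whole document.
function parsedForGroupUsingRoot(key) {
  return makeParsedDistinct(key, {});
}

// After the fix there is no special case for {}:
// the caller's projection is used as-is.
function chooseProjectionAfterFix(parsed) {
  return parsed.projection;
}

console.log(chooseProjectionAfterFix(parsedForDistinctCommand("valueDate")));
console.log(chooseProjectionAfterFix(parsedForGroupUsingRoot("valueDate")));
```

Moving the decision to the caller removes the ambiguity: an empty projection now means only "fetch the whole document".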

Generated at Thu Feb 08 04:53:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.