[SERVER-40090] DISTINCT_SCAN in agg is only used when certain format of _id is specified
Created: 12/Mar/19  Updated: 29/Oct/23  Resolved: 25/Jun/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.7.0, 4.2.12, 4.4.4

Type: Improvement
Priority: Major - P3
Reporter: Asya Kamsky
Assignee: Katherine Wu (Inactive)
Resolution: Fixed
Votes: 0
Labels: asya, query-44-grooming, snp, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-53626 Minimize index scanning when retrievi... Backlog
is related to SERVER-9507 Optimize $sort+$group+$first pipeline... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4, v4.2
Sprint: Query 2020-06-29

 Description   

The implementation in SERVER-9507 only works when a specific format for _id in $group is used:

// works
db.newTest.explain(1).aggregate([
   {$sort:{valueDate:1}},
   {$group:{_id: "$valueDate", first: {$first: "$farmId"}}}
]).stages[0]["$cursor"]["executionStats"]
{
        "executionSuccess" : true,
        "nReturned" : 2,
        "executionTimeMillis" : 0,
        "totalKeysExamined" : 2,
        "totalDocsExamined" : 2,
... }
// doesn't work
db.newTest.explain(1).aggregate([
  {$sort:{valueDate:1}},
  {$group:{_id: {v:"$valueDate"}, first: {$first: "$farmId"}}}
]).stages[0]["$cursor"]["executionStats"]
{
        "executionSuccess" : true,
        "nReturned" : 5,
        "executionTimeMillis" : 0,
        "totalKeysExamined" : 10,
        "totalDocsExamined" : 5,
... }
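The two runs above differ only in key and document counts, so it can help to check the plan itself. The following is a minimal sketch of one way to do that in plain JavaScript. `planUsesStage` is a hypothetical helper, not a shell or server function, and the two plan objects are hand-written, heavily abbreviated stand-ins for real explain output.

```javascript
// Hypothetical helper (not part of the shell or the server): returns true if
// any node of an explain document reports the given stage name.
function planUsesStage(node, stageName) {
  if (Array.isArray(node)) {
    return node.some((child) => planUsesStage(child, stageName));
  }
  if (node === null || typeof node !== "object") {
    return false;
  }
  if (node.stage === stageName) {
    return true;
  }
  // Recurse into every nested value, so the helper does not depend on the
  // exact nesting of the explain output.
  return Object.values(node).some((child) => planUsesStage(child, stageName));
}

// Illustrative, abbreviated plan shapes (assumptions; real output has many more fields).
const fetchOverDistinctScan = {
  stages: [{ $cursor: { queryPlanner: { winningPlan: {
    stage: "FETCH", inputStage: { stage: "DISTINCT_SCAN" } } } } }]
};
const fetchOverIndexScan = {
  stages: [{ $cursor: { queryPlanner: { winningPlan: {
    stage: "FETCH", inputStage: { stage: "IXSCAN" } } } } }]
};

console.log(planUsesStage(fetchOverDistinctScan, "DISTINCT_SCAN")); // true
console.log(planUsesStage(fetchOverIndexScan, "DISTINCT_SCAN"));    // false
```

In the shell, the same helper could presumably be applied to the result of `db.newTest.explain(1).aggregate([...])` for each of the two pipelines.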
 



 Comments   
Comment by Asya Kamsky [ 22/Sep/21 ]

That's it exactly.

 

Comment by Katherine Wu (Inactive) [ 22/Sep/21 ]

Hi asya, does SERVER-53626 encapsulate what you are looking for? Linking as related.

Comment by Asya Kamsky [ 22/Sep/21 ]

Note that this ticket only implements the optimization if the _id field is specified as a field path or as a singleton object.

It will not work if there is an index on (x,y,z), group on (x,y) to get $first for z?

If there is no SERVER ticket tracking that, I will create one and link.
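To make the distinction in this comment concrete, here is a rough sketch that sorts $group _id values into the shapes mentioned. `classifyGroupId` is a hypothetical helper, and its rules are only one reading of "field path" and "singleton object"; the checks the server actually applies may differ.

```javascript
// Hypothetical classifier (an assumption for illustration, not server code).
function classifyGroupId(id) {
  // Treat "$name" as a field path; "$$name" is a variable reference, not a path.
  const isFieldPath = (v) =>
    typeof v === "string" && v.length > 1 && v.startsWith("$") && !v.startsWith("$$");

  if (isFieldPath(id)) {
    return "fieldPath";
  }
  if (id !== null && typeof id === "object" && !Array.isArray(id)) {
    const values = Object.values(id);
    if (values.length === 1 && isFieldPath(values[0])) {
      return "singletonObject";
    }
  }
  return "other";
}

// The two _id shapes from the description, plus the (x,y) grouping asked about here.
console.log(classifyGroupId("$valueDate"));         // fieldPath
console.log(classifyGroupId({ v: "$valueDate" }));  // singletonObject
console.log(classifyGroupId({ x: "$x", y: "$y" })); // other
```

Under this reading, grouping on (x,y) over an index on (x,y,z) falls into the "other" bucket, which matches the comment's point that the case is outside what this ticket implements.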

Comment by Githook User [ 08/Jan/21 ]

Author:

{'name': 'Katherine Wu', 'email': 'katherine.wu@mongodb.com', 'username': 'kaywux'}

Message: SERVER-40090 DISTINCT_SCAN is only used when certain format of $group _id is specified

(cherry picked from commit 20c0cc8c93f9ce27207067b0776bad08f84b47d0)
Branch: v4.2
https://github.com/mongodb/mongo/commit/87d0144984752767599297de915c45d828316896

Comment by Ian Whalen (Inactive) [ 07/Jan/21 ]

Author:

{'username': 'evrg-bot-webhook', 'name': 'Katherine Wu', 'email': 'katherine.wu@mongodb.com'}

Message: SERVER-40090 DISTINCT_SCAN is only used when certain format of $group _id is specified

(cherry picked from commit 20c0cc8c93f9ce27207067b0776bad08f84b47d0)
Branch: v4.4
https://github.com/mongodb/mongo/commit/eaa799bc1c7ce0dca6dbb1ce05d9988a2ae8d3c7

Comment by Githook User [ 25/Jun/20 ]

Author:

{'name': 'Katherine Wu', 'email': 'katherine.wu@mongodb.com', 'username': 'kaywux'}

Message: SERVER-40090 DISTINCT_SCAN is only used when certain format of $group _id is specified
Branch: master
https://github.com/mongodb/mongo/commit/20c0cc8c93f9ce27207067b0776bad08f84b47d0
