[SERVER-5090] aggregation: use covered index Created: 25/Feb/12  Updated: 27/Oct/15  Resolved: 12/Jul/12

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-4644 aggregation: optimize memory utilitz... Closed
depends on SERVER-5089 aggregation needs to yield Closed
depends on SERVER-6023 add an explicit requireOrder argument... Closed
Duplicate
is duplicated by SERVER-4930 aggregation: use dependency informat... Closed
is duplicated by SERVER-5224 support index-only queries under aggr... Closed
Related
related to SERVER-4504 aggregation: need an explain facility Closed
related to DOCS-1008 Add docs for aggregation framework us... Closed

Comments
Comment by Mathias Stearn [ 12/Jul/12 ]

Optimization hints needed in docs

Comment by auto [ 12/Jul/12 ]

Author: Mathias Stearn <mathias@10gen.com>, 2012-07-03T11:17:21-07:00

Message: SERVER-5090 Use covering index for aggregation
Branch: master
https://github.com/mongodb/mongo/commit/cfecc76c19634ad47670e187d46de75472512f30

Comment by Chris Westin [ 27/Mar/12 ]

Are you suggesting that these just point into the disk image until they
can no longer do that safely, and only then make copies?

Suppose I have

db.runCommand({aggregate:"c", pipeline:[
    { $sort: { k : 1 } },
    { $project : {
        x : true,
        y : true,
        z : true
    }}
]});

And, there's no index on "c.k".

The sort is "long," and we will yield; we have to copy fields before the sort.
We also don't know what fields are going to be referenced yet (at the time we
enter the sort), so we'd end up copying everything and pumping it through the
sort unnecessarily. Note there could be more operators before the $sort,
so we can't assume this simple pattern. There could be exclusionary projects,
followed by a $match, or other stuff that takes us further away from the source
collection, but still doesn't tell us what fields we will need at the end.
Just looking at the first operator doesn't tell us what we're going to need:

db.runCommand({aggregate:"c", pipeline:[
    { $project : { _id : false } },
    { $match : { q : 15 } },
    { $sort: { k : 1 } },
    { $project : {
        x : true,
        y : true,
        z : true
    }}
]});

And because of that, combined with the need to yield, we still end up needing
to copy everything.

Getting all the fields is clearly lousy. That's why I keep pointing at
SERVER-4644. To do that, we'll look at the pipeline starting from the end,
and collect all the names of the collection fields we need, and then only
copy those at the beginning, leaving out any we don't need. For this example,
backtracking from the end tells us that we need x, y, z, k, and q. The work
is a little tricky because of the need to also capture fields in expressions
in projections, as well as field renames that can happen in projections.
That's what that ticket is about.
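
The backward walk described above can be sketched roughly as follows. This is only an illustration, not the server's implementation: neededFields is a hypothetical helper, it understands only $project with plain true/false (or 1/0) values, and $match and $sort with simple top-level field names, and it gives up (returns null, meaning "copy everything") for anything else, including the expressions and renames mentioned above.

```javascript
// Hypothetical helper (not a server API): walk the pipeline from the end
// and collect the collection fields it reads.
// Returns a sorted array of field names, or null when all fields may be needed.
function neededFields(pipeline) {
    let needed = null; // null means "all fields"; nothing has narrowed it yet
    for (let i = pipeline.length - 1; i >= 0; i--) {
        const stage = pipeline[i];
        if (stage.$project) {
            const spec = stage.$project;
            const names = Object.keys(spec);
            // Computed fields and renames are out of scope here: be conservative.
            if (names.some(n => ![true, false, 0, 1].includes(spec[n]))) {
                needed = null;
                continue;
            }
            const included = names.filter(n => spec[n] === true || spec[n] === 1);
            const excluded = names.filter(n => spec[n] === false || spec[n] === 0);
            if (included.length > 0) {
                // Inclusive projection: only the listed fields pass through,
                // plus _id unless it is explicitly excluded.
                const out = new Set(included);
                if (!excluded.includes("_id")) out.add("_id");
                needed = needed === null
                    ? out
                    : new Set([...needed].filter(f => out.has(f)));
            } else if (needed !== null) {
                // Exclusive projection: the excluded fields never reach later stages.
                excluded.forEach(f => needed.delete(f));
            }
        } else if (stage.$match || stage.$sort) {
            const keys = Object.keys(stage.$match || stage.$sort);
            if (keys.some(k => k.startsWith("$"))) {
                needed = null; // operators such as $or are not analyzed in this sketch
            } else if (needed !== null) {
                keys.forEach(f => needed.add(f));
            }
        } else {
            needed = null; // unknown stage: assume it may read anything
        }
    }
    return needed === null ? null : [...needed].sort();
}

// The second pipeline from this comment.
const pipeline = [
    { $project: { _id: false } },
    { $match: { q: 15 } },
    { $sort: { k: 1 } },
    { $project: { x: true, y: true, z: true } }
];
console.log(neededFields(pipeline)); // [ 'k', 'q', 'x', 'y', 'z' ]
```

Returning null whenever a stage is not understood keeps the sketch safe: it can only copy too much, never too little.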

And, once we know what fields we need, we'll know at the beginning if we can
do an index-only query, which is this ticket.
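
As a rough illustration of that last step: given the list of required fields, a first-pass check is whether every one of them appears in an index's key pattern. isCoveredBy is a hypothetical name, and the index key patterns below are made up for the example; the real server check may impose further conditions that this sketch ignores.

```javascript
// Hypothetical helper (not a server API): true when every required field
// is a key of the given index key pattern.
// `needed` is an array of field names, or null when all fields are required.
function isCoveredBy(needed, keyPattern) {
    if (needed === null) return false; // the whole document is needed
    return needed.every(f => Object.prototype.hasOwnProperty.call(keyPattern, f));
}

// Fields required by the example pipeline above: x, y, z, k, and q.
const needed = ["k", "q", "x", "y", "z"];

// Assumed index key patterns, for illustration only.
console.log(isCoveredBy(needed, { q: 1, k: 1, x: 1, y: 1, z: 1 })); // true
console.log(isCoveredBy(needed, { k: 1 }));                         // false
```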

Comment by Chris Westin [ 25/Feb/12 ]

This can't be done until I complete SERVER-4644, which is needed to know which fields all stages of the pipeline require. At present there is no knowledge of that.
