[SERVER-27386] Poor use of index Created: 12/Dec/16  Updated: 12/Dec/16  Resolved: 12/Dec/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Pryor Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-15086 Allow for efficient range queries ove... Closed
Operating System: ALL
Participants:

 Description   

See here:
http://stackoverflow.com/questions/41085666/mongodb-explains-totalkeysexamined-more-than-limit

I have a very large collection (millions of documents) with data which looks like:

    u'timestamp': 1481454871423.0,
    u'_id': ObjectId('584d351772c4d8106cc43116'),
    u'data': {
           ...
        },
    u'geocode': [{u'providerId': 2, u'placeId': 97459515},
                 {u'providerId': 3, u'placeId': 237},
                 {u'providerId': 3, u'placeId': 3}]

I want a query which targets a providerId and placeId pair, and returns 10 records only, within a timestamp range.

To that end I perform queries like:

     'geocode.providerId': 3,
     'geocode.placeId': 3
     'timestamp': { '$gte': 1481454868360L,
                    '$lt': 1481454954839L }

And I provide a hint, to target the index which looks like:

    [('geocode.providerId', 1), ('geocode.placeId', 1), ('timestamp', 1)]

where 1 is ascending. Before iterating over the returned cursor, it is *limited to 10 records* and sorted ascending on timestamp (which should be it's default state due to the index).

A pymongo query looks like:

    collection.find(findDic).hint(hint).sort([('timestamp', pymongo.ASCENDING)]).skip(0).limit(10)

The query explains come back looking like:

    {
    u'executionStats': {
        u'executionTimeMillis': 1270,
        u'nReturned': 10,
        u'totalKeysExamined': 568686,
        u'allPlansExecution': [],
        u'executionSuccess': True,
        u'executionStages': {
            u'needYield': 0,
            u'saveState': 4442,
            u'memUsage': 54359,
            u'restoreState': 4442,
            u'memLimit': 33554432,
            u'isEOF': 1,
            u'inputStage': {
                u'needYield': 0,
                u'saveState': 4442,
                u'restoreState': 4442,
                u'isEOF': 1,
                u'inputStage': {
                    u'needYield': 0,
                    u'docsExamined': 284964,
                    u'saveState': 4442,
                    u'restoreState': 4442,
                    u'isEOF': 1,
                    u'inputStage': {
                        u'saveState': 4442,
                        u'isEOF': 1,
                        u'seenInvalidated': 0,
                        u'keysExamined': 568686,
                        u'nReturned': 284964,
                        u'invalidates': 0,
                        u'keyPattern': {u'geocode.providerId': 1,
                                u'timestamp': 1,
                                u'geocode.placeId': 1},
                        u'isUnique': False,
                        u'needTime': 283722,
                        u'isMultiKey': True,
                        u'executionTimeMillisEstimate': 360,
                        u'dupsTested': 568684,
                        u'restoreState': 4442,
                        u'direction': u'forward',
                        u'indexName': u'geocode.providerId_1_geocode.placeId_1_timestamp_1',
                        u'isSparse': False,
                        u'advanced': 284964,
                        u'stage': u'IXSCAN',
                        u'dupsDropped': 283720,
                        u'needYield': 0,
                        u'isPartial': False,
                        u'indexBounds': {u'geocode.providerId': [u'[3, 3]'
                                ],
                                u'timestamp': [u'[-inf.0, 1481455513378)'
                                ],
                                u'geocode.placeId': [u'[MinKey, MaxKey]'
                                ]},
                        u'works': 568687,
                        u'indexVersion': 1,
                        },
                    u'nReturned': 252823,
                    u'needTime': 315863,
                    u'filter': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                                {u'timestamp': {u'$gte': 1481405886510L}}]},
                    u'executionTimeMillisEstimate': 970,
                    u'alreadyHasObj': 0,
                    u'invalidates': 0,
                    u'works': 568687,
                    u'advanced': 252823,
                    u'stage': u'FETCH',
                    },
                u'nReturned': 0,
                u'needTime': 315864,
                u'executionTimeMillisEstimate': 1150,
                u'invalidates': 0,
                u'works': 568688,
                u'advanced': 0,
                u'stage': u'SORT_KEY_GENERATOR',
                },
            u'nReturned': 10,
            u'needTime': 568688,
            u'sortPattern': {u'timestamp': 1},
            u'executionTimeMillisEstimate': 1200,
            u'limitAmount': 10,
            u'invalidates': 0,
            u'works': 568699,
            u'advanced': 10,
            u'stage': u'SORT',
            },
        u'totalDocsExamined': 284964,
        },
    u'queryPlanner': {
        u'parsedQuery': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                         {u'geocode.providerId': {u'$eq': 3}},
                         {u'timestamp': {u'$lt': 1481455513378L}},
                         {u'timestamp': {u'$gte': 1481405886510L}}]},
        u'rejectedPlans': [],
        u'namespace': u'mxp957.tweet_244de17a-aa75-4da9-a6d5-97b9281a3b55',
        u'winningPlan': {
            u'sortPattern': {u'timestamp': 1},
            u'inputStage': {u'inputStage': {u'filter': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                            {u'timestamp': {u'$gte': 1481405886510L}}]},
                            u'inputStage': {
                u'direction': u'forward',
                u'indexName': u'geocode.providerId_1_geocode.placeId_1_timestamp_1',
                u'isUnique': False,
                u'isSparse': False,
                u'isPartial': False,
                u'indexBounds': {u'geocode.providerId': [u'[3, 3]'],
                                 u'timestamp': [u'[-inf.0, 1481455513378)'
                                 ],
                                 u'geocode.placeId': [u'[MinKey, MaxKey]'
                                 ]},
                u'isMultiKey': True,
                u'stage': u'IXSCAN',
                u'indexVersion': 1,
                u'keyPattern': {u'geocode.providerId': 1,
                                u'timestamp': 1,
                                u'geocode.placeId': 1},
                }, u'stage': u'FETCH'},
                    u'stage': u'SORT_KEY_GENERATOR'},
            u'limitAmount': 10,
            u'stage': u'SORT',
            },
        u'indexFilterSet': False,
        u'plannerVersion': 1,
        },
    u'ok': 1.0,
    u'serverInfo': {
        u'host': u'rabbit',
        u'version': u'3.2.11',
        u'port': 27017,
        u'gitVersion': u'009580ad490190ba33d1c6253ebd8d91808923e4',
        },
    }

I don't understand why all of these documents need to be examined. In the case above, the size of the collection is only 284587 which means that every record was looked at twice! I want totalKeysExamined to only be 10, but am struggling to see how to achieve this.

I am using MongoDB version 3.2.11 and pymongo.



 Comments   
Comment by Kelsey Schubert [ 12/Dec/16 ]

Hi watfordxp,

This is expected behavior in MongoDB 3.2. My recommendation would be to upgrade to MongoDB 3.4 to take advantage of SERVER-15086, which allows efficient range queries over non-array fields in multikey indexes. For additional information, please review the linked ticket's summary.

Thanks,
Thomas

Comment by Michael Pryor [ 12/Dec/16 ]

I can't edit my own JIRA - would have liked to correct the formatting. Sorry!

Generated at Thu Feb 08 04:15:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.