• Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.11
    • Component/s: Index Maintenance
    • Labels:
      None
    • ALL

      See here:
      http://stackoverflow.com/questions/41085666/mongodb-explains-totalkeysexamined-more-than-limit

      I have a very large collection (millions of documents) with data which looks like:

          u'timestamp': 1481454871423.0,
          u'_id': ObjectId('584d351772c4d8106cc43116'),
          u'data': {
                 ...
              },
          u'geocode': [{u'providerId': 2, u'placeId': 97459515},
                       {u'providerId': 3, u'placeId': 237},
                       {u'providerId': 3, u'placeId': 3}]
      

      I want a query which targets a providerId and placeId pair, and returns 10 records only, within a timestamp range.

      To that end I perform queries like:

           'geocode.providerId': 3,
           'geocode.placeId': 3
           'timestamp': { '$gte': 1481454868360L,
                          '$lt': 1481454954839L }
      

      And I provide a hint, to target the index which looks like:

          [('geocode.providerId', 1), ('geocode.placeId', 1), ('timestamp', 1)]
      

      where 1 is ascending. Before iterating over the returned cursor, it is *limited to 10 records* and sorted ascending on timestamp (which should be it's default state due to the index).

      A pymongo query looks like:

          collection.find(findDic).hint(hint).sort([('timestamp', pymongo.ASCENDING)]).skip(0).limit(10)
      

      The query explains come back looking like:

          {
          u'executionStats': {
              u'executionTimeMillis': 1270,
              u'nReturned': 10,
              u'totalKeysExamined': 568686,
              u'allPlansExecution': [],
              u'executionSuccess': True,
              u'executionStages': {
                  u'needYield': 0,
                  u'saveState': 4442,
                  u'memUsage': 54359,
                  u'restoreState': 4442,
                  u'memLimit': 33554432,
                  u'isEOF': 1,
                  u'inputStage': {
                      u'needYield': 0,
                      u'saveState': 4442,
                      u'restoreState': 4442,
                      u'isEOF': 1,
                      u'inputStage': {
                          u'needYield': 0,
                          u'docsExamined': 284964,
                          u'saveState': 4442,
                          u'restoreState': 4442,
                          u'isEOF': 1,
                          u'inputStage': {
                              u'saveState': 4442,
                              u'isEOF': 1,
                              u'seenInvalidated': 0,
                              u'keysExamined': 568686,
                              u'nReturned': 284964,
                              u'invalidates': 0,
                              u'keyPattern': {u'geocode.providerId': 1,
                                      u'timestamp': 1,
                                      u'geocode.placeId': 1},
                              u'isUnique': False,
                              u'needTime': 283722,
                              u'isMultiKey': True,
                              u'executionTimeMillisEstimate': 360,
                              u'dupsTested': 568684,
                              u'restoreState': 4442,
                              u'direction': u'forward',
                              u'indexName': u'geocode.providerId_1_geocode.placeId_1_timestamp_1',
                              u'isSparse': False,
                              u'advanced': 284964,
                              u'stage': u'IXSCAN',
                              u'dupsDropped': 283720,
                              u'needYield': 0,
                              u'isPartial': False,
                              u'indexBounds': {u'geocode.providerId': [u'[3, 3]'
                                      ],
                                      u'timestamp': [u'[-inf.0, 1481455513378)'
                                      ],
                                      u'geocode.placeId': [u'[MinKey, MaxKey]'
                                      ]},
                              u'works': 568687,
                              u'indexVersion': 1,
                              },
                          u'nReturned': 252823,
                          u'needTime': 315863,
                          u'filter': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                                      {u'timestamp': {u'$gte': 1481405886510L}}]},
                          u'executionTimeMillisEstimate': 970,
                          u'alreadyHasObj': 0,
                          u'invalidates': 0,
                          u'works': 568687,
                          u'advanced': 252823,
                          u'stage': u'FETCH',
                          },
                      u'nReturned': 0,
                      u'needTime': 315864,
                      u'executionTimeMillisEstimate': 1150,
                      u'invalidates': 0,
                      u'works': 568688,
                      u'advanced': 0,
                      u'stage': u'SORT_KEY_GENERATOR',
                      },
                  u'nReturned': 10,
                  u'needTime': 568688,
                  u'sortPattern': {u'timestamp': 1},
                  u'executionTimeMillisEstimate': 1200,
                  u'limitAmount': 10,
                  u'invalidates': 0,
                  u'works': 568699,
                  u'advanced': 10,
                  u'stage': u'SORT',
                  },
              u'totalDocsExamined': 284964,
              },
          u'queryPlanner': {
              u'parsedQuery': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                               {u'geocode.providerId': {u'$eq': 3}},
                               {u'timestamp': {u'$lt': 1481455513378L}},
                               {u'timestamp': {u'$gte': 1481405886510L}}]},
              u'rejectedPlans': [],
              u'namespace': u'mxp957.tweet_244de17a-aa75-4da9-a6d5-97b9281a3b55',
              u'winningPlan': {
                  u'sortPattern': {u'timestamp': 1},
                  u'inputStage': {u'inputStage': {u'filter': {u'$and': [{u'geocode.placeId': {u'$eq': 3}},
                                  {u'timestamp': {u'$gte': 1481405886510L}}]},
                                  u'inputStage': {
                      u'direction': u'forward',
                      u'indexName': u'geocode.providerId_1_geocode.placeId_1_timestamp_1',
                      u'isUnique': False,
                      u'isSparse': False,
                      u'isPartial': False,
                      u'indexBounds': {u'geocode.providerId': [u'[3, 3]'],
                                       u'timestamp': [u'[-inf.0, 1481455513378)'
                                       ],
                                       u'geocode.placeId': [u'[MinKey, MaxKey]'
                                       ]},
                      u'isMultiKey': True,
                      u'stage': u'IXSCAN',
                      u'indexVersion': 1,
                      u'keyPattern': {u'geocode.providerId': 1,
                                      u'timestamp': 1,
                                      u'geocode.placeId': 1},
                      }, u'stage': u'FETCH'},
                          u'stage': u'SORT_KEY_GENERATOR'},
                  u'limitAmount': 10,
                  u'stage': u'SORT',
                  },
              u'indexFilterSet': False,
              u'plannerVersion': 1,
              },
          u'ok': 1.0,
          u'serverInfo': {
              u'host': u'rabbit',
              u'version': u'3.2.11',
              u'port': 27017,
              u'gitVersion': u'009580ad490190ba33d1c6253ebd8d91808923e4',
              },
          }
      

      I don't understand why all of these documents need to be examined. In the case above, the size of the collection is only 284587 which means that every record was looked at twice! I want totalKeysExamined to only be 10, but am struggling to see how to achieve this.

      I am using MongoDB version 3.2.11 and pymongo.

            Assignee:
            Unassigned Unassigned
            Reporter:
            watfordxp Michael Pryor
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: