cursors that return over 60 million objects are extremely slow


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 2.2.3
    • Component/s: Performance, Querying
    • Environment: Ubuntu 12.04

      It looks like getMore operations on cursors that return a large number of objects run significantly slower than on cursors that return fewer objects. We noticed this while running mongodump on one of our collections, which has 84M objects. collection.stats() returns:
      {
      "ns" : "data.app_00da3da8-3ec2-490b-ac3a-1ac5d12d0814:SessionEvent",
      "count" : 84423082,
      "size" : 57713849288,
      "avgObjSize" : 683.6264197035592,
      "storageSize" : 59682012800,
      "numExtents" : 49,
      "nindexes" : 4,
      "lastExtentSize" : 2146426864,
      "paddingFactor" : 1,
      "systemFlags" : 1,
      "userFlags" : 0,
      "totalIndexSize" : 8947258432,
      "indexSizes" : { "_id_" : 3478397440, "_acl_1" : 1572457376, "_acl.*.r_1" : 1572457376, "_created_at_1" : 2323946240 },
      "ok" : 1
      }

      If we mongodump this collection in one pass, it takes about 7 hours. If we instead dump the collection in parts (i.e. split the _id space into 4 ranges) and dump each range individually, the total run time is about 1.5 hours. We have another collection whose on-disk size is greater but which has fewer objects, and it dumps in about 2 hours. Here is collection.stats() for that collection:
      {
      "ns" : "data.app_d237a400-f548-42cb-85e3-1643daa0dd4e:SaveGame",
      "count" : 1636453,
      "size" : 114000989904,
      "avgObjSize" : 69663.46720865188,
      "storageSize" : 114517589216,
      "numExtents" : 72,
      "nindexes" : 7,
      "lastExtentSize" : 2146426864,
      "paddingFactor" : 1,
      "systemFlags" : 1,
      "userFlags" : 0,
      "totalIndexSize" : 398171200,
      "indexSizes" : { "_id_" : 63372176, "UserId_1" : 75538064, "_acl_1" : 28305312, "_acl.*.r_1" : 28305312, "_created_at_1" : 41648544, "UDID_1" : 130153744, "location_1" : 30848048 },
      "ok" : 1
      }
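
      To put the dump times above in perspective, here is a rough throughput calculation. It uses only the counts, sizes and approximate durations quoted in this report; the helper name `rate` is made up for illustration, and the 1.5 hour figure is treated as the total wall-clock time for all four parts.

```python
# Rough dump throughput from the figures quoted in this report.
# Durations are the reporter's approximations, so treat results as estimates.
HOUR = 3600


def rate(count, size_bytes, hours):
    """Return (objects per second, megabytes per second) for one dump."""
    seconds = hours * HOUR
    return count / seconds, size_bytes / seconds / 1e6


runs = {
    "SessionEvent, single dump": rate(84423082, 57713849288, 7),
    "SessionEvent, 4 _id ranges": rate(84423082, 57713849288, 1.5),
    "SaveGame, single dump": rate(1636453, 114000989904, 2),
}

for name, (objs_per_s, mb_per_s) in runs.items():
    print(f"{name}: {objs_per_s:,.0f} objects/s, {mb_per_s:.1f} MB/s")
```

      By this estimate the single SessionEvent dump moves roughly 3,350 objects/s, the split dump roughly 15,600 objects/s, and the SaveGame dump sustains a higher byte rate than either despite being about twice the size on disk. That suggests the slowdown tracks object count rather than data volume.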

      Experimentally, the point at which performance falls off a cliff is about 60M objects in the result set.
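
      The split-by-_id workaround described above could be scripted along these lines. This is only a sketch: the split points `"G"`, `"W"`, `"m"` and the `dump_part` output prefix are hypothetical, it assumes string _ids that sort in plain byte order, and it only builds the mongodump command lines (using the standard `--db`, `--collection`, `--query` and `--out` options) without running them.

```python
import json
import shlex


def id_range_filters(boundaries):
    """Turn sorted split points into half-open _id range filters.

    N boundaries yield N + 1 filters that cover the whole _id space
    without overlap: (-inf, b0), [b0, b1), ..., [bN-1, +inf).
    """
    if list(boundaries) != sorted(set(boundaries)):
        raise ValueError("boundaries must be sorted and unique")
    edges = [None, *boundaries, None]
    filters = []
    for low, high in zip(edges, edges[1:]):
        cond = {}
        if low is not None:
            cond["$gte"] = low
        if high is not None:
            cond["$lt"] = high
        # With no boundaries at all, fall back to an empty (match-all) filter.
        filters.append({"_id": cond} if cond else {})
    return filters


def mongodump_commands(db, collection, boundaries, out_prefix="dump_part"):
    """Build one mongodump argv list per _id range (nothing is executed)."""
    return [
        ["mongodump", "--db", db, "--collection", collection,
         "--query", json.dumps(flt), "--out", f"{out_prefix}_{i}"]
        for i, flt in enumerate(id_range_filters(boundaries))
    ]


# Hypothetical split points; real ones would be sampled from the collection.
collection = "app_00da3da8-3ec2-490b-ac3a-1ac5d12d0814:SessionEvent"
for cmd in mongodump_commands("data", collection, ["G", "W", "m"]):
    print(shlex.join(cmd))
```

      Three split points give four ranges, matching the four-part dump in the report. Each range then returns well under the roughly 60M objects at which the slowdown was observed, provided the split points divide the collection fairly evenly.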

            Assignee: Rui Zhang (Inactive)
            Reporter: charity majors
            Votes: 0
            Watchers: 7
