Severe Performance Degradation in Mongo 2.6.2


    • Type: Improvement
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.6.1, 2.6.2
    • Component/s: Performance, Querying

      I have a sharded cluster with 12 nodes. This is a test instance, so there are no replica sets. I am querying a collection that is sharded across the 12 servers using a shard key of { "_id" : "hashed" }. I also have an index on _id. My collection has 124 million documents.

      I am submitting the following script via mongo shell:

         db = db.getSiblingDB('dbName')
         // Record the start time; the progress message below reports elapsed time against it
         var commandStart = new Date();
         // Project only _id and hint the _id index
         var myCursor = db.cName.find({}, {_id: 1}).hint({_id: 1}).batchSize(5000);
         var currentTotal = 0;
         var currentCount = 0;
         while ( myCursor.hasNext() )
         {
            myCursor.next();
            ++currentTotal;
            ++currentCount;
            // Print a progress line every 1,000,000 documents
            if (currentCount == 1000000)
            {
               currentCount = 0;
               var currentTime = new Date();
               print("Iterated through " + currentTotal + " documents in " + (currentTime.getTime() - commandStart.getTime()) + " ms");
            }
         }
      

      I ran this using version 2.4.8 and it iterated at an average of around 200 million documents per hour. This was on a freshly started system with no warm-up. I then upgraded to version 2.6.2 and ran the same script. This time the average was less than 3 million documents per hour.
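      To put the two rates in perspective, here is a rough calculation of how long a full pass over the collection would take at each one. It uses only the figures quoted above; the function and variable names are made up for illustration, and the 2.6.2 rate is an upper bound, so the real numbers are at least this bad.

```javascript
// Back-of-the-envelope scan time at a given iteration rate.
function hoursToScan(totalDocs, docsPerHour) {
    return totalDocs / docsPerHour;
}

var totalDocs = 124e6;   // documents in the collection
var rate248 = 200e6;     // approximate documents/hour observed on 2.4.8
var rate262 = 3e6;       // documents/hour on 2.6.2 (reported as "less than" this)

var hours248 = hoursToScan(totalDocs, rate248);   // about 0.62 hours
var hours262 = hoursToScan(totalDocs, rate262);   // about 41.3 hours, at best
var slowdown = rate248 / rate262;                 // about 67x, at least
```

      In other words, a scan that finished in well under an hour on 2.4.8 would need more than a day and a half on 2.6.2.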

      Some additional information.

      First, using 2.6.2, when I run myCursor.explain() through the mongos to get the plan, each of the 12 shards returns:

          "indexOnly" : false
      

      With version 2.4.8, this value is true.

      When I connect directly to one of the shards running 2.6.2 and run:

         var myCursor=db.cName.find({},{_id: 1}).hint({_id: 1}).batchSize(5000)
         myCursor.explain()
      

      The result is the same, except that "indexOnly" is true.
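      For anyone who wants to repeat this comparison, a small helper along these lines can pull the "indexOnly" flag for every shard out of one explain result. This is only a sketch: the function name is made up, and it assumes the mongos explain output has a "shards" field that maps each shard to an array of per-shard plans, which is how the output looked in my runs but is not something I have checked against every version.

```javascript
// Collect the "indexOnly" flag reported by each shard in a sharded explain() result.
// ASSUMPTION: explainResult.shards maps each shard to an array of per-shard plan
// documents, each carrying its own "indexOnly" value.
function indexOnlyByShard(explainResult) {
    var flags = {};
    var shards = explainResult.shards || {};
    for (var shardName in shards) {
        if (!Object.prototype.hasOwnProperty.call(shards, shardName)) {
            continue;
        }
        var plans = shards[shardName];
        // Treat a single plan document as a one-element array.
        if (!Array.isArray(plans)) {
            plans = [plans];
        }
        flags[shardName] = plans.map(function (plan) {
            return plan.indexOnly === true;
        });
    }
    return flags;
}
```

      In the mongo shell this could be called as printjson(indexOnlyByShard(myCursor.explain())). When connected directly to a shard there is no "shards" field, so the helper returns an empty object and the top-level "indexOnly" value should be read instead.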

      Additionally, when I monitor the operations running on the mongos where I submitted the script, I see the following:

              "opid" : "shard0005:31",
              "active" : true,
              "secs_running" : 14,
              "microsecs_running" : NumberLong(14543587),
              "op" : "getmore",
              "ns" : "dbName.cName",
              "query" : {
      
              },
              "client_s" : "1.1.1.1:52589",
              "desc" : "conn3",
              "threadId" : "0x7e77b3d1e700",
              "connectionId" : 3,
              "waitingForLock" : false,
              "numYields" : 4260,
              "lockStats" : {
                      "timeLockedMicros" : {
                              "r" : NumberLong(384559),
                              "w" : NumberLong(0)
                      },
                      "timeAcquiringMicros" : {
                              "r" : NumberLong(12600),
                              "w" : NumberLong(0)
                      }
              }
      

      I only ever see one of these operations running at a time. They run for each shard sequentially: shard0000, shard0001, shard0002... shard0011.

      I would expect that the mongos would submit all of these queries in parallel.
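      The sequential behaviour can be checked with something like the following, which takes the document returned by db.currentOp() on the mongos and lists the shards that have an active getmore on the namespace. It is a sketch rather than a polished tool: the function name is made up, and it assumes every "opid" on the mongos is a string of the form "<shard>:<n>", as in the excerpt above.

```javascript
// List the shards that currently have an active getmore on a namespace.
// ASSUMPTION: each entry's "opid" looks like "shard0005:31", so the shard
// name is the part before the colon.
function shardsWithActiveGetMore(currentOpResult, ns) {
    var ops = currentOpResult.inprog || [];
    var shards = [];
    for (var i = 0; i < ops.length; i++) {
        var op = ops[i];
        if (op.active === true && op.op === "getmore" && op.ns === ns) {
            shards.push(String(op.opid).split(":")[0]);
        }
    }
    return shards;
}
```

      Calling shardsWithActiveGetMore(db.currentOp(), "dbName.cName") repeatedly while the script runs only ever gives me a single shard name. If the getmores were issued in parallel, I would expect to see several shard names in the same result.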

            Assignee: Unassigned
            Reporter: Matt Kolbert
            Votes: 0
            Watchers: 14