Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.6.1
    • Fix Version/s: None
    • Component/s: Querying
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Hi guys,

      We have a weird issue with MongoDB 2.6.0/2.6.1 picking a wrong index.

      So we have a really huge collection (~250MM docs) with the following indexes defined:

      [
      	{
      		"_id" : 1
      	},
      	{
      		"mid" : 1,
      		"r1" : -1
      	},
      	{
      		"mid" : 1,
      		"r2" : -1
      	},
      	{
      		"key" : 1
      	},
      	{
      		"mid" : 1,
      		"key" : 1
      	},
      	{
      		"mid" : 1
      	}
      ]

      Yet, sometimes, under a heavy load, the database starts freaking out - consumes >1000% CPU, performance degrades dramatically and I see the following entries in the logs:

      2014-05-07T15:54:30.587+0000 [conn153098] query <collection>: { mid: ObjectId('4fa0d0914711596b17039e29'), key: "liuruile@163.com" } planSummary: IXSCAN { mid: 1, r1: -1 } ntoskip:0 nscanned:731072 nscannedObjects:731071 keyUpdates:0 numYields:6048 locks(micros) r:181118904 nreturned:1 reslen:214 486438ms

      As you can see the wrong index is being picked. In fact, I see the same behavior for a lot of other collections as well. It looks like the very first index (after _id) is being picked all the time.

      Do you have any idea on what might be causing this?

        Issue Links

          Activity

          Hide
          hari.pulugurta Hari Pulugurta added a comment -

          Bug still happens in version 2.6.4 on our production database.

          tt2:PRIMARY> db.version()
          2.6.4

          Working set fits in ram on our prod servers
          "workingSet" :

          { "note" : "thisIsAnEstimate", "pagesInMemory" : 10039, "computationTimeMicros" : 15146, "overSeconds" : 1722}

          Collection name: tt2.surveys
          Docs:2949936 (~2.9 mil)

          From our logs we are seeing this when there is a heavy load and its taking down our systems and causing blockage because it ends up scanning 1000's of records instead of 10's

          query tt2.surveys query: { $query: { AppPeriodId: BinData(3, F4EB45477D11D8429C22A2FB00A6CF64), Title: "Resume", AccountId: BinData(3, 23AF59F96695EE46B06EA40D0180F0F0),Status: { $ne: -1 } }, $orderby: { CreatedOn: 1 } } 
          planSummary: IXSCAN { Title: 1.0, CreatedOn: 1.0, _id: 1.0 }
          ntoreturn:0 ntoskip:0 nscanned:213830 nscannedObjects:213830 keyUpdates:0 numYields:97 locks(micros) r:144946310 nreturned:1 reslen:2427 
          103581ms

          When we run this query individually it hits the right index
          "BtreeCursor AccountId_1_AppPeriodId_1_Category_1_PrimIndex_1"

          Show
          hari.pulugurta Hari Pulugurta added a comment - Bug still happens in version 2.6.4 on our production database. tt2:PRIMARY> db.version() 2.6.4 Working set fits in ram on our prod servers "workingSet" : { "note" : "thisIsAnEstimate", "pagesInMemory" : 10039, "computationTimeMicros" : 15146, "overSeconds" : 1722} Collection name : tt2.surveys Docs :2949936 (~2.9 mil) From our logs we are seeing this when there is a heavy load and its taking down our systems and causing blockage because it ends up scanning 1000's of records instead of 10's query tt2.surveys query: { $query: { AppPeriodId: BinData(3, F4EB45477D11D8429C22A2FB00A6CF64), Title: "Resume", AccountId: BinData(3, 23AF59F96695EE46B06EA40D0180F0F0),Status: { $ne: -1 } }, $orderby: { CreatedOn: 1 } } planSummary: IXSCAN { Title: 1.0, CreatedOn: 1.0, _id: 1.0 } ntoreturn:0 ntoskip:0 nscanned:213830 nscannedObjects:213830 keyUpdates:0 numYields:97 locks(micros) r:144946310 nreturned:1 reslen:2427 103581ms When we run this query individually it hits the right index "BtreeCursor AccountId_1_AppPeriodId_1_Category_1_PrimIndex_1"
          Hide
          pasette Dan Pasette added a comment -

          Hari Pulugurta your issue looks like SERVER-14071 to me

          Show
          pasette Dan Pasette added a comment - Hari Pulugurta your issue looks like SERVER-14071 to me
          Hide
          pnock Paul Nock added a comment -

          I'm seeing something that looks like this with 2.6.9:

          2015-04-20T16:26:05.147+0000 [conn61829] query *****.followings query: { $query:

          { user_id: ObjectId('*********') }

          , $orderby:

          { _id: 1 }

          } planSummary: IXSCAN

          { _id: 1 }

          ntoskip:0 nscanned:125454 nscannedObjects:125454 keyUpdates:0 numYields:0 locks(micros) r:181778 nreturned:0 reslen:20 181ms

          db.followings.getIndexes()
          [
          {
          "v" : 1,
          "name" : "id",
          "key" :

          { "_id" : 1 }

          ,
          "ns" : "*****.followings"
          },
          {
          "v" : 1,
          "key" :

          { "user_id" : 1 }

          ,
          "name" : "user_id_1",
          "ns" : "***********.followings"
          }
          ]

          The query at the console on the primary always uses the 'user_id_1' index, but intermittently, the query will start running very slowly on slaves in the replica set. The behavior started after upgrading from 2.4 last week, and happens on some other simple collections too. If I drop and regenerate the index the problem resets itself, but I assume that's similar to just restarting mongod. What is normally a 1ms query jumps to 100ms or more.

          Show
          pnock Paul Nock added a comment - I'm seeing something that looks like this with 2.6.9: 2015-04-20T16:26:05.147+0000 [conn61829] query *****.followings query: { $query: { user_id: ObjectId('*********') } , $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:125454 nscannedObjects:125454 keyUpdates:0 numYields:0 locks(micros) r:181778 nreturned:0 reslen:20 181ms db.followings.getIndexes() [ { "v" : 1, "name" : " id ", "key" : { "_id" : 1 } , "ns" : "*****.followings" }, { "v" : 1, "key" : { "user_id" : 1 } , "name" : "user_id_1", "ns" : "***********.followings" } ] The query at the console on the primary always uses the 'user_id_1' index, but intermittently, the query will start running very slowly on slaves in the replica set. The behavior started after upgrading from 2.4 last week, and happens on some other simple collections too. If I drop and regenerate the index the problem resets itself, but I assume that's similar to just restarting mongod. What is normally a 1ms query jumps to 100ms or more.
          Hide
          support@sesam.hk SHUK LING added a comment -

          We have the 'same' issue with mongo 3.0.5.
          Randomly the query (slave or master) stop to use an index and start to scan the collection. Could you tell me what you need to debug this kind of issue ?

          Show
          support@sesam.hk SHUK LING added a comment - We have the 'same' issue with mongo 3.0.5. Randomly the query (slave or master) stop to use an index and start to scan the collection. Could you tell me what you need to debug this kind of issue ?
          Hide
          ramon.fernandez Ramon Fernandez added a comment -

          SHUK LING, can you please open a separate ticket? Query issues often have different root causes even when exhibiting the same symptoms.

          Thanks,
          Ramón.

          Show
          ramon.fernandez Ramon Fernandez added a comment - SHUK LING , can you please open a separate ticket? Query issues often have different root causes even when exhibiting the same symptoms. Thanks, Ramón.

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: