Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-11219

mongod crash during index build leaves orphaned index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Index was built in the background and presumably worked before?

      Show
      Index was built in the background and presumably worked before?

      Description

      [ed: original title was "Index shown in getIndexes() does not actually exist"]

      I've been trying to track down a mysterious performance issue on one of our shards for a little while, and it turns out the culprit is an index that doesn't actually exist:

      events2:PRIMARY> db.listen.getIndexes();
      [
              {
                      "v" : 1,
                      "key" : {
                              "_id" : 1
                      },
                      "ns" : "site_events.listen",
                      "name" : "_id_"
              },
              {
                      "v" : 1,
                      "key" : {
                              "uid" : 1
                      },
                      "ns" : "site_events.listen",
                      "name" : "uid_1"
              },
              {
                      "v" : 1,
                      "key" : {
                              "list.i" : 1 // list.i, okay
                      },
                      "ns" : "site_events.listen",
                      "name" : "list.i_1",
                      "background" : true
              }
      ]
      events2:PRIMARY>

      events2:PRIMARY> db.listen.stats();
      {
              "ns" : "site_events.listen",
              "count" : 616369,
              "size" : 7789669296,
              "avgObjSize" : 12637.996550767479,
              "storageSize" : 9305386992,
              "numExtents" : 24,
              "nindexes" : 2,
              "lastExtentSize" : 2146426864,
              "paddingFactor" : 1.0000000000920757,
              "systemFlags" : 1,
              "userFlags" : 1,
              "totalIndexSize" : 45098816,
              "indexSizes" : { // no list.i here!
                      "_id_" : 22026144,
                      "uid_1" : 23072672
              },
              "ok" : 1
      }

      Queries that should use this index do a full scan instead.

        Issue Links

          Activity

          Hide
          rassi J Rassi (Inactive) added a comment -

          repairDatabase finish without errors?

          Show
          rassi J Rassi (Inactive) added a comment - repairDatabase finish without errors?
          Hide
          parkan Arkadiy Kukarkin added a comment - - edited

          Finished without errors, however index was not rebuilt:

          Fri Oct 25 15:44:59.397 [conn17] build index site_events.listen { _id: 1 }
          Fri Oct 25 15:45:09.007 [conn17]                Index: (1/3) External Sort Progress: 159900/620987      25%
          Fri Oct 25 15:45:17.158 [DataFileSync] flushing mmaps took 26058ms  for 40 files
          Fri Oct 25 15:45:19.005 [conn17]                Index: (1/3) External Sort Progress: 347400/620987      55%
          Fri Oct 25 15:45:29.015 [conn17]                Index: (1/3) External Sort Progress: 481600/620987      77%
          Fri Oct 25 15:45:39.037 [conn17]                Index: (1/3) External Sort Progress: 542700/620987      87%
          Fri Oct 25 15:45:49.046 [conn17]                Index: (1/3) External Sort Progress: 568500/620987      91%
          Fri Oct 25 15:45:59.028 [conn17]                Index: (1/3) External Sort Progress: 592200/620987      95%
          Fri Oct 25 15:46:06.210 [conn17]         external sort used : 0 files  in 66 secs
          Fri Oct 25 15:46:06.511 [conn17]         done building bottom layer, going to commit
          Fri Oct 25 15:46:06.513 [conn17]         fastBuildIndex dupsToDrop:0
          Fri Oct 25 15:46:06.590 [conn17] build index done.  scanned 620987 total records. 67.193 secs
          Fri Oct 25 15:46:06.647 [conn17] build index site_events.listen { uid: 1.0 }
          Fri Oct 25 15:46:16.004 [conn17]                Index: (1/3) External Sort Progress: 265800/620987      42%
          Fri Oct 25 15:46:26.018 [conn17]                Index: (1/3) External Sort Progress: 453700/620987      73%
          Fri Oct 25 15:46:36.006 [conn17]                Index: (1/3) External Sort Progress: 526600/620987      84%
          Fri Oct 25 15:46:46.041 [conn17]                Index: (1/3) External Sort Progress: 559400/620987      90%
          Fri Oct 25 15:46:56.040 [conn17]                Index: (1/3) External Sort Progress: 584200/620987      94%
          Fri Oct 25 15:47:06.223 [conn17]         external sort used : 0 files  in 59 secs
          Fri Oct 25 15:47:06.937 [conn17]         done building bottom layer, going to commit
          Fri Oct 25 15:47:06.939 [conn17]         fastBuildIndex dupsToDrop:0
          Fri Oct 25 15:47:07.013 [conn17] build index done.  scanned 620987 total records. 60.365 secs
          Fri Oct 25 15:47:07.404 [conn17] removeJournalFiles
          Fri Oct 25 15:47:08.073 [conn17] command site_events.$cmd command: { repairDatabase: 1.0 } ntoreturn:1 keyUpdates:0 locks(micros) W:428434157 reslen:37 428434ms

                  "indexSizes" : {
                          "_id_" : 18142544,
                          "uid_1" : 15648864
                  }

          Show
          parkan Arkadiy Kukarkin added a comment - - edited Finished without errors, however index was not rebuilt: Fri Oct 25 15:44:59.397 [conn17] build index site_events.listen { _id: 1 } Fri Oct 25 15:45:09.007 [conn17] Index: (1/3) External Sort Progress: 159900/620987 25% Fri Oct 25 15:45:17.158 [DataFileSync] flushing mmaps took 26058ms for 40 files Fri Oct 25 15:45:19.005 [conn17] Index: (1/3) External Sort Progress: 347400/620987 55% Fri Oct 25 15:45:29.015 [conn17] Index: (1/3) External Sort Progress: 481600/620987 77% Fri Oct 25 15:45:39.037 [conn17] Index: (1/3) External Sort Progress: 542700/620987 87% Fri Oct 25 15:45:49.046 [conn17] Index: (1/3) External Sort Progress: 568500/620987 91% Fri Oct 25 15:45:59.028 [conn17] Index: (1/3) External Sort Progress: 592200/620987 95% Fri Oct 25 15:46:06.210 [conn17] external sort used : 0 files in 66 secs Fri Oct 25 15:46:06.511 [conn17] done building bottom layer, going to commit Fri Oct 25 15:46:06.513 [conn17] fastBuildIndex dupsToDrop:0 Fri Oct 25 15:46:06.590 [conn17] build index done. scanned 620987 total records. 67.193 secs Fri Oct 25 15:46:06.647 [conn17] build index site_events.listen { uid: 1.0 } Fri Oct 25 15:46:16.004 [conn17] Index: (1/3) External Sort Progress: 265800/620987 42% Fri Oct 25 15:46:26.018 [conn17] Index: (1/3) External Sort Progress: 453700/620987 73% Fri Oct 25 15:46:36.006 [conn17] Index: (1/3) External Sort Progress: 526600/620987 84% Fri Oct 25 15:46:46.041 [conn17] Index: (1/3) External Sort Progress: 559400/620987 90% Fri Oct 25 15:46:56.040 [conn17] Index: (1/3) External Sort Progress: 584200/620987 94% Fri Oct 25 15:47:06.223 [conn17] external sort used : 0 files in 59 secs Fri Oct 25 15:47:06.937 [conn17] done building bottom layer, going to commit Fri Oct 25 15:47:06.939 [conn17] fastBuildIndex dupsToDrop:0 Fri Oct 25 15:47:07.013 [conn17] build index done. scanned 620987 total records. 60.365 secs Fri Oct 25 15:47:07.404 [conn17] removeJournalFiles Fri Oct 25 15:47:08.073 [conn17] command site_events.$cmd command: { repairDatabase: 1.0 } ntoreturn:1 keyUpdates:0 locks(micros) W:428434157 reslen:37 428434ms "indexSizes" : { "_id_" : 18142544, "uid_1" : 15648864 }
          Hide
          rassi J Rassi (Inactive) added a comment -

          Hmm. I can confirm that repairDatabase on 2.2.3 fetches the index list for the rebuild in the same way that db.collection.getIndexes() does (that is, a query on db.system.indexes), so I'm currently convinced that you got rid of the system.indexes entry somehow (the dropIndex attempt?), sometime between posting the ticket and running repairDatabase. Anyway, you have a fresh database on this member now, so I'd just create the index on it as usual if you haven't already.

          Show
          rassi J Rassi (Inactive) added a comment - Hmm. I can confirm that repairDatabase on 2.2.3 fetches the index list for the rebuild in the same way that db.collection.getIndexes() does (that is, a query on db.system.indexes), so I'm currently convinced that you got rid of the system.indexes entry somehow (the dropIndex attempt?), sometime between posting the ticket and running repairDatabase. Anyway, you have a fresh database on this member now, so I'd just create the index on it as usual if you haven't already.
          Hide
          parkan Arkadiy Kukarkin added a comment -

          Cool, wanted to make sure you didn't need any more data from me. I didn't look at getIndexes output after dropIndex, so it's possible it was partially successful. Building normally now.

          Show
          parkan Arkadiy Kukarkin added a comment - Cool, wanted to make sure you didn't need any more data from me. I didn't look at getIndexes output after dropIndex, so it's possible it was partially successful. Building normally now.
          Hide
          rassi J Rassi (Inactive) added a comment -

          Thanks.

          Copying earlier resolve message: "Issue fixed by refactor in SERVER-2771, resolving as dup."

          Show
          rassi J Rassi (Inactive) added a comment - Thanks. Copying earlier resolve message: "Issue fixed by refactor in SERVER-2771 , resolving as dup."

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: