[SERVER-12957] Stopping a secondary during an index build can leave a non-usable index which is never reported Created: 28/Feb/14  Updated: 10/Dec/14  Resolved: 03/Mar/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: 2.6.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Hows Assignee: Eric Milkie
Resolution: Done Votes: 6
Labels: corrupt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-12956 Stopping a secondary in phase 2/3 of ... Closed
Operating System: ALL
Steps To Reproduce:

Create a replica set
Have enough data in the set to make the index build take 10s of seconds
Build an index on the primary
Kill -5 the secondary when it is over 1/2 complete (follow the logs and look for something like the line below)

2014-02-28T13:26:41.002+1100 [repl index builder 0]             Index Build(background): 775600/1358322 57%

Start the secondary (it will not rebuild the index)
The index is shown to exist in system.indexes but is not usable for queries nor does it show in validate()

Participants:

 Description   

Similar to SERVER-12956, but on 2.6.0RC1 git rev 52fe0d21959e32a5bdbecdc62057db386e4e029c

When a secondary is stopped during the build of an index it will not resume and complete the index build and the index will not be available for use.
The last line of the index build was:

2014-02-28T13:26:41.002+1100 [repl index builder 0]             Index Build(background): 775600/1358322 57%

When running a query on that secondary i see the following:

db.test.find({x:-2}).explain(true);
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1358323,
	"nscanned" : 1358323,
	"nscannedObjectsAllPlans" : 1358323,
	"nscannedAllPlans" : 1358323,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 570,
	"indexBounds" : {
 
	},
	"allPlans" : [
		{
			"cursor" : "BasicCursor",
			"n" : 1,
			"nscannedObjects" : 1358323,
			"nscanned" : 1358323,
			"indexBounds" : {
 
			}
		}
	],
	"server" : "Pixl.local:27018"
}

Validate does not test the index:

....
	"objectsFound" : 1358323,
	"invalidObjects" : 0,
	"bytesWithHeaders" : 347730432,
	"bytesWithoutHeaders" : 325997264,
	"deletedCount" : 15,
	"deletedSize" : 113161136,
	"nIndexes" : 1,
	"keysPerIndex" : {
		"test.test.$_id_" : 1358323
	},
	"valid" : true,
	"errors" : [ ],
	"ok" : 1
}

But system.indexes shows it to exist:

db.system.indexes.find();
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.test" }
{ "v" : 1, "key" : { "x" : 1, "fruits" : 1, "transport" : 1 }, "name" : "x_1_fruits_1_transport_1", "ns" : "test.test", "background" : true }

This looks to be a silent corruption and would only become evident when you examine why said index is not being used for query evaluation,



 Comments   
Comment by David Hows [ 03/Mar/14 ]

I re-read my logs and found I had accidentally started it once more with 2.4.9 rather than 2.6.0

I've tried several subsequent reproductions and was unable to make it crash in the manner initially described.

Comment by Eric Milkie [ 28/Feb/14 ]

Note that the behavior between foreground and background indexes is quite a bit different, so I'm going to assume that only background index builds have this issue. I attempted a few times with foreground index builds and couldn't reproduce the problem.

Comment by Eric Milkie [ 28/Feb/14 ]

Do you have a server log when the index build was running and then when the server was restarted?

Generated at Thu Feb 08 03:30:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.