[SERVER-12990] Abnormal termination of concurrent index builds can lead to a corrupt index catalog Created: 03/Mar/14  Updated: 11/Jul/16  Resolved: 15/Mar/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.9, 2.4.10
Fix Version/s: 2.4.10

Type: Bug
Priority: Major - P3
Reporter: David Hows
Assignee: Eliot Horowitz (Inactive)
Resolution: Done
Votes: 1
Labels: corrupt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: mongod-2.4.10RC.log, mongod.log, server12990_killbgindex.js
Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

Start a background index build
Build a foreground index while the background index build is still running
When the foreground index build finishes, run killOp on the background index build

Participants:

 Description   
Issue Status as of March 31, 2014

ISSUE SUMMARY

Building indexes concurrently can lead to a corrupt index catalog. In particular, the sequence of operations that exposes this bug is:

  1. Start a background index build
  2. Start another index build (background or foreground)
  3. After the second index build completes, kill the background index build with db.killOp()

After this series of steps, the index catalog is corrupted. Any change to the data in this collection, or a call to stats(), results in an error.

USER IMPACT

A node that ends up with a corrupt index catalog must be repaired or resynced from a healthy node.

SOLUTION

The index position of the background index needs to be re-calculated on failure as it may have changed. This allows the server to clean up the failed index build correctly.
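
The effect of a stale index position can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's actual code or data structures: it assumes the catalog is an array of slots in which finished indexes occupy the first nIndexes positions and in-progress builds follow them, and that a finishing build is swapped into the first in-progress slot. All names (makeCatalog, startBuild, finishBuild, positionOf, cleanupFailedBuild, isConsistent, runScenario) are hypothetical.

```javascript
// Toy model only. NOT the server's real catalog layout or code.
// Assumption: slots[0 .. nIndexes-1] hold finished indexes,
// in-progress builds sit after them.
function makeCatalog() {
  return { nIndexes: 1, slots: [{ name: "_id_", ready: true }] };
}

// Look up the current slot of an index by name (-1 if absent).
function positionOf(catalog, name) {
  for (var i = 0; i < catalog.slots.length; i++) {
    if (catalog.slots[i].name === name) return i;
  }
  return -1;
}

// Register an in-progress build and return its position at start time.
function startBuild(catalog, name) {
  catalog.slots.push({ name: name, ready: false });
  return catalog.slots.length - 1;
}

// Finish a build: swap it into the first in-progress slot, mark it ready.
function finishBuild(catalog, name) {
  var pos = positionOf(catalog, name);
  var target = catalog.nIndexes;
  var tmp = catalog.slots[target];
  catalog.slots[target] = catalog.slots[pos];
  catalog.slots[pos] = tmp;
  catalog.slots[target].ready = true;
  catalog.nIndexes += 1;
}

// Clean up a failed (killed) build by removing the slot at `pos`.
function cleanupFailedBuild(catalog, pos) {
  catalog.slots.splice(pos, 1);
}

// The first nIndexes slots must all exist and be finished indexes.
function isConsistent(catalog) {
  if (catalog.slots.length < catalog.nIndexes) return false;
  for (var i = 0; i < catalog.nIndexes; i++) {
    if (!catalog.slots[i].ready) return false;
  }
  return true;
}

// Replays the sequence from the summary. `recalculate` selects whether the
// cleanup uses the position remembered at start or a fresh lookup.
function runScenario(recalculate) {
  var catalog = makeCatalog();
  var stalePos = startBuild(catalog, "bg_index");   // step 1
  startBuild(catalog, "second_index");              // step 2
  finishBuild(catalog, "second_index");             // second build completes
  var pos = recalculate ? positionOf(catalog, "bg_index") : stalePos;
  cleanupFailedBuild(catalog, pos);                 // step 3: killOp
  return isConsistent(catalog);
}
```

In this model, runScenario(false) returns false because the cleanup removes the finished second index instead of the killed build, while runScenario(true) returns true because the position is looked up again before the cleanup.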

WORKAROUNDS

It is advisable to build indexes one at a time, not concurrently.

AFFECTED VERSIONS

All recent production release versions up to 2.4.9 are affected. The 2.6 series is unaffected.

PATCHES

The fix is included in the 2.4.10 production release.

Original Description

If you cancel a background index build that is still in progress after having already (successfully) created a foreground index, you will corrupt the index catalog.

Commands issued to reproduce:

shell1> db.test.ensureIndex({x:1,fruits:1,transport:1},{background:true});
shell2> db.test.ensureIndex({x:1,vegetables:1,transport:1});
shell3> db.currentOp()
shell3> db.killOp(173)
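
The opid passed to db.killOp() above (173) was read from the db.currentOp() output. A minimal sketch of picking index build operations out of that output follows. It assumes that on 2.4-era servers an index build is reported as an insert into the database's system.indexes namespace; findIndexBuildOpIds is a hypothetical helper, not a shell built-in.

```javascript
// Hypothetical helper: returns the opids of in-progress index builds,
// given the `inprog` array from db.currentOp().
// Assumption: an index build shows up as op "insert" on <db>.system.indexes.
function findIndexBuildOpIds(inprog) {
  return inprog
    .filter(function (op) {
      return op.op === "insert" && /\.system\.indexes$/.test(op.ns);
    })
    .map(function (op) {
      return op.opid;
    });
}

// Possible use in the mongo shell (not executed here):
//   findIndexBuildOpIds(db.currentOp().inprog)
```

The filter does not distinguish background from foreground builds, so the returned opids still need to be checked against the currentOp output before any of them is killed.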

Result of collstats after killOp()

> db.test.stats()
{
	"ns" : "test.test",
	"count" : 1358323,
	"size" : 262296864,
	"avgObjSize" : 193.10345477474797,
	"storageSize" : 335896576,
	"numExtents" : 14,
	"nindexes" : 2,
	"lastExtentSize" : 92581888,
	"paddingFactor" : 1,
	"systemFlags" : 0,
	"userFlags" : 0,
	"errmsg" : "exception: BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO",
	"code" : 10334,
	"ok" : 0
}

This does not affect 2.6RC0.



 Comments   
Comment by Amalia Hawkins [ 11/Mar/14 ]

Attached a JS test script that hits this issue roughly 40% of the time. Will try to refine further.
