[SERVER-11334] Failing assertion when accessing indexes (corruption?) Created: 23/Oct/13  Updated: 18/Mar/14  Resolved: 17/Jan/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: 2.5.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Bruce Lucas (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File corrupted_db.tgz    
Operating System: ALL
Participants:

 Description   

I've broken a collection often enough and seen this failing assertion come up enough times in different contexts where I have to file this ticket despite not being able to reliably reproduce getting a collection into this bad state. At first I thought it had to do with bringing a `mongod` up and down (cleanly) while running an application against it. Now I've run into this while running (and re-running) tests. I have the bad data files saved on the side. Going to see if re-running the same tests over and over can reproduce the problem (with -vv).

Any ideas on things to look out for are welcome.

2013-10-23T11:36:42.921-0400 [conn1] Assertion: 10334:BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: ObjectId('5267eceae4b0820066d7c32c')
2013-10-23T11:36:42.927-0400 [conn1] mmsdbconfig.config.confirmedContacts 0x10e1411 0x108cf69 0x10731b6 0x10736ac 0x74b1b5 0xa52633 0xa5a984 0xd2000e 0xd21870 0xd22387 0xd2bacf 0xc23826 0xb6fdee 0xb72fd2 0x74edcd 0x10a029b 0x7fc5cde4ce9a 0x7fc5cd15fccd 
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x10e1411]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x108cf69]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo11msgassertedEiPKc+0xe6) [0x10731b6]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod() [0x10736ac]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x405) [0x74b1b5]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo10FetchStage4workEPl+0x343) [0xa52633]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo10LimitStage4workEPl+0x34) [0xa5a984]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo15MultiPlanRunner12workAllPlansEv+0x9e) [0xd2000e]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo15MultiPlanRunner12pickBestPlanEPm+0x30) [0xd21870]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo15MultiPlanRunner7getNextEPNS_7BSONObjEPNS_7DiskLocE+0x1b7) [0xd22387]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo11newRunQueryEPNS_14CanonicalQueryERNS_5CurOpERNS_7MessageE+0x37f) [0xd2bacf]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x19f6) [0xc23826]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod() [0xb6fdee]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x3d2) [0xb72fd2]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9d) [0x74edcd]
 /home/dgottlieb/Downloads/mongodb-linux-x86_64-2.5.3/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x10a029b]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fc5cde4ce9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc5cd15fccd]

I believe I've always observed a warning log of the following form on the collection in question before seeing a collection/index get into this bad state:

2013-10-23T11:36:10.525-0400 [conn65] unindex failed (key too big?) mmsdbconfig.config.confirmedContacts.$addrTypeConfirmedIdx key: { : "nobody@mongodb.com", : 1, : true } _id: ObjectId('5267eceae4b0820066d7c32c')

And some more information on the colletion (after being in a bad state)

mmsdbconfig> db.config.confirmedContacts.find()
mmsdbconfig> db.config.confirmedContacts.find({ address: "nobody@mongodb.com", typeId: 1 })
error: {
	"$err" : "BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: ObjectId('5267eceae4b0820066d7c32c')",
	"code" : 10334
}
mmsdbconfig> db.config.confirmedContacts.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "mmsdbconfig.config.confirmedContacts",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"address" : 1,
			"typeId" : 1
		},
		"unique" : true,
		"ns" : "mmsdbconfig.config.confirmedContacts",
		"name" : "addrTypeIdx"
	},
	{
		"v" : 1,
		"key" : {
			"address" : 1,
			"typeId" : 1,
			"confirmed" : 1
		},
		"unique" : true,
		"ns" : "mmsdbconfig.config.confirmedContacts",
		"name" : "addrTypeConfirmedIdx"
	}
]
mmsdbconfig> db.config.confirmedContacts.stats()
{
	"ns" : "mmsdbconfig.config.confirmedContacts",
	"count" : 0,
	"size" : 0,
	"storageSize" : 8192,
	"numExtents" : 1,
	"nindexes" : 3,
	"lastExtentSize" : 8192,
	"paddingFactor" : 1,
	"systemFlags" : 1,
	"userFlags" : 0,
	"totalIndexSize" : 24528,
	"indexSizes" : {
		"_id_" : 8176,
		"addrTypeIdx" : 8176,
		"addrTypeConfirmedIdx" : 8176
	},
	"ok" : 1
}



 Comments   
Comment by Bruce Lucas (Inactive) [ 27/Oct/13 ]

Based on the current state of the free list, it appears that there has never been more than one document in this collection, making this a particularly simple scenario. Do you know more about the history of the document - is it inserted and then never changed? Is it updated at any point (e.g. is the "confirmed" field changed)?

Comment by Bruce Lucas (Inactive) [ 25/Oct/13 ]

All collections in the db are empty, as are all indexes but one, the one producing the failure. db.config.confirmedContacts.validate(true) shows results consistent with the "unindex failed" message and the subsequent bson object size complaint (where 0xEEEEEEEE is a marker for a deleted document).

	"nrecords" : 0,
	"objectsFound" : 0,
	"deletedCount" : 2,
	"nIndexes" : 3,
	"keysPerIndex" : {
		"mmsdbconfig.config.confirmedContacts.$_id_" : 0,
		"mmsdbconfig.config.confirmedContacts.$addrTypeIdx" : 0,
		"mmsdbconfig.config.confirmedContacts.$addrTypeConfirmedIdx" : 1
	},

In other words, the last document in the collection was successfully deleted, as was its key in two of the indexes, but its key remains in one index, presumably as a result of the "unindex failed", and that produces the "bson object size" complaint when the document is accessed via the now incorrect index.

You might try running validate(true) on the failing collection whenever you see this. Is it always the last record? Is it always the same collection and index? Is there any pattern to the indexes that generate the problem?

Comment by Daniel Gottlieb (Inactive) [ 25/Oct/13 ]

One thing I noticed is that loading these data files on v2.4.7 and running the offending query does not result in a failing assertion. Presumably this means, if the assertion is caused by corruption, the bug may have been introduced indeterminately long ago and 2.5.x has just added more validation and/or bubbling up this specific assertion.

Generated at Thu Feb 08 03:25:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.