[SERVER-14676] Index creation caused Fatal Assertion 16967 Created: 24/Jul/14  Updated: 16/Nov/21  Resolved: 12/Aug/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ryan Carmichael Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongoindexbuild.txt, Text File mongorepairdb.txt
Issue Links:
Related
Operating System: ALL
Participants:

 Description   

On OSX, with mongo version:
db version v2.6.0
2014-07-24T16:08:10.119-0500 git version: nogitversion

Single server, no replica sets. Mongo crashed when I tried to ensureIndex on a collection. I had previously (as in a few minutes before) successfully added other indexes.

Mongo starts back up, but db.repairDatabase() crashes it again. It is a relatively small/medium-sized collection: about 11 million rows taking up about 10GB of disk space.

Snippet below (see full stack traces attached)

2014-07-24T15:48:43.790-0500 [conn1] 		Index Build: 8066000/11203300	71%
2014-07-24T15:48:46.000-0500 [conn1] 		Index Build: 9142100/11203300	81%
2014-07-24T15:48:48.844-0500 [conn1] Confirmation.actionLog Fatal Assertion 16967
2014-07-24T15:48:48.848-0500 [conn1] Confirmation.actionLog 0x10e60b3ea 0x10e5c11c4 0x10e5af9cb 0x10e40bdb3 0x10e40bddc 0x10e438b15 0x10e1113ef 0x10e31363b 0x10dffb97a 0x10dffc8e5 0x10dff2f0a 0x10e028d14 0x10e0c0869 0x10e0c164e 0x10e0c2921 0x10e2fd2ac 0x10e1af18f 0x10defb41e 0x10e5cebe6 0x10e63f111
 0   mongod                              0x000000010e60b3ea _ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE + 58
 1   mongod                              0x000000010e5c11c4 _ZN5mongo10logContextEPKc + 196
 2   mongod                              0x000000010e5af9cb _ZN5mongo13fassertFailedEi + 187
 3   mongod                              0x000000010e40bdb3 _ZNK5mongo13ExtentManager21getNextRecordInExtentERKNS_7DiskLocE + 179
 4   mongod                              0x000000010e40bddc _ZNK5mongo13ExtentManager13getNextRecordERKNS_7DiskLocE + 28
 5   mongod                              0x000000010e438b15 _ZN5mongo12FlatIterator7getNextEv + 85
 6   mongod                              0x000000010e1113ef _ZN5mongo14CollectionScan4workEPm + 303
 7   mongod                              0x000000010e31363b _ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_7DiskLocE + 267
 8   mongod                              0x000000010dffb97a _ZN5mongo18addExistingToIndexEPNS_10CollectionEPKNS_15IndexDescriptorEPNS_17IndexAccessMethodEb + 890
 9   mongod                              0x000000010dffc8e5 _ZN5mongo12buildAnIndexEPNS_10CollectionEPNS_17IndexCatalogEntryEb + 1365
 10  mongod                              0x000000010dff2f0a _ZN5mongo12IndexCatalog11createIndexENS_7BSONObjEbNS0_16ShutdownBehaviorE + 1290
 11  mongod                              0x000000010e028d14 _ZN5mongo14CmdCreateIndex3runERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS_7BSONObjEiRS7_RNS_14BSONObjBuilderEb + 5892
 12  mongod                              0x000000010e0c0869 _ZN5mongo12_execCommandEPNS_7CommandERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderEb + 41
 13  mongod                              0x000000010e0c164e _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 2286
 14  mongod                              0x000000010e0c2921 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 1313
 15  mongod                              0x000000010e2fd2ac _ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 876
 16  mongod                              0x000000010e1af18f _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 2351
 17  mongod                              0x000000010defb41e _ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE + 190
 18  mongod                              0x000000010e5cebe6 _ZN5mongo17PortMessageServer17handleIncomingMsgEPv + 1078
 19  mongod                              0x000000010e63f111 _ZN5boost12_GLOBAL__N_112thread_proxyEPv + 177
2014-07-24T15:48:48.848-0500 [conn1]



 Comments   
Comment by Ramon Fernandez Marina [ 12/Aug/14 ]

Thanks for the update ryan.carmichael@elabsolutions.com, glad to hear 2.6.3 is working for you. We're going to mark this ticket as resolved, since not having database files makes it impossible to do any further investigation.

Regards,
Ramón.

Comment by Ryan Carmichael [ 12/Aug/14 ]

Just as a sort of final comment on my side: I updated to 2.6.3, wiped the database, and did a bunch more inserts with the same data, ensureIndexes, etc., all with logLevel set to 1. Everything was A-OK: no repeats of the crash, and repairDatabase runs with no issues. Removed all the data and inserted the 11M rows again, several times. All good.

Comment by Ramon Fernandez Marina [ 01/Aug/14 ]

Thanks for the update rcarmich. I'm going to keep this open a tad longer and revisit the stack traces to see if we can find something.

One suggestion for your consideration is to increase the logLevel to 1 if you re-insert the data. If there's a bug in MongoDB and you hit the problem again, the verbose logs may help us find out what's going on.
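[Editor's note: one way to raise the verbosity to 1 on a running 2.6 mongod is via the setParameter command in the shell; the alternative flag shown is the standard command-line verbosity switch, and the dbpath is a placeholder.]

```
// Raise the server log verbosity to 1 on the running mongod
db.adminCommand({ setParameter: 1, logLevel: 1 })

// ...or start mongod with increased verbosity from the outset:
//   mongod -v --dbpath /data/db
```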

Thanks,
Ramón.

Comment by Ryan Carmichael [ 01/Aug/14 ]

Unfortunately I have not been able to easily reproduce this again. I suppose the ticket could be closed. If this were a production system I would be able to spend more time figuring it out but since it is a development machine it will be easier to just wipe the data. Thanks for looking into it.

Comment by Ryan Carmichael [ 29/Jul/14 ]

This is the type of data inserted (about 11 million rows of it). The data below is fake, not real patient info.

{ _id: ObjectId('53d150edd6c4a034594b1eb7'), action: "Added medication 'XANAX'", objectType: "com.xyz.abc.Specimen", searchParams: "Jim|Dr. XYZ|111223X|Urine|123-555-1212|St. Louis|MO|St. Louis|J234382|Smith|111_111223X|", username: "myusername", eventId: 16, eventTime: new Date(1402423097000), objectId: 529237 }
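[Editor's note: for reproduction attempts with non-sensitive data, a minimal Python sketch that generates documents of the same shape as the sample above. All field values here are invented; only the field names and types mirror the original document, and the insert step (e.g. via pymongo) is omitted.]

```python
import random
from datetime import datetime, timezone

def fake_action_log_doc(i):
    """Build one fake actionLog document matching the shape of the
    sample above. Every value is made up; no real patient data."""
    return {
        "action": "Added medication 'DRUG%d'" % (i % 50),
        "objectType": "com.xyz.abc.Specimen",
        "searchParams": "|".join(["Name", "Dr. XYZ", "111223X", "Urine"]) + "|",
        "username": "user%d" % (i % 100),
        "eventId": random.randint(1, 100),
        "eventTime": datetime(2014, 6, 10, tzinfo=timezone.utc),
        "objectId": 500000 + i,
    }

# Generate a small batch; scale the range up toward 11M for a real attempt.
docs = [fake_action_log_doc(i) for i in range(5)]
print(docs[0]["username"], docs[0]["objectId"])
```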

The indexes I had were on:
username
objectId

The index I was trying to add when it crashed:
eventId

I was adding them via:

db.actionLog.ensureIndex({username: 1})
db.actionLog.ensureIndex({objectId: 1})
db.actionLog.ensureIndex({eventId: 1})

I am not aware of any unclean shutdowns. I couldn't find any indication of them in the log files, except for the ones I had when the issue started and it crashed during the index build.

The SMART status of my hard drive is fine. I will try to run a full diagnostic of it tomorrow.

Comment by Ramon Fernandez Marina [ 28/Jul/14 ]

rcarmich, did you have any unclean shutdowns before the problems started? Also, can you tell us a bit more about the number of indexes and their shape? Also, how were you building those indexes? Finally, can you double-check the health of your hard drives? Flaky drives are a common cause for data corruption, and we need to rule that option out no matter how unlikely.

I'm trying to reproduce on my end with sample documents containing numbers, strings, and arrays of numbers. If you manage to reproduce the problem with data you can share please let us know.

Thanks,
Ramón.

Comment by Ryan Carmichael [ 26/Jul/14 ]

Unfortunately the data contains patient information and is not something I can share. It probably doesn't help any, but what preceded this issue was me inserting a bunch of data into the collection, creating some indexes, and then running db.actionLog.remove({}). I loaded and erased the data probably 6 or 7 times, running db.repairDatabase() several times in between. Basically just messing around with converting some MySQL data into MongoDB.

On Monday I will attempt to recreate the issue with non-sensitive data, but I expect it will be difficult to reproduce. In this case the loss or corruption of data isn't important because it was on a development machine... just have to convince my boss this won't happen in production.

Comment by Ramon Fernandez Marina [ 25/Jul/14 ]

rcarmich, looks like your indexes are corrupted, but I would not expect db.repairDatabase() to cause a crash. Is there a chance we can get a copy of your data so we can investigate this problem? If this is an option, we can provide a way to upload the data securely and confidentially – please let us know.

Thanks,
Ramón.

Generated at Thu Feb 08 03:35:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.