[SERVER-18641] mongodb process crashes on insert with WiredTiger Panic error Created: 24/May/15  Updated: 04/Jun/15  Resolved: 29/May/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Homam Hosseini Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

This keeps happening when importing data with `mongorestore`.

Participants:

 Description   

# mongodb.conf
dbpath=/one/data
logpath=/four/log/mongodb/mongodb.log
logappend=true
storageEngine=wiredTiger
journal=true

MongoDB was started with `mongod -f /etc/mongodb.conf --wiredTigerEngineConfigString="hazard_max=10000" --fork`

$ mongod --version
db version v3.0.3
git version: b40106b36eecd1b4407eb1ad1af6bc60593c6105
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014

Log after crash:

Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 686ms
2015-05-24T12:46:05.292+0400 I WRITE    [conn124] insert MobiOne-events.events ninserted:10000 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 765ms
2015-05-24T12:46:05.409+0400 E STORAGE  [conn123] WiredTiger (0) [1432457165:409641][1173:0x7f589bda0700], file:index-57--5354510338709796319.wt, cursor.insert: read checksum error [16384B @ 6772817920, 241841496 != 3045535132]
2015-05-24T12:46:05.410+0400 E STORAGE  [conn123] WiredTiger (0) [1432457165:409942][1173:0x7f589bda0700], file:index-57--5354510338709796319.wt, cursor.insert: index-57--5354510338709796319.wt: encountered an illegal file format or internal value
2015-05-24T12:46:05.410+0400 E STORAGE  [conn123] WiredTiger (-31804) [1432457165:410254][1173:0x7f589bda0700], file:index-57--5354510338709796319.wt, cursor.insert: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-05-24T12:46:05.410+0400 I -        [conn125] Fatal Assertion 28559
2015-05-24T12:46:05.410+0400 I -        [conn122] Fatal Assertion 28559
2015-05-24T12:46:05.410+0400 I -        [conn123] Fatal Assertion 28558
2015-05-24T12:46:05.427+0400 I CONTROL  [conn122] 
 0xf51949 0xef1671 0xed6261 0xd7b5e0 0xd7617a 0xd6fce4 0xd60db0 0xa8744b 0x92555e 0x9258d5 0x9138f0 0x913ced 0xab2e5e 0xab33c8 0xab5562 0xab7e50 0x80e88d 0xf04a6b 0x7f58a58ef182 0x7f58a43b747d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B51949"},{"b":"400000","o":"AF1671"},{"b":"400000","o":"AD6261"},{"b":"400000","o":"97B5E0"},{"b":"400000","o":"97617A"},{"b":"400000","o":"96FCE4"},{"b":"400000","o":"960DB0"},{"b":"400000","o":"68744B"},{"b":"400000","o":"52555E"},{"b":"400000","o":"5258D5"},{"b":"400000","o":"5138F0"},{"b":"400000","o":"513CED"},{"b":"400000","o":"6B2E5E"},{"b":"400000","o":"6B33C8"},{"b":"400000","o":"6B5562"},{"b":"400000","o":"6B7E50"},{"b":"400000","o":"40E88D"},{"b":"400000","o":"B04A6B"},{"b":"7F58A58E7000","o":"8182"},{"b":"7F58A42BD000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.16.0-30-generic", "version" : "#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F56F80CB96B4DBFC070BEB0ADAC7D6B274BFC6B1" }, { "b" : "7FFF8E5FC000", "elfType" : 3, "buildId" : "C8BA9F3BA421CFBAE75F7E57F357B1B5431DE838" }, { "b" : "7F58A58E7000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F58A5689000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F58A52AE000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B927879B878D90DD9FF4B15B00E7799AA8E0272F" }, { "b" : "7F58A50A6000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F58A4EA2000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F58A4B9E000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F58A4898000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" 
: 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F58A4682000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F58A42BD000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F58A5B05000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf51949]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xef1671]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xed6261]
 mongod(_ZN5mongo17wtRCToStatus_slowEiPKc+0x2D0) [0xd7b5e0]
 mongod(_ZN5mongo17WiredTigerSession13releaseCursorEmP11__wt_cursor+0x1EA) [0xd7617a]
 mongod(_ZN5mongo16WiredTigerCursorD1Ev+0x14) [0xd6fce4]
 mongod(_ZN5mongo15WiredTigerIndex6insertEPNS_16OperationContextERKNS_7BSONObjERKNS_8RecordIdEb+0xD0) [0xd60db0]
 mongod(_ZN5mongo22BtreeBasedAccessMethod6insertEPNS_16OperationContextERKNS_7BSONObjERKNS_8RecordIdERKNS_19InsertDeleteOptionsEPl+0x19B) [0xa8744b]
 mongod(_ZN5mongo12IndexCatalog12_indexRecordEPNS_16OperationContextEPNS_17IndexCatalogEntryERKNS_7BSONObjERKNS_8RecordIdE+0x6E) [0x92555e]
 mongod(_ZN5mongo12IndexCatalog11indexRecordEPNS_16OperationContextERKNS_7BSONObjERKNS_8RecordIdE+0x85) [0x9258d5]
 mongod(_ZN5mongo10Collection15_insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0xB0) [0x9138f0]
 mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x8D) [0x913ced]
 mongod(_ZN5mongo14checkAndInsertEPNS_16OperationContextERNS_6Client7ContextEPKcRNS_7BSONObjE+0x13E) [0xab2e5e]
 mongod(_ZN5mongo11insertMultiEPNS_16OperationContextERNS_6Client7ContextEbPKcRSt6vectorINS_7BSONObjESaIS8_EERNS_5CurOpE+0x58) [0xab33c8]
 mongod(_ZN5mongo14receivedInsertEPNS_16OperationContextERNS_7MessageERNS_5CurOpE+0x522) [0xab5562]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x16B0) [0xab7e50]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x80e88d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf04a6b]
 libpthread.so.0(+0x8182) [0x7f58a58ef182]
 libc.so.6(clone+0x6D) [0x7f58a43b747d]
-----  END BACKTRACE  -----
2015-05-24T12:46:05.427+0400 I -        [conn122] 
 
***aborting after fassert() failure
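
When triaging a log like the one above, it can help to pull the fields out of the WiredTiger "read checksum error" line. The following is a minimal sketch, not part of the original report. The field names (`block_size`, `offset`, `checksums`) are assumptions about what the bracketed numbers mean; the message itself only shows a byte count, a position after `@`, and two values that did not match.

```python
import re

# Matches the "read checksum error" message format seen in the log above.
# The group names are illustrative labels, not official WiredTiger terms.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>\S+?),\s+(?P<op>[\w.]+): read checksum error "
    r"\[(?P<size>\d+)B @ (?P<offset>\d+), (?P<left>\d+) != (?P<right>\d+)\]"
)


def parse_checksum_error(line):
    """Return the fields of a checksum error line, or None if it does not match."""
    m = CHECKSUM_RE.search(line)
    if m is None:
        return None
    return {
        "file": m.group("file"),
        "operation": m.group("op"),
        "block_size": int(m.group("size")),
        "offset": int(m.group("offset")),
        "checksums": (int(m.group("left")), int(m.group("right"))),
    }


# The line below is copied from the log in this ticket.
sample = (
    "2015-05-24T12:46:05.409+0400 E STORAGE  [conn123] WiredTiger (0) "
    "[1432457165:409641][1173:0x7f589bda0700], "
    "file:index-57--5354510338709796319.wt, cursor.insert: "
    "read checksum error [16384B @ 6772817920, 241841496 != 3045535132]"
)
info = parse_checksum_error(sample)
print(info)
```

Knowing which file and position were involved may make it easier to correlate the crash with errors reported by the storage layer at the same time.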



 Comments   
Comment by Ramon Fernandez Marina [ 29/May/15 ]

homam, if I understand correctly, we may not be able to diagnose this issue anymore. If you still have logs for the affected mongod, you may be able to see what subsequent restarts did with respect to the affected data. You may also want to look in your system logs for errors from your storage subsystem, which could have played a part in this problem.

I'm going to resolve this ticket, but if this happens again it would be very useful if you could immediately collect logs for the mongod process and save the data you're restoring for further investigation.

Regards,
Ramón.

Comment by Homam Hosseini [ 26/May/15 ]

1. No, it happens on a collection with more than 900,000,000 records. The collection has indexes on 5 properties.

2. Yes, it was running 3.0.0 before.

3. Unfortunately, I lost the specific dump that was causing the problem. I can tell you that this crash was not limited to one specific dump. I was able to get it working by dumping and restoring the whole collection and then restoring the new dumps that were causing the problem (this took almost 24 hours).
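
The workaround described above could be sketched roughly as follows. This is an illustration under assumptions, not the exact commands that were run: the database and collection names are taken from the namespace `MobiOne-events.events` in the log, while the directory paths are hypothetical. The comment does not say whether the target collection was emptied before the full restore, so that step is left out here.

```python
import subprocess

DB = "MobiOne-events"   # from the namespace in the log
COLLECTION = "events"   # from the namespace in the log
FULL_DUMP_DIR = "/tmp/full-dump"             # hypothetical path
NEW_DUMP_BSON = "/tmp/new-dump/events.bson"  # hypothetical path


def build_dump_command(db, collection, out_dir):
    """Build a mongodump command line for a single collection."""
    return ["mongodump", "--db", db, "--collection", collection, "--out", out_dir]


def build_restore_command(db, collection, bson_path):
    """Build a mongorestore command line for a single .bson file."""
    return ["mongorestore", "--db", db, "--collection", collection, bson_path]


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# mongodump writes <out_dir>/<db>/<collection>.bson
full_dump_bson = "{}/{}/{}.bson".format(FULL_DUMP_DIR, DB, COLLECTION)

steps = [
    build_dump_command(DB, COLLECTION, FULL_DUMP_DIR),       # dump the whole collection
    build_restore_command(DB, COLLECTION, full_dump_bson),   # restore the full dump
    build_restore_command(DB, COLLECTION, NEW_DUMP_BSON),    # restore the new dump
]

for step in steps:
    run(step)  # dry run: only prints the commands
```

The default dry run only prints the commands, which makes it possible to review them before anything touches the data.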

I think this problem started after I deleted some records from the collection. We have restored ~400,000 records into this collection every half hour, 48 times a day, for the past 3 months and never had this problem before.

Comment by Ramon Fernandez Marina [ 25/May/15 ]

homam, we'll need some more information to diagnose this issue:

  1. Does this error happen on an empty instance? Or does the instance contain other data already?
  2. If this doesn't happen on an empty instance, was this instance ever running 3.0.0 before being upgraded to 3.0.3?
  3. Have you examined your system logs to see if there are any errors coming from the storage layer at the time of the crash?
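
For the third question, a quick scan of the system logs might look like the sketch below. The keyword list is an assumption: these are illustrative examples of messages that often accompany disk or filesystem trouble on Linux, not an exhaustive or authoritative set. The sample lines are synthetic placeholders, not real log output.

```python
# Assumption: illustrative keywords only; adjust for your kernel and filesystem.
STORAGE_KEYWORDS = (
    "i/o error",
    "ext4-fs error",
    "medium error",
    "remounting filesystem read-only",
)


def find_storage_errors(lines, keywords=STORAGE_KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((lineno, line.rstrip("\n")))
    return hits


# Synthetic placeholder lines, made up for this example.
sample_lines = [
    "SYNTHETIC: service started normally\n",
    "SYNTHETIC: disk reported an I/O error\n",
    "SYNTHETIC: user logged in\n",
]
hits = find_storage_errors(sample_lines)
for lineno, text in hits:
    print(lineno, text)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_storage_errors(open("/var/log/syslog"))`. That path is an assumption based on the Ubuntu system shown in the backtrace.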

It may also help for you to send us the data you're restoring and the exact mongorestore command line you're using so we can try to reproduce the problem on our end. You can upload data securely and privately via scp:

scp -P 722 -r <filename> SERVER-18641@www.mongodb.com:

where "<filename>" is a tar/zip archive of your data. When prompted for a password just press enter.

Thanks,
Ramón.
