[SERVER-17098] wiredTiger read checksum error Created: 28/Jan/15  Updated: 11/Mar/15  Resolved: 24/Feb/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.0.0-rc7
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Quentin Conner Assignee: Michael Cahill (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.5


Attachments: Text File mongodb-sysbench.log    
Operating System: Linux
Steps To Reproduce:

Run sysbench against mongod on the same host (CentOS 6.5) in a single-node topology with Java driver ver 2.12.4.

Participants:

 Description   

With single-node sysbench workload running insert, mongod 3.0.0-rc7 from MCI encountered a wiredTiger library panic.

 118ms
2015-01-28T19:27:45.207+0000 E STORAGE  WiredTiger (0) [1422473265:207337][21989:0x7f0ee4e0a700], file:WiredTiger.wt, session.checkpoint: read checksum error [4096B @ 36864, 2688513059 != 1
97662958]
2015-01-28T19:27:45.207+0000 E STORAGE  WiredTiger (0) [1422473265:207393][21989:0x7f0ee4e0a700], file:WiredTiger.wt, session.checkpoint: WiredTiger.wt: encountered an illegal file format o
r internal value
2015-01-28T19:27:45.207+0000 E STORAGE  WiredTiger (-31804) [1422473265:207407][21989:0x7f0ee4e0a700], file:WiredTiger.wt, session.checkpoint: the process must exit and restart: WT_PANIC: W
iredTiger library panic
2015-01-28T19:27:45.207+0000 I -        [conn4] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn6] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn9] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn5] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn2] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn3] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        [conn7] Fatal Assertion 28559
2015-01-28T19:27:45.207+0000 I -        Fatal Assertion 28558
2015-01-28T19:27:45.207+0000 I -        [conn8] Fatal Assertion 28559
2015-01-28T19:27:45.217+0000 I CONTROL  [conn4]
 0xf3a9d9 0xee4b21 0xec8f41 0xd67729 0xd62b3a 0xd5bfd4 0xd58296 0x8f22de 0x8f2aac 0x993393 0x993f7a 0x994054 0x994755 0x99737d 0x9b71a4 0x9b80e3 0x9b8cdb 0xb87f95 0xa99f88 0x7e6730 0xef8aab 0x7f0eeede89d1 0x7f0eedf8986d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3A9D9"},{"b":"400000","o":"AE4B21"},{"b":"400000","o":"AC8F41"},{"b":"400000","o":"967729"},{"b":"400000","o":"962B3A"},{"b":"400000","o":"95BFD4"},{"b":"400000","o":"958296"},{"b":"400000","o":"4F22DE"},{"b":"400000","o":"4F2AAC"},{"b":"400000","o":"593393"},{"b":"400000","o":"593F7A"},{"b":"400000","o":"594054"},{"b":"400000","o":"594755"},{"b":"400000","o":"59737D"},{"b":"400000","o":"5B71A4"},{"b":"400000","o":"5B80E3"},{"b":"400000","o":"5B8CDB"},{"b":"400000","o":"787F95"},{"b":"400000","o":"699F88"},{"b":"400000","o":"3E6730"},{"b":"400000","o":"AF8AAB"},{"b":"7F0EEEDE1000","o":"79D1"},{"b":"7F0EEDEA1000","o":"E886D"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc7", "gitVersion" : "e4c60053b2967e16f765fa25d16aa6d629faa196", "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.el6.x86_64", "version" : "#1 SMP Fri Nov 22 03:15:09 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFC1C5C000", "elfType" : 3 }, { "b" : "7F0EEEDE1000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7F0EEEBD9000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7F0EEE9D5000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7F0EEE6CF000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F0EEE44B000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7F0EEE235000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F0EEDEA1000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7F0EEEFFE000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3a9d9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee4b21]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xec8f41]
 mongod(_ZN5mongo17wtRCToStatus_slowEiPKc+0x309) [0xd67729]
 mongod(_ZN5mongo17WiredTigerSession13releaseCursorEmP11__wt_cursor+0x1EA) [0xd62b3a]
 mongod(_ZN5mongo16WiredTigerCursorD1Ev+0x14) [0xd5bfd4]
 mongod(_ZN5mongo21WiredTigerRecordStore12insertRecordEPNS_16OperationContextEPKcib+0x126) [0xd58296]
 mongod(_ZN5mongo10Collection15_insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x4E) [0x8f22de]
 mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x8C) [0x8f2aac]
 mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0xA93) [0x993393]
 mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x25A) [0x993f7a]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x34) [0x994054]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x395) [0x994755]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x99737d]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9b71a4]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9b80e3]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9b8cdb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_b+0x755) [0xb87f95]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xAC8) [0xa99f88]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e6730]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x32B) [0xef8aab]
 libpthread.so.0(+0x79D1) [0x7f0eeede89d1]
 libc.so.6(clone+0x6D) [0x7f0eedf8986d]
-----  END BACKTRACE  -----
2015-01-28T19:27:45.217+0000 I -        [conn4]
 
***aborting after fassert() failure



 Comments   
Comment by Michael Cahill (Inactive) [ 24/Feb/15 ]

The original issue cannot be reproduced. Please open a new ticket if it happens again.

quentin.conner, we have some open tickets around memory and pauses: if you can reproduce them in RC9 and above and there are no matching tickets, please open new ones.

Comment by Quentin Conner [ 10/Feb/15 ]

Could not reproduce the WT panic upon re-execution of the same automated test immediately after reboot.

Comment by Quentin Conner [ 05/Feb/15 ]

The same sysbench workload was run against the same machine successfully (data load), then run in execute mode for six days without error. This is on a rotating magnetic hard disk. Not able to reproduce this symptom yet.

Comment by Michael Cahill (Inactive) [ 02/Feb/15 ]

Let me know if you can reproduce this: at the moment I don't have enough information to make progress.

Just for clarity, this kind of error would be generated if files in the database directory were modified by some other process while mongod is running. For example, if a backup was restored over the database directory while mongod was performing updates, an error like this could be generated.

To my knowledge, we haven't seen this error from any of the workloads that we have run against mongod without some kind of external corruption, and sysbench isn't doing anything much different to the tests we're running continuously.

Comment by Quentin Conner [ 29/Jan/15 ]

Nothing noteworthy. Fresh boot of the machine. EXT4 filesystem that had been in use for RC6 testing previously.

It was a clean database in that it did not exist (no files on disk) at the begin of the run.

It will be a few days to see if it reproduces. The machine is busy with another workload.

Comment by Michael Cahill (Inactive) [ 29/Jan/15 ]

Was this with a clean database at the beginning of the run? Is it reproducible?

To my knowledge, we haven't seen anything like this and it is a frequent code path in WiredTiger. Is there anything unusual about the host (filesystem, storage subsystem, etc.)?

Comment by Quentin Conner [ 28/Jan/15 ]

mongod log file attached as mongodb-sysbench.log

Generated at Thu Feb 08 03:43:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.