[SERVER-17141] session->truncate failed with File Not Found in WiredTigerRecordStoreThread for local.oplog.rs Created: 31/Jan/15  Updated: 23/Sep/19  Resolved: 24/Feb/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.0.0-rc7
Fix Version/s: 3.0.0-rc10, 3.1.0

Type: Bug Priority: Critical - P2
Reporter: Michael Grundy Assignee: Keith Bostic (Inactive)
Resolution: Done Votes: 1
Labels: 28qa, todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File commandline     Java Source File jmongoiibench.java     File mongo.conf     File run.2.simple.bash     File run.simple.bash    
Issue Links:
Duplicate
is duplicated by SERVER-17288 Invariant failure during agg_sort FSM... Closed
Related
related to SERVER-43457 Complete TODO listed in SERVER-17141 Closed
is related to SERVER-17145 4 parallel imports on each of the 3 m... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Using an r3.4xlarge instance (16 cpu, 122g ram, ephemeral storage for data) the attached mongo.conf (modded for dbpath and logpath), [https://github.com/tmcallaghan/iibench-mongodb|iibench] with the attached changes and shell script, and mongod run as follows:

numactl --interleave=all ./mongodb-linux-x86_64-3.0.0-rc8/bin/mongod --config mongo.conf --storageEngine wiredTiger --wiredTigerCacheSizeGB 70 --master

Workload is run on same machine. bash run.2.simple.bash 1000000

jdk-7u25, mongo-java-driver-2.12.5.jar.

Participants:

 Description   

Running insert workload, primary failed about 12 hours in.

2015-01-31T05:26:24.818+0000 I COMMAND  [conn47] command socialite.$cmd command: insert { insert: "followers", ordered: true, documents: [ { _id: ObjectId('54cc6780e4b0aa9307e5a96e'), _f: "1036285", _t: "2457926" } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:80 115ms
2015-01-31T05:26:25.491+0000 I QUERY    [conn689] getmore local.oplog.rs cursorid:23299506973 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:257 nreturned:32969 reslen:4194377 200ms
2015-01-31T05:26:28.257+0000 I -        [WiredTigerRecordStoreThread for local.oplog.rs] Invariant failure: session->truncate(session, NULL, start, c, NULL) resulted in status UnknownError 2: No such file or directory at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 517
2015-01-31T05:26:28.273+0000 I QUERY    [conn690] getmore local.oplog.rs cursorid:22034715965 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:257 nreturned:32967 reslen:4194299 205ms
2015-01-31T05:26:28.416+0000 I CONTROL  [WiredTigerRecordStoreThread for local.oplog.rs] 
 0xf3a9d9 0xee4b21 0xecaafa 0xd563fa 0xd5b4a4 0xecbbc0 0xf88114 0x7f7a624a2f18 0x7f7a615b4b9d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3A9D9"},{"b":"400000","o":"AE4B21"},{"b":"400000","o":"ACAAFA"},{"b":"400000","o":"9563FA"},{"b":"400000","o":"95B4A4"},{"b":"400000","o":"ACBBC0"},{"b":"400000","o":"B88114"},{"b":"7F7A6249B000","o":"7F18"},{"b":"7F7A614D2000","o":"E2B9D"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc7", "gitVersion" : "e4c60053b2967e16f765fa25d16aa6d629faa196", "uname" : { "sysname" : "Linux", "release" : "3.14.20-20.44.amzn1.x86_64", "version" : "#1 SMP Mon Oct 6 22:52:46 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFC7EFE000", "elfType" : 3 }, { "b" : "7F7A6249B000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7F7A62293000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7F7A6208F000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7F7A61D8B000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F7A61A8D000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7F7A61877000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F7A614D2000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7F7A626B7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3a9d9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee4b21]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xecaafa]
 mongod(_ZN5mongo21WiredTigerRecordStore27cappedDeleteAsNeeded_inlockEPNS_16OperationContextERKNS_8RecordIdE+0x78A) [0xd563fa]
 mongod(+0x95B4A4) [0xd5b4a4]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x120) [0xecbbc0]
 mongod(+0xB88114) [0xf88114]
 libpthread.so.0(+0x7F18) [0x7f7a624a2f18]
 libc.so.6(clone+0x6D) [0x7f7a615b4b9d]
-----  END BACKTRACE  -----
2015-01-31T05:26:28.422+0000 I -        [WiredTigerRecordStoreThread for local.oplog.rs] 
 
***aborting after invariant() failure



 Comments   
Comment by Githook User [ 19/Feb/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17141 Cope with WT_NOTFOUND from WT_SESSION::truncate

(cherry picked from commit 4ca4224dae91ec067a38cd552b3b8c50999d7707)
Branch: v3.0
https://github.com/mongodb/mongo/commit/74aabd913f3cafb2fd3adbafb2e7ec3d5aa8597e

Comment by Githook User [ 19/Feb/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17141 Cope with WT_NOTFOUND from WT_SESSION::truncate
Branch: master
https://github.com/mongodb/mongo/commit/4ca4224dae91ec067a38cd552b3b8c50999d7707

Comment by Githook User [ 17/Feb/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17141 Don't abort if ENOENT is returned from session->truncate()

(cherry picked from commit d47529bb1ad51e65f5262ace4efcaff96062e028)
Branch: v3.0
https://github.com/mongodb/mongo/commit/5189c2d2e386c28b56ce8ae6ba630d98abee25bb

Comment by Githook User [ 14/Feb/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17141 Don't abort if ENOENT is returned from session->truncate()
Branch: master
https://github.com/mongodb/mongo/commit/d47529bb1ad51e65f5262ace4efcaff96062e028

Comment by Michael Grundy [ 12/Feb/15 ]

I'm able to reproduce it now, takes about 1800 seconds longer. BT from rc8 below.

2015-02-12T20:23:00.212+0000 I -        [WiredTigerRecordStoreThread for local.oplog.$main] Invariant failure: session->truncate(session, NULL, start, c, NULL) resulted in status UnknownError 2: No such file or directory at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 512
2015-02-12T20:23:00.218+0000 I CONTROL  [WiredTigerRecordStoreThread for local.oplog.$main] 
 0xf3f059 0xee9191 0xecf16a 0xd59c9a 0xd5f4b9 0xed0230 0xf8c794 0x7f6aefb1af18 0x7f6aeec2cb2d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3F059"},{"b":"400000","o":"AE9191"},{"b":"400000","o":"ACF16A"},{"b":"400000","o":"959C9A"},{"b":"400000","o":"95F4B9"},{"b":"400000","o":"AD0230"},{"b":"400000","o":"B8C794"},{"b":"7F6AEFB13000","o":"7F18"},{"b":"7F6AEEB4A000","o":"E2B2D"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc8", "gitVersion" : "9d0714cdcffadb203ff68730acedbc40875403da", "uname" : { "sysname" : "Linux", "release" : "3.14.27-25.47.amzn1.x86_64", "version" : "#1 SMP Wed Dec 17 18:36:15 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF316FE000", "elfType" : 3 }, { "b" : "7F6AEFB13000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7F6AEF90B000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7F6AEF707000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7F6AEF403000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F6AEF105000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7F6AEEEEF000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F6AEEB4A000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7F6AEFD2F000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3f059]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee9191]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xecf16a]
 mongod(_ZN5mongo21WiredTigerRecordStore27cappedDeleteAsNeeded_inlockEPNS_16OperationContextERKNS_8RecordIdE+0x78A) [0xd59c9a]
 mongod(+0x95F4B9) [0xd5f4b9]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x120) [0xed0230]
 mongod(+0xB8C794) [0xf8c794]
 libpthread.so.0(+0x7F18) [0x7f6aefb1af18]
 libc.so.6(clone+0x6D) [0x7f6aeec2cb2d]
-----  END BACKTRACE  -----
2015-02-12T20:23:00.218+0000 I -        [WiredTigerRecordStoreThread for local.oplog.$main] 
 
***aborting after invariant() failure

Comment by Mark Callaghan [ 10/Feb/15 ]

Crash repeated after 3400 seconds. zlib still OK after 7000 seconds

Comment by Mark Callaghan [ 10/Feb/15 ]

Put jmongoiibench.java in the "src" directory
Run the client as "bash run.simple.bash 1000000"

Comment by Michael Grundy [ 10/Feb/15 ]

Hi Mark - Could you put in your full mongod options, and the iibench option changes compared to the run.simple.bash script. I want to make sure I don't assume anything while I try to repro this.

Thanks!
Mike

Comment by Mark Callaghan [ 10/Feb/15 ]

I get this in rc8 after about 15 minutes. The workload is iibench-mongodb with 10 loader threads each doing 1 document/insert, 0 query threads and a 2000MB oplog. Let me know if you want more details. This occurs with snappy compression. The test with zlib compression has made it almost to 30 minutes without a crash.

2015-02-10T09:11:20.181-0800 I -        [WiredTigerRecordStoreThread for local.oplog.$main] Invariant failure: session->truncate(session, NULL, start, c, NULL) resulted in status UnknownError 2: No such file or directory at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 512
2015-02-10T09:11:20.184-0800 I CONTROL  [WiredTigerRecordStoreThread for local.oplog.$main] 
...
 0x11bf28b 0x1158b21 0x113c258 0xf6b08f 0xf714bf 0x113d6be 0x120c378 0x7f7205ca17c9 0x7f7204da08ad 0
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"DBF28B"},{"b":"400000","o":"D58B21"},{"b":"400000","o":"D3C258"},{"b":"400000","o":"B6B08F"},{"b":"400000","o":"B714BF"},{"b":"400000","o":"D3D6BE"},{"b":"400000","o":"E0C378"},{"b":"7F7205C9A000","o":"77C9"},{"b":"7F7204C94000","o":"10C8AD"},{"b":"0","o":"0"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc8", "gitVersion" : "nogitversion", "uname" : { "sysname" : "Linux", "release" : "3.10.53-43_fbk14_02627_gb5c286c", "version" : "#1 SMP Mon Oct 20 15:06:57 PDT 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "41AAD39CBBE1BF058375E66675F900D788D83C6E" }, { "b" : "7FFF234FE000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "50103D521098AC1A8DA637FE9F84CCFE48B62B25" }, { "b" : "7F72060CA000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libz.so.1", "elfType" : 3, "buildId" : "530837A99C2C4AB20E681CC0E635F12E52960C18" }, { "b" : "7F7205EB7000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libbz2.so.1", "elfType" : 3, "buildId" : "A22507242981027D02344CDB142727E9BFE04AC9" }, { "b" : "7F7205C9A000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libpthread.so.0", "elfType" : 3, "buildId" : "F3CF4B6D065A6FCF55D752C1B7FBF951C041B871" }, { "b" : "7F7205A91000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/librt.so.1", "elfType" : 3, "buildId" : "36E417C497D7D99E323324B54709FFF95E21F209" }, { "b" : "7F720588D000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libdl.so.2", "elfType" : 3, "buildId" : "7258590522FF8998FF3DC9CD4800354FF634B049" }, { "b" : "7F720557E000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libstdc++.so.6", "elfType" : 3, "buildId" : "E3F55DBA2D7BEC8AADA472747C8950045FDAE8C8" }, { "b" : "7F7205277000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libm.so.6", "elfType" : 3, "buildId" : "5180B0D6214EFC6BD4024BCC46459868BB9FF312" }, { "b" : "7F7205060000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libgcc_s.so.1", "elfType" : 3, "buildId" : "2DF21D541E1764CA1D95BBE8DC5615C57BA072FE" }, { "b" : "7F7204C94000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libc.so.6", "elfType" : 3, "buildId" : "C645843801A68480D80BF9D0FD4A20182CBD1D9C" }, { "b" : "7F72062E4000", "path" : "/usr/local/fbcode/gcc-4.9-glibc-2.20/lib/ld.so", "elfType" : 3, "buildId" : "FAF6C327CBADE1769B09F51830B64480E9FFDC82" } ] }}
 mongod.30rc8.jem.70g.zlib(_ZN5mongo15printStackTraceERSo+0x2B) [0x11bf28b]
 mongod.30rc8.jem.70g.zlib(_ZN5mongo10logContextEPKc+0xE1) [0x1158b21]
 mongod.30rc8.jem.70g.zlib(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x108) [0x113c258]
 mongod.30rc8.jem.70g.zlib(_ZN5mongo21WiredTigerRecordStore27cappedDeleteAsNeeded_inlockEPNS_16OperationContextERKNS_8RecordIdE+0x89F) [0xf6b08f]
 mongod.30rc8.jem.70g.zlib(+0xB714BF) [0xf714bf]
 mongod.30rc8.jem.70g.zlib(_ZN5mongo13BackgroundJob7jobBodyEv+0x11E) [0x113d6be]
 mongod.30rc8.jem.70g.zlib(+0xE0C378) [0x120c378]
 libpthread.so.0(+0x77C9) [0x7f7205ca17c9]
 libc.so.6(clone+0x6D) [0x7f7204da08ad]
 ??? [0]
-----  END BACKTRACE  -----
2015-02-10T09:11:20.184-0800 I -        [WiredTigerRecordStoreThread for local.oplog.$main] 
 
***aborting after invariant() failure

Comment by Daniel Pasette (Inactive) [ 31/Jan/15 ]

michael.grundy@10gen.com, do you have any other system info to help track this?

  • Where do I get the workload?
  • What is the configuration and startup params for the servers?
  • System details?
  • Can you include the stats on the collection[s] and oplog?
  • Any other run time stats being collected?
Generated at Thu Feb 08 03:43:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.