[SERVER-17508] WT_NOTFOUND on capped collection workload Created: 09/Mar/15  Updated: 14/Apr/16  Resolved: 10/Mar/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Asya Kamsky Assignee: Michael Cahill (Inactive)
Resolution: Duplicate Votes: 1
Labels: 3.0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-17345 WiredTiger -> session.truncate: the s... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

See https://groups.google.com/forum/?#!msg/mongodb-user/AgV_18rR2UQ/lb52Tha9er4J for detailed description and log files.

My use case involves 150+ consumers pulling data from two queues and making inserts into two other large collections, as adding data to a capped collection where I log the jobs. I'm using Ubuntu 12.04 and have attached my logs here. Any help would be truly appreciated.

If I can't figure this out sometime soon I'm afraid I'll be forced to return to mongo 2.6. I never encountered the entire database crashing and having to be manually restarted every couple hours before.

2015-03-08T04:15:22.103-0400 I CONTROL  [conn3836] 
 0xf3f529 0xedfd81 0xec6d4a 0xd5b7d9 0xd5f5a5 0xd5fbce 0x910dae 0x91158c 0x9b0f53 0x9b1b5a 0x9b1c24 0x9b2335 0x9b4d8d 0x9d5b54 0x9d6a93 0x9d768b 0xb98135 0xab05f9 0x80d390 0xef36db 0x7f5e70cd3182 0x7f5e6f79bfbd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3F529"},{"b":"400000","o":"ADFD81"},{"b":"400000","o":"AC6D4A"},{"b":"400000","o":"95B7D9"},{"b":"400000","o":"95F5A5"},{"b":"400000","o":"95FBCE"},{"b":"400000","o":"510DAE"},{"b":"400000","o":"51158C"},{"b":"400000","o":"5B0F53"},{"b":"400000","o":"5B1B5A"},{"b":"400000","o":"5B1C24"},{"b":"400000","o":"5B2335"},{"b":"400000","o":"5B4D8D"},{"b":"400000","o":"5D5B54"},{"b":"400000","o":"5D6A93"},{"b":"400000","o":"5D768B"},{"b":"400000","o":"798135"},{"b":"400000","o":"6B05F9"},{"b":"400000","o":"40D390"},{"b":"400000","o":"AF36DB"},{"b":"7F5E70CCB000","o":"8182"},{"b":"7F5E6F6A1000","o":"FAFBD"}],"processInfo":{ "mongodbVersion" : "3.0.0", "gitVersion" : "a841fd6394365954886924a35076691b4d149168", "uname" : { "sysname" : "Linux", "release" : "3.13.0-35-generic", "version" : "#62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5F365548FE8312E027817D98FEECA3A18F0F60FB" }, { "b" : "7FFF19A7B000", "elfType" : 3, "buildId" : "B25EDEA74063E2308FE1BF3608006A9E3D860BA9" }, { "b" : "7F5E70CCB000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "FE662C4D7B14EE804E0C1902FB55218A106BC5CB" }, { "b" : "7F5E70A6D000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F5E70693000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "379F80D2768BA6A21F52781895EE9F47B34A0A85" }, { "b" : "7F5E7048B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F5E70287000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F5E6FF83000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F5E6FC7D000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "574C6350381DA194C00FF555E0C1784618C05569" }, { "b" : "7F5E6FA67000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "CC0D578C2E0D86237CA7B0CE8913261C506A629A" }, { "b" : "7F5E6F6A1000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B515361E474796AF29DE9992B76A97CFFB39B2A7" }, { "b" : "7F5E70EE9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3f529]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xedfd81]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xec6d4a]
 mongod(_ZN5mongo21WiredTigerRecordStore27cappedDeleteAsNeeded_inlockEPNS_16OperationContextERKNS_8RecordIdE+0x789) [0xd5b7d9]
 mongod(_ZN5mongo21WiredTigerRecordStore20cappedDeleteAsNeededEPNS_16OperationContextERKNS_8RecordIdE+0x55) [0xd5f5a5]
 mongod(_ZN5mongo21WiredTigerRecordStore12insertRecordEPNS_16OperationContextEPKcib+0x1EE) [0xd5fbce]
 mongod(_ZN5mongo10Collection15_insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x4E) [0x910dae]
 mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x8C) [0x91158c]
 mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0xA93) [0x9b0f53]
 mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x27A) [0x9b1b5a]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x34) [0x9b1c24]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x3A5) [0x9b2335]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x9b4d8d]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9d5b54]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9d6a93]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9d768b]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_b+0x755) [0xb98135]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB19) [0xab05f9]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x80d390]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x32B) [0xef36db]
 libpthread.so.0(+0x8182) [0x7f5e70cd3182]
 libc.so.6(clone+0x6D) [0x7f5e6f79bfbd]
-----  END BACKTRACE  -----
2015-03-08T04:15:22.103-0400 I -        [conn3836] 
***aborting after invariant() failure



 Comments   
Comment by Daniel Pasette (Inactive) [ 10/Mar/15 ]

duplicates SERVER-17345

Comment by Keith Bostic (Inactive) [ 09/Mar/15 ]

The cursor positioning problem we were seeing was SERVER-17345, and the reproducible case we found was failing at the session->truncate call 5 lines further down:

 
WiredTigerCursor startWrap( _uri, _instanceId, true, txn);
WT_CURSOR* start = startWrap.get();
ret = WT_OP_CHECK(start->next(start));
invariantWTOK(ret);
 
ret = session->truncate(session, NULL, start, c, NULL);         <<< SERVER-17345 was failing here
if (ret == ENOENT || ret == WT_NOTFOUND) {
    // TODO we should remove this case once SERVER-17141 is resolved
    log() << "Soft failure truncating capped collection. Will try again later.";
          docsRemoved = 0;
 }

The cursor positioning problem we fixed could cause the start cursor to be mispositioned – I never saw it return WT_NOTFOUND from the initial cursor->next call, but the cursor was absolutely being positioned in the wrong place, so it's not unreasonable to think it could have been positioned at the end of the file in some cases, returning WT_NOTFOUND.

Comment by Asya Kamsky [ 09/Mar/15 ]

db version v3.0.0
git version: a841fd6394365954886924a35076691b4d149168
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
build info: Linux ip-10-179-177-12 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49

Generated at Thu Feb 08 03:44:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.