[SERVER-17378] WiredTiger's compact code can return 'Operation timed out' error (invariant failure) Created: 25/Feb/15  Updated: 18/Sep/15  Resolved: 05/Mar/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0-rc10
Fix Version/s: 3.0.1, 3.1.0

Type: Bug Priority: Major - P3
Reporter: Kamran K. Assignee: Daniel Pasette (Inactive)
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

WiredTiger's compact code will return an 'Operation timed out' error if the compact operation exceeds 'max_time'. This timeout can seemingly occur under two conditions:

1 - The WT compact operation takes longer than 1200 seconds (20 minutes)
2 - The system clock goes backward during the compact operation

When the timeout occurs, the error propagates up and leads to an invariant failure:

Breakpoint 1, __session_compact_check_timeout (session=0x35b4c40, begin=...) at src/third_party/wiredtiger/src/session/session_compact.c:138
(gdb) l
133			return (0);
134	
135		WT_RET(__wt_epoch(session, &end));
136		if (session->compact->max_time <
137		    WT_TIMEDIFF(end, begin) / WT_BILLION)
138			WT_RET(ETIMEDOUT);
139		return (0);
140	}
141	
142	/*
 
==> WT_TIMEDIFF(end, begin) / WT_BILLION) calculation when the clock goes backward
(gdb) p (1000000000 * (uint64_t)((end).tv_sec - (begin).tv_sec) + (uint64_t)(end).tv_nsec - (uint64_t)(begin).tv_nsec) / 1000000000
$4 = 18446744010
(gdb) p session->compact->max_time
$5 = 1200
(gdb) c
 
Invariant failure: ret resulted in status UnknownError 110: Connection timed out at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 730
 
(gdb) bt
#0  0x00007ffff7bcc20b in raise (sig=5) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00000000018f7874 in mongo::breakpoint () at src/mongo/util/debugger.cpp:63
#2  0x00000000018ed771 in mongo::invariantOKFailed (expr=0x2062531 "ret", status=..., file=0x2062420 "src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp", line=730) at src/mongo/util/assert_util.cpp:157
#3  0x000000000176f0f4 in mongo::WiredTigerRecordStore::compact (this=0x4a2cd00, txn=0x7ffff10f47d0, adaptor=0x0, options=0x7ffff10f2e40, stats=0x7ffff10f2ca0) at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp:730
#4  0x0000000001282bc6 in mongo::Collection::compact (this=0x5177800, txn=0x7ffff10f47d0, compactOptions=0x7ffff10f2e40) at src/mongo/db/catalog/collection_compact.cpp:118
#5  0x00000000012c916c in mongo::CompactCmd::run (this=0x28c0d40 <mongo::compactCmd>, txn=0x7ffff10f47d0, db=..., cmdObj=..., errmsg=..., result=..., fromRepl=false) at src/mongo/db/commands/compact.cpp:168
#6  0x000000000135b619 in mongo::_execCommand (txn=0x7ffff10f47d0, c=0x28c0d40 <mongo::compactCmd>, dbname=..., cmdObj=..., queryOptions=0, errmsg=..., result=..., fromRepl=false) at src/mongo/db/dbcommands.cpp:1316
#7  0x000000000135c5e7 in mongo::Command::execCommand (txn=0x7ffff10f47d0, c=0x28c0d40 <mongo::compactCmd>, queryOptions=0, cmdns=0x3449414 "db1.$cmd", cmdObj=..., result=..., fromRepl=false) at src/mongo/db/dbcommands.cpp:1532
#8  0x000000000135ced8 in mongo::_runCommands (txn=0x7ffff10f47d0, ns=0x3449414 "db1.$cmd", _cmdobj=..., b=..., anObjBuilder=..., fromRepl=false, queryOptions=0) at src/mongo/db/dbcommands.cpp:1604
#9  0x0000000001563e58 in mongo::runCommands (txn=0x7ffff10f47d0, ns=0x3449414 "db1.$cmd", jsobj=..., curop=..., b=..., anObjBuilder=..., fromRepl=false, queryOptions=0) at src/mongo/db/query/find.cpp:137
#10 0x0000000001565e8a in mongo::runQuery (txn=0x7ffff10f47d0, m=..., q=..., nss=..., curop=..., result=..., fromDBDirectClient=false) at src/mongo/db/query/find.cpp:606
#11 0x000000000146a7d0 in mongo::receivedQuery (txn=0x7ffff10f47d0, c=..., dbresponse=..., m=..., fromDBDirectClient=false) at src/mongo/db/instance.cpp:220
#12 0x000000000146b97d in mongo::assembleResponse (txn=0x7ffff10f47d0, m=..., dbresponse=..., remote=..., fromDBDirectClient=false) at src/mongo/db/instance.cpp:403
#13 0x000000000115d604 in mongo::MyMessageHandler::process (this=0x310a400, m=..., port=0x344f3b0, le=0x50668a0) at src/mongo/db/db.cpp:206
#14 0x0000000001915cb6 in mongo::PortMessageServer::handleIncomingMsg (arg=0x344f3b0) at src/mongo/util/net/message_server_port.cpp:229
#15 0x00007ffff7bc4182 in start_thread (arg=0x7ffff10f5700) at pthread_create.c:312
#16 0x00007ffff6cc500d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 



 Comments   
Comment by Githook User [ 05/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17378 pass an explicit no timeout to compact in WiredTiger

(cherry picked from commit ac525a4566a46b6b79a8465903b2ab0391eab506)
Branch: v3.0
https://github.com/mongodb/mongo/commit/8e0034e8260fbf82e0a63e8159ecb17b58b498d7

Comment by Githook User [ 05/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17378 pass an explicit no timeout to compact in WiredTiger
Branch: master
https://github.com/mongodb/mongo/commit/ac525a4566a46b6b79a8465903b2ab0391eab506

Generated at Thu Feb 08 03:44:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.