[SERVER-11037] Interrupting repairDatabase can leak temporary collections Created: 04/Oct/13  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: 2.5.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: 26qa

Issue Links:
Related
Assigned Teams:
Storage Execution
Operating System: ALL

Description

The repairDatabase command clones the target database into a temporary directory and then moves the data files from that directory into their final location. If repairDatabase is interrupted with killOp during the clone, the temporary directory it created is never removed, not even after a restart, and it continues to consume disk space.
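One common way to avoid this class of leak is to tie the temporary directory's lifetime to the operation, so that any exception, including an interruption, removes it. The sketch below is in Python rather than mongod's C++ and is not the server's code: `repair_with_cleanup`, `clone_fn`, and `Interrupted` are hypothetical names, and the random suffix chosen by `tempfile.mkdtemp` differs from the server's numbered `_tmp_repairDatabase_N` directories.

```python
import os
import shutil
import tempfile


class Interrupted(Exception):
    """Hypothetical stand-in for the 'operation was interrupted' error (code 11601)."""


def repair_with_cleanup(db_path, clone_fn):
    """Clone into a scratch directory under db_path, then move the files into place.

    clone_fn(tmp_dir) is a hypothetical callback that writes the cloned data
    files into tmp_dir and may raise Interrupted at any point.
    """
    tmp_dir = tempfile.mkdtemp(prefix="_tmp_repairDatabase_", dir=db_path)
    try:
        clone_fn(tmp_dir)
        # Only reached if the clone finished: move each file to its final location.
        for name in os.listdir(tmp_dir):
            os.replace(os.path.join(tmp_dir, name), os.path.join(db_path, name))
    finally:
        # Runs whether the clone succeeded or was interrupted, so the
        # scratch directory never outlives this call.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

In C++ the same effect would typically come from a scope guard or RAII object. Cleanup on the error path only covers interruptions the process survives; if mongod crashed mid-clone, a directory would still be left behind, so a startup sweep would also be needed to cover the "even after restart" part of this report.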

To reproduce, interrupt the clone while it is between collections. The interruption point is mayInterrupt(): repairDatabase passes mayBeInterrupted=true down to the cloner, so every mayInterrupt() call checks for a pending kill via KillCurrentOp::checkForInterrupt() with heedMutex=false.
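The interruption behavior can be modelled as a loop that checks a kill flag only between collections, which is why the collections cloned before the kill are already on disk when the error is raised. This is a hedged Python model, not the cloner's implementation: `KillFlag`, `clone_collections`, and `copy_fn` are hypothetical, with `check_for_interrupt()` standing in for mayInterrupt()/checkForInterrupt().

```python
import threading


class Interrupted(Exception):
    """Hypothetical stand-in for the 'operation was interrupted' error (code 11601)."""


class KillFlag:
    """Hypothetical per-operation kill flag, set from another thread (the killOp analogue)."""

    def __init__(self):
        self._killed = threading.Event()

    def kill(self):
        self._killed.set()

    def check_for_interrupt(self):
        if self._killed.is_set():
            raise Interrupted("operation was interrupted")


def clone_collections(collections, copy_fn, kill_flag, may_be_interrupted=True):
    """Copy collections one at a time, checking for a kill only between them.

    Collections copied before the kill stay where copy_fn put them; cleaning
    them up is the caller's responsibility.
    """
    for name in collections:
        if may_be_interrupted:
            # Analogous to mayInterrupt(): the only place a kill is observed.
            kill_flag.check_for_interrupt()
        copy_fn(name)
```

Because the check never runs in the middle of copy_fn, a kill that arrives during one collection takes effect just before the next, which matches the "between collections" interruption point described above.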

Sample session (killOp not shown):

> db.repairDatabase()
{
	"errmsg" : "exception: operation was interrupted",
	"code" : 11601,
	"ok" : 0
}
> db.repairDatabase()
{
	"errmsg" : "exception: operation was interrupted",
	"code" : 11601,
	"ok" : 0
}
> db.repairDatabase()
{
	"errmsg" : "exception: operation was interrupted",
	"code" : 11601,
	"ok" : 0
}
>
[1]+  Stopped                 mongo
rassi@laptop:~/work/mongo $ du -hs /data/db/_tmp_repairDatabase_*
2.0G	/data/db/_tmp_repairDatabase_0
2.0G	/data/db/_tmp_repairDatabase_1
2.0G	/data/db/_tmp_repairDatabase_2
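To check a data directory for leftovers without running du by hand, a small script can list the `_tmp_repairDatabase_*` directories and their sizes. This is a sketch under assumptions: the directory prefix is taken from the du output above, `leaked_repair_dirs` is a hypothetical helper, and the dbpath (/data/db in this session) is passed in by the caller.

```python
import os


def leaked_repair_dirs(dbpath):
    """Return {name: apparent size in bytes} for each _tmp_repairDatabase_* directory in dbpath."""
    leaked = {}
    with os.scandir(dbpath) as entries:
        for entry in entries:
            if entry.is_dir() and entry.name.startswith("_tmp_repairDatabase_"):
                total = 0
                # Sum the sizes of every file under the leftover directory.
                for root, _dirs, files in os.walk(entry.path):
                    for fname in files:
                        total += os.path.getsize(os.path.join(root, fname))
                leaked[entry.name] = total
    return leaked
```

For example, leaked_repair_dirs("/data/db") would return one entry per leftover directory. It sums apparent file sizes, whereas du reports allocated disk blocks by default, so the two can differ for sparse files. Deciding whether a listed directory is safe to delete (for example, that no repair is currently running) is left to the operator.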

Stack trace at the point of interruption (from the session above):

#0  mongo::KillCurrentOp::checkForInterrupt (this=0x101d291c0, heedMutex=false) at kill_current_op.cpp:141
#1  0x000000010025277e in mongo::mayInterrupt (mayBeInterrupted=true) at cloner.cpp:64
#2  0x0000000100255a40 in mongo::Cloner::go (this=0x1067a3b38, masterHost=0x1060df6f8 "localhost:27017", opts=@0x1067a3a38, clonedColls=@0x1067a3a08, errmsg=@0x1067a54c8, errCode=0x0) at cloner.cpp:439
#3  0x0000000100257fd5 in mongo::Cloner::go (this=0x1067a3b38, masterHost=0x1060df6f8 "localhost:27017", errmsg=@0x1067a54c8, fromdb=@0x1067a4828, logForRepl=false, slaveOk=false, useReplAuth=false, snapshot=false, mayYield=false, mayBeInterrupted=true, errCode=0x0) at cloner.cpp:326
#4  0x0000000100258210 in mongo::Cloner::cloneFrom (masterHost=0x1060df6f8 "localhost:27017", errmsg=@0x1067a54c8, fromdb=@0x1067a4828, logForReplication=false, slaveOk=false, useReplAuth=false, snapshot=false, mayYield=false, mayBeInterrupted=true, errCode=0x0) at cloner.cpp:520
#5  0x0000000100632d02 in mongo::repairDatabase (dbNameS=@0x1067a4d28, errmsg=@0x1067a54c8, preserveClonedFilesOnFailure=false, backupOriginalFiles=false) at pdfile.cpp:1407
#6  0x000000010034b066 in mongo::CmdRepairDatabase::run (this=0x101d28470, dbname=@0x1067a54f8, cmdObj=@0x1067a5aa0, unnamed_arg=0, errmsg=@0x1067a54c8, result=@0x1067a60d8, fromRepl=false) at dbcommands.cpp:366
#7  0x0000000100341766 in mongo::_execCommand (c=0x101d28470, dbname=@0x1067a54f8, cmdObj=@0x1067a5aa0, queryOptions=0, errmsg=@0x1067a54c8, result=@0x1067a60d8, fromRepl=false) at dbcommands.cpp:1963
#8  0x0000000100343efc in mongo::Command::execCommand (c=0x101d28470, client=@0x106186680, queryOptions=0, cmdns=0x1061f0014 "test.$cmd", cmdObj=@0x1067a5aa0, result=@0x1067a60d8, fromRepl=false) at dbcommands.cpp:2130
#9  0x00000001003452af in mongo::_runCommands (ns=0x1061f0014 "test.$cmd", _cmdobj=@0x1067a61c0, b=@0x1067a6138, anObjBuilder=@0x1067a60d8, fromRepl=false, queryOptions=0) at dbcommands.cpp:2194
#10 0x00000001006141d5 in mongo::runCommands (ns=0x1061f0014 "test.$cmd", jsobj=@0x1067a61c0, curop=@0x1060f3180, b=@0x1067a6138, anObjBuilder=@0x1067a60d8, fromRepl=false, queryOptions=0) at query.cpp:68
#11 0x0000000100614fba in mongo::runQuery (m=@0x1067a7990, q=@0x1067a6ae0, curop=@0x1060f3180, result=@0x106190160) at query.cpp:1045
#12 0x000000010054627b in receivedQuery (c=@0x106186680, dbresponse=@0x1067a7640, m=@0x1067a7990) at instance.cpp:280
#13 0x000000010054a349 in mongo::assembleResponse (m=@0x1067a7990, dbresponse=@0x1067a7640, remote=@0x1067a7690) at instance.cpp:443
#14 0x0000000100019f69 in mongo::MyMessageHandler::process (this=0x10608c0e8, m=@0x1067a7990, port=0x10609a1e0, le=0x1060a7940) at db.cpp:221
#15 0x0000000100bdfd2e in mongo::PortMessageServer::handleIncomingMsg (arg=0x1061d9600) at message_server_port.cpp:210
#16 0x0000000100bde041 in boost::_bi::list1<boost::_bi::value<mongo::PortMessageServer::HandleIncomingMsgParam*> >::operator()<void*, void* (*)(void*), boost::_bi::list0> (this=0x1060b9df0, f=@0x1060b9de8, a=@0x1067a7e10, unnamed_arg=0) at bind.hpp:243
#17 0x0000000100bde0a6 in boost::_bi::bind_t<void*, void* (*)(void*), boost::_bi::list1<boost::_bi::value<mongo::PortMessageServer::HandleIncomingMsgParam*> > >::operator() (this=0x1060b9de8) at bind_template.hpp:20
#18 0x0000000100bde0e1 in boost::detail::thread_data<boost::_bi::bind_t<void*, void* (*)(void*), boost::_bi::list1<boost::_bi::value<mongo::PortMessageServer::HandleIncomingMsgParam*> > > >::run (this=0x1060b9c00) at thread.hpp:62
#19 0x0000000100cb6179 in thread_proxy (param=0x1060b9c00) at thread.cpp:121
#20 0x00007fff8bc7a782 in _pthread_start ()
#21 0x00007fff8bc671c1 in thread_start ()


Generated at Thu Feb 08 03:24:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.