[SERVER-13739] Repair database failure can delete database files
Created: 25/Apr/14  Updated: 11/Jul/16  Resolved: 30/Apr/14

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.6.0, 2.6.1-rc0
Fix Version/s: 2.6.1, 2.7.0

Type: Bug
Priority: Critical - P2
Reporter: Chad Kreimendahl
Assignee: Eliot Horowitz (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of April 30, 2014

ISSUE SUMMARY
If the repairDatabase command cannot create a directory during the repair process, it can abort the repair and delete the temporary files, which in some cases are the only copy of the data. This is a regression from 2.4, where a failed repair left the partially repaired data in a temporary directory.

USER IMPACT
Users affected by this issue can lose the data they are trying to repair.

WORKAROUNDS
It is highly recommended to make a backup copy before running a repair. For replica set nodes, a node can be fully resynced from another member instead of repairing it. Ensuring correct file permissions for the mongodb user will also avoid the issue.
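
The backup and permission workarounds above can be sketched as a small pre-flight script. This is only an illustration, not a MongoDB tool: the function names `can_create_directory` and `backup_before_repair` are hypothetical, and the sketch assumes the mongod process is stopped so the data files are not changing while they are copied.

```python
import os
import shutil
import tempfile


def can_create_directory(parent):
    """Return True if the current user can create a subdirectory under parent."""
    try:
        # Create a throwaway directory; this fails if parent is missing
        # or the user lacks permission to create directories in it.
        probe = tempfile.mkdtemp(prefix="_repair_probe_", dir=parent)
    except OSError:
        return False
    os.rmdir(probe)
    return True


def backup_before_repair(dbpath, backup_path):
    """Copy dbpath to backup_path, then check that dbpath accepts new directories."""
    if os.path.exists(backup_path):
        raise FileExistsError(f"backup target already exists: {backup_path}")
    # Take the backup first so a copy exists before anything else is attempted.
    shutil.copytree(dbpath, backup_path)
    if not can_create_directory(dbpath):
        raise PermissionError(f"cannot create directories under {dbpath}")
    return backup_path
```

Run the check as the same operating system user that runs mongod; a probe made as a different user says nothing about the permissions the repair will have.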

RESOLUTION
The removal of database files now happens after the directory creation, leaving the files in the temporary directory intact on failure. This restores the previous behavior of version 2.4.x.
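
The ordering described above can be illustrated with a minimal sketch. This is not the mongod implementation (which is C++ in repair_database.cpp). The function name `replace_with_repaired` is hypothetical, and the sketch assumes both directories hold only flat files. The point it shows is that the step that can fail harmlessly (directory creation) runs before any deletion, so an exception there leaves the temporary files intact.

```python
import os
import shutil


def replace_with_repaired(tmp_dir, db_dir):
    """Swap the repaired files in tmp_dir into db_dir.

    If directory creation fails, the exception propagates before
    anything has been deleted, so tmp_dir still holds the repaired data.
    """
    # Step 1: non-destructive and fallible. Create the target directory first.
    os.makedirs(db_dir, exist_ok=True)

    # Step 2: only now remove the old data files (assumed to be flat files).
    for name in os.listdir(db_dir):
        os.remove(os.path.join(db_dir, name))

    # Step 3: move the repaired files into place.
    for name in os.listdir(tmp_dir):
        shutil.move(os.path.join(tmp_dir, name), os.path.join(db_dir, name))

    # Step 4: delete the now-empty temporary directory as the last step.
    os.rmdir(tmp_dir)
```

The caller should not clean up `tmp_dir` when this function raises; keeping it is what preserves the only remaining copy of the data.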

AFFECTED VERSIONS
Version 2.6.0 is affected by this issue.

PATCHES
The patch is included in the 2.6.1 production release.

Original description

When performing a database repair, if a failure occurs while trying to move the temporary files, all of the data can be deleted.

2014-04-24T17:28:02.466-0500 [FileAllocator] done allocating datafile M:\db\_tmp_repairDatabase_0\Dev\Dev.7, size: 2047MB,  took 0.001 secs
2014-04-24T17:28:21.866-0500 [conn86] removeJournalFiles
2014-04-24T17:28:25.840-0500 [conn86] Assertion: 13294:caught exception: boost::filesystem::create_directory: Access is denied: "M:\db\Dev" src\mongo\db\repair_database.cpp 443
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\util\stacktrace.cpp(169)                                      mongo::printStackTrace+0x43
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\util\log.cpp(122)                                             mongo::logContext+0x9c
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\util\assert_util.cpp(183)                                     mongo::msgasserted+0xfb
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\util\assert_util.cpp(174)                                     mongo::msgasserted+0x13
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\repair_database.cpp(443)                                   `mongo::repairDatabase'::`1'::catch$2+0xa3
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  f:\dd\vctools\crt_bld\SELF_64_amd64\crt\prebuild\eh\amd64\handlers.asm(44)  _CallSettingFrame+0x20
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  f:\dd\vctools\crt_bld\self_64_amd64\crt\prebuild\eh\frame.cpp(1337)         __CxxCallCatchBlock+0xeb
2014-04-24T17:28:31.364-0500 [conn86] ntdll.dll                                                                               RtlCaptureContext+0x3c3
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\repair_database.cpp(443)                                   mongo::repairDatabase+0x1d6b
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\dbcommands.cpp(280)                                        mongo::CmdRepairDatabase::run+0x25b
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1357)                                       mongo::_execCommand+0x5e
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1575)                                       mongo::Command::execCommand+0xf41
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\dbcommands.cpp(1650)                                       mongo::_runCommands+0x4a7
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\query\new_find.cpp(107)                                    mongo::runCommands+0x41
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\query\new_find.cpp(384)                                    mongo::newRunQuery+0x49d
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\instance.cpp(269)                                          mongo::receivedQuery+0x44f
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\instance.cpp(434)                                          mongo::assembleResponse+0x30b
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\db\db.cpp(202)                                                mongo::MyMessageHandler::process+0x111
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\mongo\util\net\message_server_port.cpp(210)                         mongo::PortMessageServer::handleIncomingMsg+0x677
2014-04-24T17:28:31.364-0500 [conn86] mongod.exe  ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)             boost::`anonymous namespace'::thread_start_function+0x21
2014-04-24T17:28:31.364-0500 [conn86] 
2014-04-24T17:28:31.368-0500 [conn86] cleaning up failed repair db: Dev path: M:\db\_tmp_repairDatabase_0



 Comments   
Comment by Githook User [ 30/Apr/14 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-13739: don't delete new files until as late as possible during repair
Branch: v2.6
https://github.com/mongodb/mongo/commit/2516d36768448ee1b6a2246eb562f48e2f9a0751

Comment by Githook User [ 30/Apr/14 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-13739: don't delete new files until as late as possible during repair
Branch: master
https://github.com/mongodb/mongo/commit/ed1c2d2db431766492892e68702564f5722f15b0

Comment by Chad Kreimendahl [ 25/Apr/14 ]

I may have forgotten to add that it not only deleted the temp database (cleanup seen at the end); the failed write also somehow wiped the entire directory. We use directoryPerDB, and the directory in question was deleted.

It also appears that some backslashes were treated as escape characters in what I posted, so for full paths, look at the source of the original note.
