[SERVER-15111] partially written journal last section causes recovery to fail Created: 02/Sep/14  Updated: 11/Jul/16  Resolved: 17/Sep/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.4.12, 2.6.5, 2.7.7

Type: Bug Priority: Critical - P2
Reporter: Bruce Lucas (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-15663 Missing newlines in journal corruptio... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   
Issue Status as of Sep 18, 2014

ISSUE SUMMARY
On occasion, the last journal section may be partially written in the case of a crash or a filesystem snapshot backup. This condition is expected and should be tolerated during recovery on a subsequent mongod startup, because the partially written last journal section can be safely ignored. However, when this situation occurs it is not ignored, and recovery fails with the following log entry:

Assertion: 15874:couldn't uncompress journal section

USER IMPACT
When journal recovery encounters this situation, mongod refuses to start.

WORKAROUNDS
Affected users can run a 2.6.5 or 2.4.12 mongod to recover from this situation. This issue affects recovery only: database files left by crashes, and filesystem snapshot backups taken, under prior versions of mongod are healthy and uncorrupted, and can be recovered normally by a 2.6.5 or 2.4.12 mongod.

AFFECTED VERSIONS
MongoDB production releases up to 2.6.4 and 2.4.11 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.5 and 2.4.12 production releases.

RESOLUTION DETAILS
Do not treat an incomplete last section of the journal as an error.
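A minimal C++ sketch of this resolution rule, assuming a simplified model in which each journal file is a list of sections and a single hypothetical `JournalSectionCorrupt` exception covers every way a section can fail validation. The names `Section`, `checkSection`, and `recover` are illustrative only, not mongod's actual `RecoveryJob` code. The sketch also assumes one particular policy for deciding what counts as the "last" section: a corrupt section in the newest journal file ends replay cleanly, while a corrupt section in any older file is still treated as fatal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: a section failed validation (truncated, bad checksum,
// or a payload that could not be decompressed).
struct JournalSectionCorrupt : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory model of one journal section.
struct Section {
    bool valid;           // false models a torn or garbage section
    std::string payload;  // the operations to replay
};

// Hypothetical stand-in for decompression plus checksum verification.
const std::string& checkSection(const Section& s) {
    if (!s.valid) throw JournalSectionCorrupt("couldn't uncompress journal section");
    return s.payload;
}

// Replays sections in order. Assumed policy: a corrupt section in the last
// journal file is the torn tail of an interrupted write, so replay stops
// there and recovery succeeds. Corruption in an earlier file is rethrown,
// because the existence of later files means that section was completed.
std::vector<std::string> recover(const std::vector<std::vector<Section>>& files) {
    std::vector<std::string> replayed;
    for (std::size_t f = 0; f < files.size(); ++f) {
        const bool lastFile = (f + 1 == files.size());
        for (const Section& s : files[f]) {
            try {
                replayed.push_back(checkSection(s));
            } catch (const JournalSectionCorrupt&) {
                if (!lastFile) throw;  // genuine corruption: refuse to start
                return replayed;       // expected torn tail: ignore the rest
            }
        }
    }
    return replayed;
}
```

With a torn section at the end of the final file, `recover` returns everything replayed before it; the same corrupt section followed by another journal file still aborts recovery, which keeps real mid-journal corruption from being silently skipped.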

Original description

The last journal section may be partially written in the case of a crash or a filesystem snapshot backup. This condition is expected and should be tolerated during recovery on a subsequent mongod startup, because the partially written last journal section can be safely ignored. However, when it occurs, it causes recovery to fail with the following log entries:

Thu Aug 28 19:27:53.590 [initandlisten] couldn't uncompress journal section
Thu Aug 28 19:27:53.590 [initandlisten] Assertion: 15874:couldn't uncompress journal section
0xde8c31 0xdaa3fb 0x936021 0x936674 0x9377ca 0x937c02 0x93852c 0x938792 0x923a2f 0x6d79cc 0x6d81fd 0x6def10 0x6e0cb9 0x7fa1261f176d 0x6cf789 
 mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde8c31]
 mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xdaa3fb]
 mongod(_ZN5mongo3dur11RecoveryJob14processSectionEPKNS0_11JSectHeaderEPKvjPKNS0_11JSectFooterE+0x561) [0x936021]
 mongod(_ZN5mongo3dur11RecoveryJob17processFileBufferEPKvj+0x134) [0x936674]
 mongod(_ZN5mongo3dur11RecoveryJob11processFileEN5boost11filesystem34pathE+0xda) [0x9377ca]
 mongod(_ZN5mongo3dur11RecoveryJob2goERSt6vectorIN5boost11filesystem34pathESaIS5_EE+0x122) [0x937c02]
 mongod(_ZN5mongo3dur8_recoverEv+0x1dc) [0x93852c]
 mongod(_ZN5mongo3dur7recoverEv+0x22) [0x938792]
 mongod(_ZN5mongo3dur7startupEv+0x7f) [0x923a2f]
 mongod(_ZN5mongo14_initAndListenEi+0x3ec) [0x6d79cc]
 mongod(_ZN5mongo13initAndListenEi+0x1d) [0x6d81fd]
 mongod() [0x6def10]
 mongod(main+0x9) [0x6e0cb9]
 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa1261f176d]
 mongod() [0x6cf789]
Thu Aug 28 19:27:53.637 [initandlisten] dbexception during recovery: 15874 couldn't uncompress journal section
Thu Aug 28 19:27:53.638 [initandlisten] exception in initAndListen: 15874 couldn't uncompress journal section, terminating



 Comments   
Comment by Githook User [ 06/Oct/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'redbeard0531@gmail.com'}

Message: SERVER-15111 Treat corruption of final journal section as an expected event

Manual backport of the following commits (combined):
8e1f5beabfad09c790e46826e8b3c7dcc5070d8d
6e93b33179e71abce820e534b3d32f1e593f71ca
Branch: v2.4
https://github.com/mongodb/mongo/commit/e96f29859aa6bf9baa5e599f9b6d2f611fe031bd

Comment by Githook User [ 17/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'redbeard0531@gmail.com'}

Message: SERVER-15111 Treat corruption of final journal section as an expected event

Manual backport of the following commits (combined):
8e1f5beabfad09c790e46826e8b3c7dcc5070d8d
6e93b33179e71abce820e534b3d32f1e593f71ca
Branch: v2.6
https://github.com/mongodb/mongo/commit/7bca29e784b536e90387974bfa5a451ce15161a5

Comment by Githook User [ 12/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15111 use boost 1.49 compatible copy_file
Branch: master
https://github.com/mongodb/mongo/commit/6e93b33179e71abce820e534b3d32f1e593f71ca

Comment by Githook User [ 12/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15111 Treat corruption of final journal section as an expected event
Branch: master
https://github.com/mongodb/mongo/commit/8e1f5beabfad09c790e46826e8b3c7dcc5070d8d

Comment by Githook User [ 10/Sep/14 ]

Author:

{u'username': u'deafgoat', u'name': u'Wisdom Omuya', u'email': u'deafgoat@gmail.com'}

Message: Revert "SERVER-15111 Treat corruption of final journal section as an expected event"

This reverts commit 77b11e686c873ea44e14e6f7419adb24d5f40106.
Branch: master
https://github.com/mongodb/mongo/commit/353dcccd714fdb3ce872853b0600a0b667267ae6

Comment by Githook User [ 09/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15111 Treat corruption of final journal section as an expected event

This was always the intention of the code, as a crash may happen while
writing the latest journal section. The code wasn't adjusted when we started
preallocating and reusing journal files and when we started compressing
journal sections, so it ended up treating many common types of corruption as
errors.
Branch: master
https://github.com/mongodb/mongo/commit/77b11e686c873ea44e14e6f7419adb24d5f40106
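The point about preallocated, reused journal files can be illustrated with a toy section reader. This is a sketch under stated assumptions: the `[u32 length][payload][u32 checksum]` layout, FNV-1a as the checksum, the omission of compression, and the names `readSection`, `ReadResult`, and `toyChecksum` are all hypothetical, not the real format behind `JSectHeader`/`JSectFooter`. It shows why an EOF-only check falls short: in a preallocated file the bytes after the last complete write are zeros or stale data from an earlier use, so a torn tail usually shows up as a checksum (or, with compression, decompression) failure rather than as running out of buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical on-disk layout: [u32 length][payload][u32 checksum].
enum class ReadResult { Ok, Truncated, Corrupt };

// Toy checksum (FNV-1a) standing in for the real footer hash.
std::uint32_t toyChecksum(const std::uint8_t* p, std::size_t n) {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 16777619u; }
    return h;
}

// Reads a little-endian 32-bit value.
std::uint32_t readU32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Classifies the section starting at `pos` (caller ensures pos <= size).
// Truncated is the only case an EOF-style catch would see; zeroed or stale
// bytes in a reused file instead produce a checksum mismatch (Corrupt).
ReadResult readSection(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    const std::size_t remaining = buf.size() - pos;
    if (remaining < 4) return ReadResult::Truncated;
    const std::uint32_t len = readU32(&buf[pos]);
    if (remaining - 4 < std::size_t(len) + 4) return ReadResult::Truncated;
    const std::uint8_t* payload = &buf[pos + 4];
    const std::uint32_t stored = readU32(payload + len);
    return stored == toyChecksum(payload, len) ? ReadResult::Ok : ReadResult::Corrupt;
}
```

In this model a complete section reads as `Ok`, a section cut off mid-write as `Truncated`, and a zero-filled preallocated region as `Corrupt`. Both failure kinds can be a torn final write, so a reader that only tolerates the first one rejects the common case.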

Comment by Dwight Merriman [ 02/Sep/14 ]

Well, that's surprising. I guess where it says

catch( BufReader::eof )

it just needs to catch more things, with some thoughtfulness about exactly what.
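The suggestion above, catching more than `BufReader::eof` with some care about exactly which errors, might look like the following sketch. All exception types here (`BufEof`, `DecompressError`, `ChecksumMismatch`, `IoError`) and the `processGuarded` helper are hypothetical; the point is the choice of which failures are mapped to "torn tail". Whether a torn tail may actually be ignored still depends on where it occurs, so a caller would accept `TornTail` only for the final section.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical exception types; mongod's real classes differ.
struct BufEof : std::runtime_error { using std::runtime_error::runtime_error; };           // ran off the end of the buffer
struct DecompressError : std::runtime_error { using std::runtime_error::runtime_error; };  // e.g. "couldn't uncompress journal section"
struct ChecksumMismatch : std::runtime_error { using std::runtime_error::runtime_error; }; // footer hash does not match
struct IoError : std::runtime_error { using std::runtime_error::runtime_error; };          // disk read failed

enum class SectionOutcome { Applied, TornTail };

// Runs one section's processing. Only failures that a torn final write can
// cause are mapped to TornTail; anything else (an I/O error, a logic bug)
// still propagates, so recovery fails loudly instead of dropping data.
SectionOutcome processGuarded(const std::function<void()>& processSection) {
    try {
        processSection();
        return SectionOutcome::Applied;
    } catch (const BufEof&) {
        return SectionOutcome::TornTail;
    } catch (const DecompressError&) {
        return SectionOutcome::TornTail;
    } catch (const ChecksumMismatch&) {
        return SectionOutcome::TornTail;
    }
}
```

Listing the tolerated types explicitly, rather than using `catch (...)`, is the "thoughtfulness" part: an unreadable disk or an out-of-memory condition is not evidence of an interrupted journal write and should not be ignored.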

Generated at Thu Feb 08 03:36:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.