[SERVER-16152] WiredTiger out of disk space results in UnknownError
Created: 14/Nov/14  Updated: 06/Apr/16  Resolved: 28/Jan/15

Status: Closed
Project: Core Server
Component/s: Logging, Storage
Affects Version/s: 2.8.0-rc0
Fix Version/s: 3.0.0-rc6

Type: Bug
Priority: Minor - P4
Reporter: Andrew Emil (Inactive)
Assignee: Michael Cahill (Inactive)
Resolution: Done
Votes: 0
Labels: 28qa, WTplaybook, wiredtiger
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-16573 Mongod terminates when no space is av... Closed
Related
is related to SERVER-16022 Assertion hit when running out of dis... Closed
is related to SERVER-16131 Log File Blowing up on sharded ycsb r... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: Linux
Participants:

 Description   

When WiredTiger fails because the disk is full, the logged error message is obscure:

2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99984--6455822767974076256 -> { numRecords: 1, dataSize: 19 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99986--6455822767974076256 -> { numRecords: 1, dataSize: 25 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99988--6455822767974076256 -> { numRecords: 4, dataSize: 88 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99990--6455822767974076256 -> { numRecords: 1, dataSize: 19 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99992--6455822767974076256 -> { numRecords: 1, dataSize: 25 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99994--6455822767974076256 -> { numRecords: 27, dataSize: 594 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99996--6455822767974076256 -> { numRecords: 1, dataSize: 19 }
2014-11-13T14:14:34.864-0800 D STORAGE  [conn2] WiredTigerSizeStorer::storeInto table:collection-99998--6455822767974076256 -> { numRecords: 1, dataSize: 25 }
2014-11-13T14:14:34.870-0800 E STORAGE  [conn2] WiredTiger (-31801) [1415916874:870746][3634:0x7fe4f7bce700], session.commit_transaction: journal/WiredTigerLog.0000000361: posix_fallocate: WT_ERROR: non-specific WiredTiger error
2014-11-13T14:14:34.871-0800 I -        [conn2] Fatal assertion 28519 UnknownError -31801: WT_ERROR: non-specific WiredTiger error
2014-11-13T14:14:34.889-0800 I CONTROL  [conn2] 
 0xfee2b3 0xfa1c2b 0xf892d5 0xe2dda9 0xe2defb 0xdc20c3 0xa08c01 0xa74767 0xa

The cause of the problem only became clear to me after running df -h. If mongod dies because it has run out of disk space, we should make that very clear to the user.



 Comments   
Comment by Michael Cahill (Inactive) [ 28/Jan/15 ]

WiredTiger now generates a meaningful error message for out-of-disk space.

Comment by Keith Bostic (Inactive) [ 16/Jan/15 ]

michael.cahill, I'm missing something: I don't see where fallocate turns ENOSPC into WT_ERROR. (I agree with you that's what's happening, but I'm not seeing where it happens.)

Comment by Michael Cahill (Inactive) [ 14/Jan/15 ]

Or is the issue here just that WiredTiger should log the low-level error message (i.e., strerror on POSIX) rather than mapping ENOSPC to WT_ERROR?
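
To illustrate the idea in this comment, here is a minimal Python sketch of reporting the low-level OS message instead of a generic code. The function name describe_error is hypothetical, and the WT_ERROR value is copied from the log in the description; this is not WiredTiger's actual error-handling code.

```python
import errno
import os

# Value taken from the log excerpt in the description of this ticket.
WT_ERROR = -31801


def describe_error(code: int) -> str:
    """Hypothetical helper: return a readable message for an error code.

    Assumption: positive values are POSIX errno values and negative
    values are library-specific codes.
    """
    if code > 0:
        # Use the symbolic name plus the OS-provided text (strerror).
        name = errno.errorcode.get(code, "UNKNOWN")
        return f"{name}: {os.strerror(code)}"
    if code == WT_ERROR:
        return "WT_ERROR: non-specific WiredTiger error"
    return f"unknown error {code}"


if __name__ == "__main__":
    # The errno path names the real cause; the generic path does not.
    print(describe_error(errno.ENOSPC))
    print(describe_error(WT_ERROR))
```

With the errno preserved, the message names the actual cause; once the code has been collapsed to WT_ERROR, that information is gone.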

Comment by Michael Cahill (Inactive) [ 13/Jan/15 ]

Should ERROR_HANDLE_DISK_FULL be mapped to a MongoDB exception in mongo::wtRCToStatus_slow (in wiredtiger_util.cpp)? In other words, will that code have to become platform-specific?
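
A rough Python sketch of what a platform-specific mapping could look like follows. The helpers is_disk_full and to_status and the status name "OutOfDiskSpace" are placeholders for illustration. The sketch works on raw OS error codes and does not model the translation of Windows errors into a separate range that is mentioned elsewhere in this thread. It is not the code in wiredtiger_util.cpp.

```python
import errno

# Raw Windows code for ERROR_HANDLE_DISK_FULL (39, 0x27), as quoted in this ticket.
ERROR_HANDLE_DISK_FULL = 39


def is_disk_full(code: int, platform: str) -> bool:
    """Hypothetical helper: True if a raw OS error code means the disk is full."""
    if platform == "windows":
        return code == ERROR_HANDLE_DISK_FULL
    # Assumption: every non-Windows platform reports ENOSPC.
    return code == errno.ENOSPC


def to_status(code: int, platform: str) -> str:
    """Hypothetical mapping from a raw OS error code to a status name."""
    if is_disk_full(code, platform):
        return "OutOfDiskSpace"  # placeholder status name
    return "UnknownError"  # what the log in the description reports


if __name__ == "__main__":
    print(to_status(ERROR_HANDLE_DISK_FULL, "windows"))
    print(to_status(errno.ENOSPC, "linux"))
```

The platform check is the cost being asked about: the same numeric value can mean different things on different systems, so the mapping cannot be shared without knowing where the code came from.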

Comment by Mark Benvenuto [ 13/Jan/15 ]

The Windows errors are all translated into their own error code range (i.e., negative numbers). It will come out as the following error in the logs.

ERROR_HANDLE_DISK_FULL
 
    39 (0x27)
    The disk is full.

Comment by Michael Cahill (Inactive) [ 13/Jan/15 ]

mark.benvenuto and keith.bostic, we can make sure that ENOSPC isn't mapped to WT_ERROR inside WiredTiger on POSIX. What about on Windows: with the current treatment of errors, what should the integration layer do to tell if something failed because a disk is full?

Comment by Eric Milkie [ 14/Nov/14 ]

Presumably, the call to posix_fallocate in os_fallocate.c returned error status ENOSPC; I'm not sure where it got lost in translation.
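
For context, the C function posix_fallocate returns the error number directly rather than setting errno, so a caller has to propagate its return value to keep ENOSPC visible. The Python sketch below shows the same idea using os.posix_fallocate, which raises OSError on failure. The preallocate wrapper and its allocate parameter are hypothetical and exist only so the failure path can be exercised without filling a disk; this is not the code in os_fallocate.c.

```python
import errno
import os


def preallocate(fd: int, length: int, allocate=None):
    """Hypothetical wrapper: reserve `length` bytes for fd.

    Returns (ok, message). `allocate` defaults to os.posix_fallocate and
    can be replaced with a stub to simulate a full disk.
    """
    if allocate is None:
        allocate = os.posix_fallocate
    try:
        allocate(fd, 0, length)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Keep the specific cause instead of a generic error.
            return False, "out of disk space: " + os.strerror(errno.ENOSPC)
        return False, f"preallocation failed: {exc}"
    return True, "ok"


if __name__ == "__main__":
    def fake_full_disk(fd, offset, length):
        # Stub that simulates the failure described in this ticket.
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))

    print(preallocate(-1, 4096, allocate=fake_full_disk))
```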
