[SERVER-56424] improve index build invariant message for system error ENOSPC "28: No space left on device" Created: 28/Apr/21  Updated: 29/Oct/23  Resolved: 27/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.5
Fix Version/s: 5.0.0-rc3, 4.4.8, 5.1.0-rc0

Type: Improvement Priority: Minor - P4
Reporter: Dmitry Agranat Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: Atlas_Failure_Analysis
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-9411 Minimal implementation that can repla... Closed
is related to SERVER-35112 Remove MMAPv1 code Closed
is related to SERVER-8412 repairDatabase: no Cloner, and use mu... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0, v4.4
Sprint: Execution Team 2021-05-31
Participants:

 Description   

Currently we print this error in the log:

{"t":{"$date":"2021-04-23T20:15:36.232+00:00"},"s":"E",  "c":"-",        "id":23077,   "ctx":"IndexBuildsCoordinatorMongod-1214","msg":"Assertion","attr":{"error":"Location16821: error writing to file \"/srv/mongodb/redacted_host_name/_tmp/extsort-index.1227\": errno:28 No space left on device","file":"src/mongo/db/sorter/sorter.cpp","line":1039}}
{"t":{"$date":"2021-04-23T20:15:36.239+00:00"},"s":"I",  "c":"STORAGE",  "id":20649,   "ctx":"IndexBuildsCoordinatorMongod-1214","msg":"Index build failed","attr":{"buildUUID":{"uuid":{"$uuid":"c75a6c95-84ea-4b92-9932-1431e4c4795c"}},"namespace":"redacted_namespace","uuid":{"uuid":{"$uuid":"c37b027a-3026-41ce-ade7-0113bf5dc57d"}},"error":{"code":16821,"codeName":"Location16821","errmsg":"error writing to file \"/srv/mongodb/redacted_host_name/_tmp/extsort-index.1227\": errno:28 No space left on device"}}}
{"t":{"$date":"2021-04-23T20:15:36.239+00:00"},"s":"F",  "c":"-",        "id":23081,   "ctx":"IndexBuildsCoordinatorMongod-1214","msg":"Invariant failure","attr":{"expr":"status.isA<ErrorCategory::Interruption>() || status.isA<ErrorCategory::ShutdownError>()","msg":"Unnexpected error code during index build cleanup: Locat

But if we know that the cause was "No space left on device", can we replace the "Unnexpected error code during index build cleanup: Locat" with "No space left on device"?



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 20/Jul/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 Sorter detects and converts out of disk space system error rather than throwing default unnamed error code

(cherry picked from commit 044ada4e8958efb1c8e045bb5a6e0702bb0686cf)
Branch: v4.4
https://github.com/mongodb/mongo/commit/b16aad93159684f4dd9a1640997cf78338aa8aea

Comment by Githook User [ 20/Jul/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 index build fasserts when system runs out of disk space

(cherry picked from commit 46cd6f9251a1595e640fcdb1788329e520acf695)
Branch: v4.4
https://github.com/mongodb/mongo/commit/c020cc421e71ea47281dc99a8130ad666eaa28f0

Comment by Githook User [ 18/Jun/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 Sorter detects and converts out of disk space system error rather than throwing default unnamed error code

(cherry picked from commit 044ada4e8958efb1c8e045bb5a6e0702bb0686cf)
Branch: v5.0
https://github.com/mongodb/mongo/commit/57f7c1c616905cf4ef79e99553af02a52bfc898a

Comment by Githook User [ 18/Jun/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 index build fasserts when system runs out of disk space

(cherry picked from commit 46cd6f9251a1595e640fcdb1788329e520acf695)
Branch: v5.0
https://github.com/mongodb/mongo/commit/3db3df0dcf3da3cf5d6c5392b00ef5d117d17ddf

Comment by Githook User [ 26/May/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 Sorter detects and converts out of disk space system error rather than throwing default unnamed error code
Branch: master
https://github.com/mongodb/mongo/commit/044ada4e8958efb1c8e045bb5a6e0702bb0686cf

Comment by Githook User [ 26/May/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-56424 index build fasserts when system runs out of disk space
Branch: master
https://github.com/mongodb/mongo/commit/46cd6f9251a1595e640fcdb1788329e520acf695

Comment by Benety Goh [ 25/May/21 ]

The new logs in 4.4 should look something like this if we decide to detect and trigger a fatal assertion on an out-of-disk-space error:

{"t":{"$date":"2021-05-25T06:46:36.987-04:00"},"s":"I",  "c":"STORAGE",  "id":20649,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build failed","attr":{"buildUUID":{"uuid":{"$uuid":"f745174e-6444-4d8d-8dbf-c9ac1f218397"}},"namespace":"test.t","uuid":{"uuid":{"$uuid":"7d66c163-afc1-4eb9-b397-619a791de09e"}},"error":{"code":14031,"codeName":"OutOfDiskSpace","errmsg":"28: No space left on device"}}}
{"t":{"$date":"2021-05-25T06:46:36.987-04:00"},"s":"E",  "c":"STORAGE",  "id":5642401, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build unable to proceed due to insufficient disk space","attr":{"error":{"code":14031,"codeName":"OutOfDiskSpace","errmsg":"28: No space left on device"}}}
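The decision in the cleanup path could be sketched roughly as below. This is only an illustration under assumptions: `ErrorCode`, `Status`, `CleanupDecision` and `classifyIndexBuildFailure` are hypothetical stand-ins, not the server's actual types. Interruption and shutdown errors stay tolerated, as in the invariant expression from the original log. `OutOfDiskSpace` gets its own fatal message, taken from the new log line above. Everything else keeps the generic "unexpected error code" message. Instead of terminating, the sketch returns the decision, so the caller decides whether to fassert.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the server's error codes and Status type.
enum class ErrorCode { OK, Interrupted, ShutdownInProgress, OutOfDiskSpace, UnknownError };

struct Status {
    ErrorCode code;
    std::string reason;
};

bool isInterruption(const Status& s) { return s.code == ErrorCode::Interrupted; }
bool isShutdownError(const Status& s) { return s.code == ErrorCode::ShutdownInProgress; }

// What the cleanup path should do with a failed index build.
// fatal == true means the caller would raise a fatal assertion with `message`.
struct CleanupDecision {
    bool fatal;
    std::string message;
};

CleanupDecision classifyIndexBuildFailure(const Status& status) {
    // Errors the original invariant already tolerated.
    if (isInterruption(status) || isShutdownError(status)) {
        return {false, ""};
    }
    // A named code lets us say *why* the build cannot proceed.
    if (status.code == ErrorCode::OutOfDiskSpace) {
        return {true,
                "Index build unable to proceed due to insufficient disk space: " + status.reason};
    }
    // Anything else still falls back to the generic message.
    return {true, "Unexpected error code during index build cleanup: " + status.reason};
}
```

For example, `classifyIndexBuildFailure({ErrorCode::OutOfDiskSpace, "28: No space left on device"})` yields a fatal decision whose message names insufficient disk space instead of a truncated `Location16821` code.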

Comment by Benety Goh [ 24/May/21 ]

The error code 16821 was added in SERVER-9411.

Comment by Benety Goh [ 24/May/21 ]

We can probably try to map std::errc::no_space_on_device (ENOSPC) to ErrorCodes::OutOfDiskSpace.
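A minimal sketch of that mapping follows, assuming a hypothetical `ErrorCode` enum in place of the server's generated `ErrorCodes`. The sketch compares the incoming `std::error_code` against the portable `std::errc::no_space_on_device` condition, not the raw number 28. That way the error category's `default_error_condition()` decides whether its native value means "no space left on device".

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// Hypothetical subset of server error codes; only the names follow the ticket.
enum class ErrorCode { UnknownError, OutOfDiskSpace };

// Map an OS-level error to a named server error code.
// Comparing an error_code with an std::errc value compares against the portable
// error condition, so no platform-specific errno number is hard-coded here.
ErrorCode toErrorCode(const std::error_code& ec) {
    if (ec == std::errc::no_space_on_device) {
        return ErrorCode::OutOfDiskSpace;
    }
    // Everything we do not recognize keeps the generic code.
    return ErrorCode::UnknownError;
}
```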

Comment by Benety Goh [ 24/May/21 ]

The index build error originated from a std::exception in the Sorter as it was persisting the in-memory state. This error would have an error code of 16821.
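The sketch below shows one way a spill-to-disk write could surface ENOSPC as a named code instead of a numbered location error. It assumes a POSIX `write(2)` path. `writeAll`, `statusFromErrno`, `Status` and `ErrorCode` are hypothetical names, not the Sorter's actual code. The reason string keeps the OS errno and text in the same "28: No space left on device" shape seen in the new logs. The exact wording comes from the platform's error message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>
#include <unistd.h>

// Hypothetical stand-ins for the server's error codes and Status type.
enum class ErrorCode { OK, OutOfDiskSpace, UnknownError };

struct Status {
    ErrorCode code;
    std::string reason;
};

// Turn an errno value into a Status, keeping the OS number and text in the reason
// (e.g. "28: No space left on device" on Linux) and naming ENOSPC explicitly.
Status statusFromErrno(int err) {
    std::error_code ec(err, std::generic_category());
    std::string reason = std::to_string(err) + ": " + ec.message();
    if (ec == std::errc::no_space_on_device) {
        return {ErrorCode::OutOfDiskSpace, reason};
    }
    return {ErrorCode::UnknownError, reason};
}

// Hypothetical spill helper: write the whole buffer, retrying short writes and EINTR.
Status writeAll(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted before writing anything; try again
            }
            return statusFromErrno(errno);
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return {ErrorCode::OK, ""};
}
```

On Linux, writing to `/dev/full` always fails with ENOSPC, so that device is a convenient way to exercise this path without filling a real disk.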

Inserting into a WiredTiger table used for a MongoDB index may also fail because of insufficient disk space. In this case, the underlying OS error code is shadowed by a generic ErrorCodes::UnknownError.

We actually have an existing named error OutOfDiskSpace that was introduced for repairDatabase in SERVER-8412 but is no longer used in the server after the MMAPv1 storage engine was removed in SERVER-35112.

To improve the error message here in the logs, we could either:

  • do a string search for "28: No space left on device" in the error message, and adjust the invariant message accordingly in the IndexBuildsCoordinator; or
  • have the Sorter return ErrorCodes::OutOfDiskSpace.

Checking for the 28 errno code would probably not work on all platforms (Windows, for example).
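The two options above can be contrasted in a small sketch. The types and function names are hypothetical. The legacy message in the description formats the error as "errno:28 No space left on device", not "28: No space left on device". So even on Linux, an exact string search can miss a message from a different code path. On other platforms the number and wording may differ as well. Comparing a named code avoids both problems.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; only the code names follow the ticket.
enum class ErrorCode { Location16821, OutOfDiskSpace };

struct Status {
    ErrorCode code;
    std::string reason;
};

// Option 1: infer the cause from message text. Depends on the errno number,
// the platform's error wording, and how each call site formats the message.
bool looksLikeOutOfDiskByText(const Status& s) {
    return s.reason.find("28: No space left on device") != std::string::npos;
}

// Option 2: the producer (e.g. the Sorter) returns a named code, so consumers
// only compare codes and ignore the message text entirely.
bool isOutOfDiskByCode(const Status& s) {
    return s.code == ErrorCode::OutOfDiskSpace;
}
```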

Generated at Thu Feb 08 05:39:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.