[SERVER-28953] Capture df (disk full) statistics in FTDC Created: 25/Apr/17  Updated: 30/Oct/23  Resolved: 09/Nov/21

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.6, 4.4.11, 4.2.18

Type: Improvement Priority: Major - P3
Reporter: Henrik Ingo (Inactive) Assignee: Sergey Galtsev (Inactive)
Resolution: Fixed Votes: 5
Labels: SWDI, move-sec, platforms-re-triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-14921 Investigate changes in SERVER-28953: ... Closed
Duplicate
is duplicated by SERVER-48710 FTDC disk space info collection Closed
Problem/Incident
causes SERVER-61357 Random failures in verifyGetDiagnosti... Closed
Related
related to SERVER-59615 Store constituent devices in FTDC met... Open
related to SERVER-21818 Capture system metrics in FTDC Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0, v4.4, v4.2, v4.0
Sprint: Security 2021-11-01, Security 2021-11-15
Participants:
Case:
Linked BF Score: 0

 Description   

Since we now also collect system metrics in the diagnostic data, it would sometimes be useful to know how full a disk was (or, in particular, whether it was 100% full).

Edit (JamesB): this would be most useful as <bytes available> / <bytes total>, not a percentage



 Comments   
Comment by Githook User [ 01/Dec/21 ]

Author:

{'name': 'sergey.galtsev', 'email': 'sergey.galtsev@mongodb.com', 'username': 'brushless-glitch'}

Message: SERVER-28953 Capture mount statistics in FTDC
Branch: v4.4
https://github.com/mongodb/mongo/commit/576bd6e7a64161d681e7327b70b0ba8340c61513

Comment by Githook User [ 01/Dec/21 ]

Author:

{'name': 'sergey.galtsev', 'email': 'sergey.galtsev@mongodb.com', 'username': 'brushless-glitch'}

Message: SERVER-28953 Capture mount statistics in FTDC
Branch: v4.2
https://github.com/mongodb/mongo/commit/63a1e9a49bc4e9bb447a2ad2802022c5d3c17227

Comment by Githook User [ 01/Dec/21 ]

Author:

{'name': 'sergey.galtsev', 'email': 'sergey.galtsev@mongodb.com', 'username': 'brushless-glitch'}

Message: SERVER-28953 Capture mount statistics in FTDC
Branch: v5.0
https://github.com/mongodb/mongo/commit/0c31c171b15e9212336a9d8e4c373dfc920229be

Comment by Githook User [ 08/Nov/21 ]

Author:

{'name': 'sergey.galtsev', 'email': 'sergey.galtsev@mongodb.com', 'username': 'brushless-glitch'}

Message: SERVER-28953 Capture mount statistics in FTDC
Branch: master
https://github.com/mongodb/mongo/commit/5f04013763ead1af8e3652c195453f26e4df77bc

Comment by Alex Bevilacqua [ 22/Oct/21 ]

sergey.galtsev I see that as well. My apologies, I assumed OSX was capturing the same telemetry as Linux. In this case, I guess Linux for sure and Windows if possible.

Comment by Sergey Galtsev (Inactive) [ 21/Oct/21 ]

MongoDB Enterprise > use admin
switched to db admin
MongoDB Enterprise > db.runCommand("getDiagnosticData").data.systemMetrics
MongoDB Enterprise > db.runCommand("getDiagnosticData").data.systemMetrics == null
true

Comment by Sergey Galtsev (Inactive) [ 21/Oct/21 ]

alex.bevilacqua can you please confirm the OSX requirement? I did a quick verification, and I don't see that the OSX version currently collects disk information at all.

Comment by Alex Bevilacqua [ 21/Oct/21 ]

sergey.galtsev this should be available on all platforms if possible, but OSX/Linux for sure.

Comment by Sergey Galtsev (Inactive) [ 21/Oct/21 ]

renato.riccio can you clarify whether this feature should be limited to Linux, available on all platforms, or up to discretion?

Comment by Bruce Lucas (Inactive) [ 20/Jul/17 ]

"this would be most useful as <bytes available> / <bytes total>, not a percentage"

I would suggest capturing both bytes used (not bytes available) and bytes total. That way we can see whether percent usage changed because disk usage changed or because disk capacity changed. Bytes used is better than bytes available because it won't change when capacity changes, whereas bytes available will.
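The reasoning above can be illustrated with a minimal sketch. The `DiskSample` type, the `explain_change` helper, and the sample numbers are all hypothetical and exist only for illustration; they are not FTDC field names. The point is that with bytes used and bytes total both recorded, a drop in percent usage can be attributed to a capacity change rather than to deleted data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskSample:
    """One hypothetical capture of a filesystem's size counters, in bytes."""
    total: int
    used: int

    @property
    def available(self) -> int:
        # Simplification: ignores blocks reserved for privileged users.
        return self.total - self.used

    @property
    def percent_used(self) -> float:
        return 100.0 * self.used / self.total


def explain_change(before: DiskSample, after: DiskSample) -> str:
    """Report whether usage, capacity, both, or neither changed between samples."""
    usage_changed = before.used != after.used
    capacity_changed = before.total != after.total
    if usage_changed and capacity_changed:
        return "both"
    if usage_changed:
        return "usage"
    if capacity_changed:
        return "capacity"
    return "neither"


# Hypothetical scenario: the volume is resized from 100 GiB to 200 GiB
# while the data on it stays at 80 GiB.
GIB = 1024 ** 3
before = DiskSample(total=100 * GIB, used=80 * GIB)
after = DiskSample(total=200 * GIB, used=80 * GIB)

print(explain_change(before, after))             # capacity
print(before.percent_used, after.percent_used)   # 80.0 40.0
print(after.available - before.available == 100 * GIB)  # True
```

In this scenario bytes used is identical in both samples, while bytes available jumps by 100 GiB, which is why bytes available on its own would be harder to interpret.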

Comment by Henrik Ingo (Inactive) [ 02/May/17 ]

Expanding on Bruce's list:

  • Usually when we do experience a disk full issue in performance testing, there will be a trace of that in the mongod log file.
  • However, it's always possible that mongod somehow didn't handle the disk full situation correctly. Because of this, whenever I see any other assertion from the mmapv1 or wiredTiger engines, I'm always suspicious that the underlying cause was a full disk after all. It would be good to have a second source to check the disk full stats for such cases.
  • Even when it turns out the disk was NOT full (and therefore, the logging was correct), I still need a way to verify that this was the case.
  • Finally, it's possible that writing to the log itself fails because the disk is full (if the log is on the same partition as the data files). In this case it would be useful to see that seconds earlier the disk was 99% full.

Comment by Bruce Lucas (Inactive) [ 28/Apr/17 ]

From a diagnostic perspective, you would normally expect to learn from the mongod logs that a failure was caused by running out of disk space. However, I can think of two circumstances where having this information in FTDC as well would be useful:

  • if you're diagnosing a problem from FTDC alone and don't have logs. This is not normally the case, but it happens more than you might think: logs can contain sensitive information, they can be large, the customer can be delayed in uploading them, they can lose or delete them, the customer can upload the wrong logs, etc.
  • if you're diagnosing a bug in how mongod handles running out of file space.

Comment by Henrik Ingo (Inactive) [ 26/Apr/17 ]

The concrete issue I've linked involves weird things happening that may or may not have been caused by a full disk. (Note that we can of course run df ourselves when testing, but I wanted to propose this from a completeness point of view. Since FTDC data obsoletes the need to run iostat, why not df too?)

I guess from a support point of view, additional value could be provided if we could tell the customer: "By the way, your disk is 95% full and will be completely full a month from now."
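A forecast like the one quoted above could be derived from two captured samples by simple linear extrapolation. The sketch below is only an illustration under that assumption; the `seconds_until_full` helper and the numbers are hypothetical, and real disk growth is rarely this regular.

```python
from typing import Optional


def seconds_until_full(t0: float, used0: int, t1: float, used1: int,
                       total: int) -> Optional[float]:
    """Linearly extrapolate two (time, bytes used) samples to the point where used == total.

    Returns None when usage is flat or shrinking, because no fill time can be projected.
    """
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    growth = used1 - used0
    if growth <= 0:
        return None
    # remaining bytes divided by growth rate, where rate = growth / (t1 - t0)
    return (total - used1) * (t1 - t0) / growth


DAY = 86_400  # seconds per day
# Hypothetical numbers: a 1000-unit disk went from 90% to 95% used over 30 days.
remaining = seconds_until_full(t0=0, used0=900, t1=30 * DAY, used1=950, total=1000)
print(remaining / DAY)  # 30.0
```

At the same growth rate, the remaining 5% would be consumed in another 30 days, which matches the "95% full, full in a month" message in the comment.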

Comment by Mark Benvenuto [ 25/Apr/17 ]

Are you worried about the effects of file allocation getting slower as the disk fills up? Or just about it hitting 100%?

Linux: statfs (2)
Windows: Performance Counters

Cc: bruce.lucas
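
For reference, the counters that statfs(2) exposes on Linux can be inspected from Python through `os.statvfs`, which wraps the closely related statvfs(3) call and is available on Unix-like systems only. This is a minimal sketch for exploring the numbers, not the server's implementation; the `mount_stats` helper and its dictionary keys are hypothetical names, not the fields stored in FTDC.

```python
import os


def mount_stats(path: str) -> dict:
    """Return capacity counters, in bytes, for the filesystem containing `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize        # all free blocks, including reserved ones
    available = st.f_bavail * st.f_frsize  # free blocks usable by unprivileged processes
    return {
        "total": total,
        "used": total - free,
        "free": free,
        "available": available,
    }


# Assumption: "/" is used only as an example; the path of interest would be
# the directory holding the data files.
stats = mount_stats("/")
print(stats)
```

Reporting `used` and `total` side by side follows the suggestion made elsewhere in this thread to capture bytes used rather than only bytes available.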

Generated at Thu Feb 08 04:19:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.