[SERVER-59615] Store constituent devices in FTDC metadata Created: 26/Aug/21  Updated: 22/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kevin Arhelger Assignee: Brad Moore
Resolution: Unresolved Votes: 3
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File fs.py    
Issue Links:
Related
is related to SERVER-28953 Capture df (disk full) statistics in ... Closed
is related to SERVER-59633 Capture device mapper disk statistics... Closed
Assigned Teams:
Server Security
Backwards Compatibility: Fully Compatible
Sprint: Security 2023-10-02, Security 2023-10-16, Security 2023-10-30, Security 2023-11-13, Security 2023-12-25, Security 2024-01-08, Security 2024-02-19
Participants:

 Description   

Currently, FTDC metadata stores information about filesystem mounts.
This can be used to determine which devices back the MongoDB data in the simple case, but it can't be used to determine which subset of disks is in use on systems with software RAID or LVM.

Being able to filter down to the disks that contribute to mongod would be incredibly useful, especially on machines with dozens of disks. Storing something similar to lsblk output would greatly help in these scenarios.
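The lsblk relationships described above are exposed by the Linux kernel under sysfs: a device-mapper or md device lists its constituent devices in /sys/block/<name>/slaves. A minimal sketch of resolving a device down to its leaf devices (the function name and the sysfs-root parameter are illustrative, not part of this proposal):

```python
import os

def constituent_devices(name, sysfs="/sys/block"):
    """Recursively resolve a block device (e.g. an LVM or md device)
    to the leaf devices beneath it via the sysfs 'slaves' links."""
    slaves_dir = os.path.join(sysfs, name, "slaves")
    try:
        slaves = sorted(os.listdir(slaves_dir))
    except FileNotFoundError:
        slaves = []
    if not slaves:  # no slaves: this is already a leaf device
        return [name]
    leaves = []
    for slave in slaves:  # walk down through stacked dm/md layers
        leaves.extend(constituent_devices(slave, sysfs))
    return leaves
```

On the example system discussed in the comments, resolving the LVM device backing /mnt/vg this way would yield loop0 and loop1.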



 Comments   
Comment by Kevin Arhelger [ 31/Aug/21 ]

Hello Mark,

Thanks for the feedback.

1. The script should enumerate all devices for a single filesystem. I only really care about the devices making up the filesystems used by the mongo* process (logpath, dbpath, auditDestination, diagnosticDataCollectionDirectoryPath, etc.). If it's not an issue to list the underlying device(s) for potentially dozens of filesystems, I see no reason not to include them; I only minimized the script output in case that was a concern.
2. That would be great.
3. No issues here.
4. The proposed device info seems reasonable to me. This should be enough information to determine what devices are related to the dbpath.
5. I could see sizes being useful, but I don't think their utility is as great as knowing which devices are used for storing data; I'd put them in the nice-to-have category rather than must-have. I'm not sure what else would be useful to gather at the same time. Things like RAID/LVM configuration are probably significantly complicated and of too little utility to collect and store. We have other, slower ways to determine that information, but understanding which of the dozens of devices are relevant in an investigation would be very helpful.

Comment by Mark Benvenuto [ 31/Aug/21 ]

kevin.arhelger, your sample script is very helpful. Some clarifying questions:

1. Your script enumerates just one device per filesystem; do you want FTDC to include all of the device information?
2. Since this information does not change frequently, we will include this information in the same document as hostInfo and other "static" metadata about the system. I.e. FTDC will collect it only on startup and file rotation.
3. We could just include this information in hostInfo like the mount table
4. Assuming we extend mountInfo, would adding something like a {{devices}} array be what you had in mind?

Example:

{
  "mountId": {
    "$numberInt": "135"
  },
  "parentId": {
    "$numberInt": "20"
  },
  "major": {
    "$numberInt": "253"
  },
  "minor": {
    "$numberInt": "0"
  },
  "root": "/",
  "mountPoint": "/mnt/vg",
  "options": "rw,relatime",
  "fields": "shared:119",
  "type": "ext3",
  "source": "/dev/mapper/test_vg-vg0",
  "superOpt": "rw,stripe=256,data=ordered",
  "devices": [
    { "name": "loop0",
      "major": 7,
      "minor": 0
    },
    { "name": "loop1",
      "major": 7,
      "minor": 1
    }
  ]
}

Would you want more information than that, like the sizes of the constituent devices?

Comment by Bruce Lucas (Inactive) [ 27/Aug/21 ]

Thanks for clarifying the request is for additional metadata (not additional metrics). I'll pass this on to the appropriate team.

Comment by Kevin Arhelger [ 26/Aug/21 ]

Thanks for the comments Bruce.

This suggestion is just for changes to FTDC metadata. The main pain point is systems with many RAID or LVM volumes. An Ops Manager Head DB is one example where there could be dozens of raid disks, but only one or two of the physical disks are backing the dbpath. The end goal would allow tools to automatically highlight which devices apply to the monitored process.
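Assuming mountInfo entries were extended with the proposed devices array, a tool could highlight the devices that apply to the monitored process by matching a path (e.g. the dbpath) to its backing mount via the longest mountPoint prefix. A rough sketch (the function names and the exact matching rule are assumptions, not from this ticket):

```python
def mount_for_path(path, mount_info):
    """Return the mountInfo entry whose mountPoint is the longest
    prefix of 'path' -- i.e. the filesystem backing that path."""
    def covers(mp):
        mp = mp.rstrip("/") or "/"
        return mp == "/" or path == mp or path.startswith(mp + "/")
    candidates = [m for m in mount_info if covers(m["mountPoint"])]
    return max(candidates, key=lambda m: len(m["mountPoint"]), default=None)

def devices_for_path(path, mount_info):
    """With the proposed 'devices' array present, list the names of
    the constituent devices backing 'path' (empty if unknown)."""
    m = mount_for_path(path, mount_info)
    return [d["name"] for d in (m or {}).get("devices", [])]
```

For example, with a mount table containing "/" on sda and "/mnt/vg" backed by loop0 and loop1, a dbpath of /mnt/vg/db would resolve to the loop devices rather than sda.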

Today: bsondump metrics.2021-08-26T21-08-58Z-00000 2> /dev/null | head -1 | jq '.doc.hostInfo.extra.mountInfo[-1]'

{
  "mountId": {
    "$numberInt": "135"
  },
  "parentId": {
    "$numberInt": "20"
  },
  "major": {
    "$numberInt": "253"
  },
  "minor": {
    "$numberInt": "0"
  },
  "root": "/",
  "mountPoint": "/mnt/vg",
  "options": "rw,relatime",
  "fields": "shared:119",
  "type": "ext3",
  "source": "/dev/mapper/test_vg-vg0",
  "superOpt": "rw,stripe=256,data=ordered"
}

I have no way of knowing what /dev/mapper/test_vg-vg0 is (in this case it's /dev/loop0 and /dev/loop1).
If I run lsblk:

NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0           7:0    0   256M  0 loop
└─test_vg-vg0 253:0    0   504M  0 lvm  /mnt/vg
loop1           7:1    0   256M  0 loop
└─test_vg-vg0 253:0    0   504M  0 lvm  /mnt/vg

There are a few different options that would all work:
1. Simply store a new field in FTDC metadata that contains output similar to lsblk. A clever consumer of FTDC could parse the metadata and determine which devices are interesting to look at.
2. Add an array to the mountInfo sub documents containing the list of devices.
3. Add a new subdocument that contains the dbpath mappings.
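Option 2 could likely be populated from sysfs: each mountInfo entry already carries the filesystem's major/minor, and /sys/dev/block/<major>:<minor>/slaves names the constituent devices, each of which exposes its own numbers in a dev file. A hedged sketch of building the array that way (the helper name and sysfs-root parameter are illustrative):

```python
import os

def devices_array(major, minor, sysfs="/sys/dev/block"):
    """Build a 'devices' array for a mountInfo entry: read the
    slaves of <major>:<minor> and report each constituent device's
    name plus its own major:minor from the sysfs 'dev' file."""
    out = []
    slaves_dir = os.path.join(sysfs, f"{major}:{minor}", "slaves")
    try:
        names = sorted(os.listdir(slaves_dir))
    except FileNotFoundError:
        return out  # not a stacked device (or not present)
    for name in names:
        with open(os.path.join(slaves_dir, name, "dev")) as f:
            maj, _, mnr = f.read().strip().partition(":")
        out.append({"name": name, "major": int(maj), "minor": int(mnr)})
    return out
```

On the example system, devices_array(253, 0) would describe loop0 (7:0) and loop1 (7:1), matching the lsblk output shown above.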

Comment by Bruce Lucas (Inactive) [ 26/Aug/21 ]

Can you flesh this proposal out a bit, perhaps with a simple example showing what's in FTDC today and what would be in FTDC for the same system after this is implemented? Also, can you clarify whether you're talking about disk metrics or metadata (since you mention metadata in the opening comment, I'm a little uncertain).

Generated at Thu Feb 08 05:47:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.