Diagnosing one-time incidents and ongoing problems in the field often requires asking the customer to reproduce a problem while collecting wide-coverage, high-resolution, long-retention data, for example serverStatus timeseries at 1-second intervals. However this is problematic for a few reasons
- reproducing the problem may be impossible, difficult, or inconvenient for the customer
- the mechanics of collecting diagnostic data with external tools such as mongo shell scripts are often problematic, particularly if the data needs to be collected over an extended period (days or weeks) - servers get restarted, data collection is accidentally terminated, instructions are not followed carefully, it is additional inconvenience for a customer who may have already been severely inconvenienced by a problem in MongoDB, and so on.
Serviceability would improved by a built-in facility to capture such data, compress it, and store it, while using minimal resources such that the facility can be left on full-time in production.