[SERVER-41064] Log scoped diagnostic data (e.g. in-progress client command) on invariant Created: 09/May/19  Updated: 29/Oct/23  Resolved: 03/Aug/22

Status: Closed
Project: Core Server
Component/s: Diagnostics, Internal Code
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Kevin Pulo Assignee: Billy Donahue
Resolution: Fixed Votes: 3
Labels: sa-groomed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Service Arch 2022-04-04, Service Arch 2022-04-18, Service Arch 2022-05-02, Service Arch 2022-05-16, Service Arch 2022-05-30, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11, Service Arch 2022-07-25, Service Arch 2022-08-08
Participants:
Case:
Story Points: 4

 Description   

If an invariant is hit while processing an incoming client command, the command being processed should be logged (as it would have been prior to running, if the COMMAND logLevel had been 2 or more).

This is because by default commands are only logged after they have finished running, but this will never happen if an invariant kills the server midway through the command. Diagnostically it's invaluable to know the command (including arguments) that was in progress when the invariant triggered, and it's common to lament the absence of this info when invariants are hit in the field or in production (because it may not be possible to reproduce, and core dumps are unlikely to be available).

Desired change:

 2019-05-08T02:25:10.859+0000 F -        [conn216296] Invariant failure params.descriptor src/mongo/db/query/stage_builder.cpp 102
+2019-05-08T02:25:10.859+0000 F COMMAND  [conn216296] in-progress command: foo.$cmd { find: "bar", filter: { ... }, ... }
 2019-05-08T02:25:10.859+0000 F -        [conn216296]
  
 ***aborting after invariant() failure
 
 2019-05-08T02:25:10.912+0000 F -        [conn216296] Got signal: 6 (Aborted).



 Comments   
Comment by Billy Donahue [ 03/Aug/22 ]

Now we have the `ScopedDebugInfo` utility and the test, but haven't integrated it with any production code yet.
So nothing calls it yet, but if they did, those data would be dumped by `invariant` failures.
So let's now go back and think of some places where this would be useful and add some ScopedDebugInfo decls.

Comment by Billy Donahue [ 03/Aug/22 ]

Mongodb CR bot didn't update the ticket for some reason.
Master branch commit:
https://github.com/mongodb/mongo/commit/7c1422f859117ec9e990e5838300b8c281982f9f

Comment by Billy Donahue [ 14/Mar/22 ]

The idea at the end of this thread seems pretty useful. I have wished I had something like this before.
It would have no impact on existing code, other than invariant itself.

A thread_local std::vector<ScopeDiagnosticInfo>, where ScopeDiagnosticInfo is a TBD lightweight object that can hold some grouping of variant values like <StringData, int64_t, BSONObj*, Status> which together describe a frame, plus a StringData note and a SourceLocation ). It's essentially a little just-in-case plan for a deferred log statement. This can be read by diagnostic systems like invariant to provide extra context in case a call doesn't succeed. This would be like adding a little contextual color to stack traces and could be an essential clue for diagnosing failures.

Mechanically it's not complex. Maybe 4 ServiceArch story points?

Generated at Thu Feb 08 04:56:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.