[SERVER-52956] Add storage debug method to dump system-wide RecoveryUnit/transaction state
Created: 20/Nov/20  Updated: 06/Dec/22  Resolved: 06/Jan/21

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-45556 Create GDB pretty printer to dump all... Closed
Related
related to SERVER-61177 Create GDB command to dump WiredTiger... Closed
is related to SERVER-52623 Aborting in-progress transactions on ... Blocked
is related to WT-3529 Add undocumented debug API Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

There are a couple of ways of going about this.

  • Expose methods in WT that dump its internal state.
  • Have MDB walk Clients->OpCtx->RecoveryUnits

The latter is advantageous because it's easier to tie an OpCtx to what it was doing than to do the same for a bare pointer to a WT session.

However, there are some gaps in walking MDB's tree. It's uncommon, but possible, for recovery units to become detached as part of a "side transaction". They're temporarily held in function locals and cannot otherwise be discovered by traversing in-memory structures.
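To make the gap concrete, here is a minimal Python model of the Clients->OpCtx->RecoveryUnits walk. The classes are hypothetical, heavily simplified stand-ins, not MongoDB's C++ types or its GDB helper; `side_transaction` only mimics the idea of a recovery unit being swapped off its OpCtx and held in a local while a side transaction runs.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


# Hypothetical stand-ins for mongod's RecoveryUnit, OperationContext and Client.
@dataclass
class RecoveryUnit:
    txn_id: int


@dataclass
class OperationContext:
    description: str
    recovery_unit: RecoveryUnit


@dataclass
class Client:
    name: str
    op_ctx: Optional[OperationContext] = None


def dump_recovery_units(clients: List[Client]) -> List[Tuple[str, str, int]]:
    """Walk Clients -> OpCtx -> RecoveryUnit and report only what is reachable."""
    rows = []
    for client in clients:
        if client.op_ctx is None:
            continue  # idle client: no operation, no recovery unit
        ru = client.op_ctx.recovery_unit
        rows.append((client.name, client.op_ctx.description, ru.txn_id))
    return rows


@contextmanager
def side_transaction(op_ctx: OperationContext, new_txn_id: int) -> Iterator[RecoveryUnit]:
    """Install a fresh RecoveryUnit; the original survives only in a local."""
    stashed = op_ctx.recovery_unit  # detached: no Client/OpCtx points at it now
    op_ctx.recovery_unit = RecoveryUnit(new_txn_id)
    try:
        yield stashed
    finally:
        op_ctx.recovery_unit = stashed  # reattach when the side transaction ends


clients = [Client("conn1", OperationContext("insert", RecoveryUnit(10))), Client("conn2")]
with side_transaction(clients[0].op_ctx, new_txn_id=11) as hidden:
    during = dump_recovery_units(clients)
after = dump_recovery_units(clients)
print(during)  # [('conn1', 'insert', 11)]
```

While the side transaction is open, transaction 10 is still live but never appears in the dump; it only becomes visible again once it is reattached.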

If WT_CONNECTION::debug_info returned a structure, we might be able to get the best of both worlds by linking each WT session to its OpCtx and surfacing the dangling transactions.
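A sketch of that "best of both worlds" join, under loud assumptions: `WtTxnInfo` is a hypothetical record standing in for what a structured debug_info result might contain (the ticket's premise is that it does not return one today), and each recovery unit reached through the Client walk is assumed to expose the id of the WT session it uses. Any open WT transaction whose session no reachable recovery unit claims is reported as dangling.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical: one open WT transaction as a structured debug_info might report it.
@dataclass(frozen=True)
class WtTxnInfo:
    session_id: int
    txn_id: int


# Hypothetical: a recovery unit found by walking Clients -> OpCtx -> RecoveryUnit,
# annotated with the id of the WT session it holds.
@dataclass(frozen=True)
class ReachableRu:
    client: str
    op_description: str
    session_id: int


@dataclass
class Report:
    attributed: Dict[int, str]  # txn_id -> "client: operation"
    dangling: List[WtTxnInfo]   # open transactions no Client/OpCtx accounts for


def link_sessions(wt_txns: List[WtTxnInfo], rus: List[ReachableRu]) -> Report:
    """Attribute each open WT transaction to an OpCtx, or flag it as dangling."""
    by_session = {ru.session_id: ru for ru in rus}
    attributed: Dict[int, str] = {}
    dangling: List[WtTxnInfo] = []
    for txn in wt_txns:
        ru = by_session.get(txn.session_id)
        if ru is None:
            dangling.append(txn)  # e.g. a recovery unit stashed by a side transaction
        else:
            attributed[txn.txn_id] = f"{ru.client}: {ru.op_description}"
    return Report(attributed, dangling)


wt_txns = [WtTxnInfo(1, 10), WtTxnInfo(2, 11), WtTxnInfo(3, 12)]
rus = [ReachableRu("conn1", "insert", 1), ReachableRu("conn2", "find", 3)]
report = link_sessions(wt_txns, rus)
print(report.attributed)                    # {10: 'conn1: insert', 12: 'conn2: find'}
print([t.txn_id for t in report.dangling])  # [11]
```

WT's list is treated as the source of truth here because it sees every open transaction, while the MDB walk supplies the context (client and operation) that makes each one interpretable.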



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 06/Jan/21 ]

Hm, that existing GDB function may be sufficient. Apologies for forgetting it existed. If we wanted to be thorough, we could run it against the core dumps from the linked tickets to see if it meets our needs, but I don't think that's necessary. Closing this as a dup of SERVER-45556.

Comment by Louis Williams [ 05/Jan/21 ]

The mongodb-dump-recovery-units GDB helper already implements Dan's second suggested solution. See SERVER-45556. It walks all Clients and dumps the RecoveryUnit state. As he pointed out, this will not report every active transaction, however.

daniel.gottlieb, is there functionality you've requested with this ticket, missing from the existing GDB helper, that would have helped diagnose SERVER-52623?

Otherwise, this ticket depends on support from a WiredTiger API call. With it, we would add support for dumping all active WiredTiger transactions, not just those in use by MongoDB Clients.


Comment by Daniel Gottlieb (Inactive) [ 04/Jan/21 ]

I imagined this would be a function that GDB, attached to a running process, could call, similar to dumping the lock state. Hopefully, in that context, there'd be an opportunity to grab the global exclusive lock.

Comment by Connie Chen [ 04/Jan/21 ]

daniel.gottlieb, when would we want to call or use this? How would we avoid extra synchronization?

Generated at Thu Feb 08 05:29:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.