Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT10.0.0, 4.4.0-rc8, 4.7.0
Affects Version/s: None
Component/s: None
Labels:
- SEKB

Sprint:
None
Story Points:
8

alexander.gorrod, chenhao.qu, luke.pearson, alex.cameron, michael.cahill, tammy.bailey:

I was thinking about what we could change to improve debugging format test failures this weekend. I think page-dump debugging gives us pretty good visibility into the current state of the cache at the time of failure, but it doesn't help us out a lot on how we got there.

The format test program has always had rudimentary operation logging, and I'd mostly forgotten it existed until someone recently (I forget who), told me they were using it to chase a particular problem.

The problem with format's logging is it's a dump into stdout via the WiredTiger message handlers (which means it's unnecessarily heavy-weight), lacks both time-stamp and thread-identifying information, and writes output without bound at a pretty fast rate, you couldn't use it to debug long runs.

I wasn't eager to write a high-performance log manager this week, but it turns out that courtesy of sue.loverso, we already have one.

I'm pushing a branch that connects the format test program logging into the WiredTiger log manager for your consideration.

As usual, if this doesn't fit your debugging needs just say so and I'll discard the branch.

The changes:

Change format's logging configuration to create a second WiredTiger database used only to log format operations. I thought we should probably use a separate database so (1) operation logging would work even when the primary database is running an in-memory or other, non-logging, configuration, and (2) there'd be less performance interactions between operation logging and normal operations in the primary. There's an optional configuration that puts the format logging information into the primary's database logs, more about that in a second.

Add a new wiredtiger_open() configuration, "debug_mode.log_retain=##" which configures how many log files we keep. Log files are never archived if we're not doing checkpoints, so the logs would grow without bound. This option means we'll archive (that is, discard), log files and only keep the most recent configured count of the log files.

The format operation logging records are type WT_LOGREC_MESSAGE. To make it easier to use them, I added the "-m" option to the wt utility printlog command so you don't have to wade through the log file header information and normal WiredTiger log file records.

I added a new WT_SESSION method, "WT_SESSION.log_vprintf" so it's easy to push a variadic argument list into the log.

Anyway, with this set of changes, you can pull something like the following out of the log files, with timestamp, thread ID and format operation:

[1590349729:101918][47707:0x7f8d4ffff700] begin snapshot read-ts=297 (not repeatable)
[1590349729:101927][47707:0x7f8d667fc700] remove    126600
[1590349729:101928][47707:0x7f8d66ffd700] read      271456 {0x30}
[1590349729:101930][47707:0x7f8d65ffb700] update    491529 {0000491529.00/opqrs}, {0000491529/LMNOPQRSTUVWXYZABCDEFGHIJ}
[1590349729:101936][47707:0x7f8d4effd700] begin snapshot read-ts=298 (not repeatable)
[1590349729:101943][47707:0x7f8d4ffff700] remove    500736
[1590349729:101954][47707:0x7f8d4effd700] read      70332 {0x30}
[1590349729:101956][47707:0x7f8d64ff9700] commit read-ts=253, commit-ts=299
[1590349729:101961][47707:0x7f8d65ffb700] commit read-ts=295, commit-ts=300
[1590349729:101959][47707:0x7f8d4ffff700] update    610904 {0000610904.00/opqrstuvwxyza}, {0000610904/LMNOPQRSTU}
[1590349729:101991][47707:0x7f8d657fa700] update    295934 {0000295934.00/opqrstuvwx}, {0000295934/LMNOPQRSTUVWXYZABC}
[1590349729:102022][47707:0x7f8d64ff9700] begin snapshot read-ts=301 (not repeatable)
[1590349729:102028][47707:0x7f8d65ffb700] begin snapshot read-ts=302 (not repeatable)
[1590349729:102034][47707:0x7f8d4effd700] remove    346325
[1590349729:102044][47707:0x7f8d4effd700] remove    896930
[1590349729:102057][47707:0x7f8d4effd700] remove    928898
[1590349729:102081][47707:0x7f8d4effd700] commit read-ts=298, commit-ts=303
[1590349729:102099][47707:0x7f8d4effd700] begin snapshot read-ts=304 (not repeatable)

And, of course, you can pull out operations by timestamp, thread ID or key with a little scripting.

If we like this change and want to keep it, I think one other change we might want to make is to optionally redirect WiredTiger verbose messages into the WiredTiger log as well. I almost made that change (the configuratoin option is in the branch already), but the actual change isn't trivial, so I probably won't pursue that until we decide if this work is useful. Anyway, the idea is if you configure format logging to go in to the primary WiredTiger logs, you could also put verbose messages into the log stream as well and get a real-time view of both the operation and what the library code is doing.

Anyway, let me know what you think and if this is worth getting into shape for real use.

related to

WT-6304 add option to redirect WiredTiger verbose messages to the WiredTiger log

Backlog

WT-6309 Add support for start/stop arguments to wt printlog command

Closed

Assignee:: Keith Bostic (Inactive)
Reporter:: Keith Bostic (Inactive)
Votes:: 0 Vote for this issue
Watchers:: 7 Start watching this issue

Created:: May 24 2020 08:22:18 PM UTC
Updated:: Oct 29 2023 04:43:33 PM UTC
Resolved:: May 27 2020 05:16:46 PM UTC

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates