I was thinking about what we could change to improve debugging format test failures this weekend. I think page-dump debugging gives us pretty good visibility into the current state of the cache at the time of failure, but it doesn't help us out a lot on how we got there.
The format test program has always had rudimentary operation logging, and I'd mostly forgotten it existed until someone recently (I forget who), told me they were using it to chase a particular problem.
The problem with format's logging is it's a dump into stdout via the WiredTiger message handlers (which means it's unnecessarily heavy-weight), lacks both time-stamp and thread-identifying information, and writes output without bound at a pretty fast rate, you couldn't use it to debug long runs.
I wasn't eager to write a high-performance log manager this week, but it turns out that courtesy of Susan LoVerso, we already have one.
I'm pushing a branch that connects the format test program logging into the WiredTiger log manager for your consideration.
As usual, if this doesn't fit your debugging needs just say so and I'll discard the branch.
- Change format's logging configuration to create a second WiredTiger database used only to log format operations. I thought we should probably use a separate database so (1) operation logging would work even when the primary database is running an in-memory or other, non-logging, configuration, and (2) there'd be less performance interactions between operation logging and normal operations in the primary. There's an optional configuration that puts the format logging information into the primary's database logs, more about that in a second.
- Add a new wiredtiger_open() configuration, "debug_mode.log_retain=##" which configures how many log files we keep. Log files are never archived if we're not doing checkpoints, so the logs would grow without bound. This option means we'll archive (that is, discard), log files and only keep the most recent configured count of the log files.
- The format operation logging records are type WT_LOGREC_MESSAGE. To make it easier to use them, I added the "-m" option to the wt utility printlog command so you don't have to wade through the log file header information and normal WiredTiger log file records.
- I added a new WT_SESSION method, "WT_SESSION.log_vprintf" so it's easy to push a variadic argument list into the log.
Anyway, with this set of changes, you can pull something like the following out of the log files, with timestamp, thread ID and format operation:
And, of course, you can pull out operations by timestamp, thread ID or key with a little scripting.
If we like this change and want to keep it, I think one other change we might want to make is to optionally redirect WiredTiger verbose messages into the WiredTiger log as well. I almost made that change (the configuratoin option is in the branch already), but the actual change isn't trivial, so I probably won't pursue that until we decide if this work is useful. Anyway, the idea is if you configure format logging to go in to the primary WiredTiger logs, you could also put verbose messages into the log stream as well and get a real-time view of both the operation and what the library code is doing.
Anyway, let me know what you think and if this is worth getting into shape for real use.