Type: New Feature
Priority: Major - P3
Resolution: Won't Fix
Affects Version/s: WT2.6.1
Fix Version/s: None
The write-ahead log (WAL) is a powerful instrument that could be leveraged for several purposes currently outside of WT's scope:
1. Facilitate replication
Log records can be transported to a different replica and replayed there. MongoDB achieves this with its oplog, but at a significant performance cost: the data is written a second time, and the oplog and the engine's log must be kept synchronized under concurrent load. WT has a world-class logging implementation with multiple performance optimizations already in place to achieve high log throughput under high concurrency, so reusing the WT log for replication makes sense. MongoDB may have other reasons to keep its oplog separate from the engine's log, but other DBMSes may be built on top of WT in the future.
One suggestion concerns whether to expose the log record format to the user. Both "yes" and "no" are viable strategies with their own strengths. If you document the format, you may enable more creative use cases down the road, but you are then committed to that format and must version it very carefully. If you keep the format opaque, you can still provide an API to replay log records onto a database. This is a more prescriptive approach, but it is the easier one to maintain.
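As a minimal sketch of the opaque-format option, assuming a toy in-memory key/value store rather than WiredTiger itself: the primary appends put records to a log stream, the stream is "transported" (here, simply shared), and a hypothetical `log_replay()` function applies the records to a replica while tracking its position so replay can resume incrementally. None of these names (`log_rec`, `log_stream`, `kv_store`, `primary_put`, `log_replay`) are WiredTiger APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEY  32
#define MAX_VAL  32
#define MAX_RECS 64
#define MAX_ROWS 64

/* Hypothetical log record. Only log_replay() looks inside it, so its
 * layout can change without breaking callers (the "opaque" option). */
typedef struct {
    char key[MAX_KEY];
    char value[MAX_VAL];
} log_rec;

/* The stream of records shipped from primary to replica. */
typedef struct {
    log_rec recs[MAX_RECS];
    size_t n;
} log_stream;

/* A toy key/value table standing in for a database. */
typedef struct {
    char keys[MAX_ROWS][MAX_KEY];
    char values[MAX_ROWS][MAX_VAL];
    size_t n;
} kv_store;

/* Insert or overwrite a key. */
static void store_put(kv_store *s, const char *key, const char *value) {
    for (size_t i = 0; i < s->n; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            snprintf(s->values[i], MAX_VAL, "%s", value);
            return;
        }
    }
    assert(s->n < MAX_ROWS);
    snprintf(s->keys[s->n], MAX_KEY, "%s", key);
    snprintf(s->values[s->n], MAX_VAL, "%s", value);
    s->n++;
}

/* Look up a key; returns NULL if absent. */
static const char *store_get(const kv_store *s, const char *key) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->keys[i], key) == 0)
            return s->values[i];
    return NULL;
}

/* Primary side: apply the write locally and append a log record. */
static void primary_put(kv_store *s, log_stream *stream,
                        const char *key, const char *value) {
    assert(stream->n < MAX_RECS);
    store_put(s, key, value);
    log_rec *r = &stream->recs[stream->n++];
    snprintf(r->key, MAX_KEY, "%s", key);
    snprintf(r->value, MAX_VAL, "%s", value);
}

/* Hypothetical replay API: apply every record from *next onward to the
 * replica and advance *next, so replay can resume incrementally. */
static void log_replay(kv_store *replica, const log_stream *stream,
                       size_t *next) {
    for (; *next < stream->n; (*next)++)
        store_put(replica, stream->recs[*next].key,
                  stream->recs[*next].value);
}
```

Because callers only ever hand records to `log_replay()`, the record layout could change between releases as long as the replay code understands both old and new records, which is the maintenance advantage of the opaque approach described above.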
2. Allow application to associate a custom context with transactions and have this context be recorded as a part of the log record.
This capability is needed, for instance, when a user transaction is already part of a global transaction sequence managed outside of WT. The user can leverage it to implement group commit protocols on top of WT.
For inspiration, see the similar PutLogData feature in RocksDB's WriteBatch API: [PutLogData](https://github.com/facebook/rocksdb/blob/master/include/rocksdb/write_batch.h#L114)
For additional context, see my exchange with Michael below.
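A minimal C sketch of what such a feature could look like, assuming a toy engine rather than WiredTiger: a hypothetical `txn_set_log_context()` attaches an opaque application string to a transaction, the commit writes the context into the same log record as the operations (so both become durable together), and a cursor-style `wal_next()` scan returns the context alongside the operations. All type and function names here are illustrative assumptions, not an existing WT API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPS  8
#define MAX_TXNS 16
#define CTX_LEN  32

/* One write inside a transaction. */
typedef struct {
    char key[16];
    char value[16];
} op;

/* Hypothetical commit record: the application's context is stored in the
 * same record as the operations, so both become durable together. */
typedef struct {
    unsigned long txn_id;
    char context[CTX_LEN]; /* opaque to the engine; "" if never set */
    op ops[MAX_OPS];
    size_t nops;
} commit_rec;

/* A toy write-ahead log: an append-only array of commit records. */
typedef struct {
    commit_rec recs[MAX_TXNS];
    size_t n;
} wal;

/* A transaction buffers its record until commit. */
typedef struct {
    commit_rec pending;
} txn;

static void txn_begin(txn *t, unsigned long id) {
    memset(t, 0, sizeof(*t));
    t->pending.txn_id = id;
}

static void txn_put(txn *t, const char *key, const char *value) {
    assert(t->pending.nops < MAX_OPS);
    op *o = &t->pending.ops[t->pending.nops++];
    snprintf(o->key, sizeof(o->key), "%s", key);
    snprintf(o->value, sizeof(o->value), "%s", value);
}

/* Hypothetical API: attach application context (e.g. a global sequence
 * number) to the transaction before it commits. */
static void txn_set_log_context(txn *t, const char *ctx) {
    snprintf(t->pending.context, CTX_LEN, "%s", ctx);
}

/* Commit appends a single record containing the ops and the context. */
static void txn_commit(txn *t, wal *wl) {
    assert(wl->n < MAX_TXNS);
    wl->recs[wl->n++] = t->pending;
}

/* Cursor-style log scan: returns the next committed record, or NULL at
 * the end of the log. */
static const commit_rec *wal_next(const wal *wl, size_t *pos) {
    return *pos < wl->n ? &wl->recs[(*pos)++] : NULL;
}
```

RocksDB's PutLogData takes a different route: the blob is a separate entry in the WriteBatch that is written to the WAL only and surfaced through `WriteBatch::Handler::LogData` during iteration. Embedding the context in the commit record, as here, is just one possible design.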
On Sun, Sep 20, 2015 at 7:23 PM, Michael Cahill <firstname.lastname@example.org> wrote:
> I'm wondering what possible use cases would involve scanning the log in WT?
WiredTiger uses it for recovery and for the “wt printlog” command.
> The obvious one I'm thinking about is to facilitate replication. But in order to use the log for replication one needs to be able to replay the log against the target DB. I can't find any examples of this or references to this possibility in general.
We don’t publish the format of log records that WiredTiger generates, so the only place where you can see how to play back the log is in WiredTiger’s recovery code. We haven’t designed logging with the goal of supporting replication, but if you transported log records to replicas they could certainly be played back to sync up with a primary.
> Another, similar question: why would one want to insert a custom piece of data into an already recorded log record, a feature WT supports?
The WT_SESSION::log_printf method is primarily intended for debugging: since our log includes support for lightweight in-memory buffering, it is sometimes more efficient to write debug messages into the transaction log. They are also interleaved with regular operations, which can make it easier to piece together what happened during recovery.
> And yet another question about the logging API. What could be very valuable is for a customer to associate a custom piece of data with an operation before the log record is produced, and have this data appear in the log record during a log scan. For instance, say I have a distributed group commit protocol in front of WT, and I need to save my protocol state durably and correlate my historical protocol state with WT records in order to perform recovery, etc.
> Has this capability been considered?
We’re happy to engage to help augment the logging API (e.g., to attach information to a specific commit) — if you have specific requirements, let us know and we can open a JIRA ticket to nail down an API and track progress.
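To make the recovery half of the group-commit scenario in the question above concrete, here is a minimal C sketch. It assumes each WT transaction carries a context string of the form `gsn=<n>` assigned by an external coordinator; the string format, `scanned_rec`, and `recover_last_gsn()` are all hypothetical. After a crash, the coordinator scans the log, finds the highest global sequence number that reached WT durably, and resumes its protocol from there.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical view of a committed record returned by a log scan; only
 * the application context matters for this recovery step. */
typedef struct {
    unsigned long txn_id;
    const char *context; /* e.g. "gsn=42", or NULL if none was attached */
} scanned_rec;

/* Return the highest global sequence number (GSN) found in the scanned
 * records, or 0 if none carries one. Records whose context is absent or
 * not of the form "gsn=<n>" are skipped. */
static unsigned long recover_last_gsn(const scanned_rec *recs, size_t n) {
    assert(recs != NULL || n == 0);
    unsigned long last = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long gsn;
        if (recs[i].context != NULL &&
            sscanf(recs[i].context, "gsn=%lu", &gsn) == 1 && gsn > last)
            last = gsn;
    }
    return last;
}
```

Taking the maximum assumes the protocol commits global transactions to WT in GSN order; if it does not, recovery would instead need to look for gaps below the highest GSN and re-drive the missing transactions.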