[SERVER-30809] Investigating remaining writes to the [KV]Catalog that must be timestamped. Created: 24/Aug/17 Updated: 06/Dec/22 Resolved: 20/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Replication | ||
| Backwards Compatibility: | Fully Compatible | ||
| Sprint: | Repl 2017-12-18 | ||
| Participants: |
| Description |
|
The KVCatalog is a single table, used by non-MMAP storage engines, that backs all collection and index metadata (along with the "Feature Tracker Document", which need not be considered here). In the KVCatalog, each collection is a document and each index is embedded in its collection's document. Currently, writes are timestamped only if there is an associated oplog entry. Creating an index makes two writes to the catalog. The first, made before the index build starts, inserts the index entry into the catalog with ready: false; this write is not replicated. The second occurs when the index build completes: ready is set to true and an oplog entry is replicated. The first write is therefore not timestamped, while the second is. Consider the following sequence:
The desired state for "foo" is its initial state, before the updates at times 2 and 3. However, rollback_to_stable will restore the document to its "Time 3" state. My understanding is that it should be legal for ready: false index writes to carry any timestamp >= that of the last write to that document in the KVCatalog and < those of all future writes to that document in the KVCatalog. |
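The timestamp window described above can be written as a small predicate. This is only an illustrative sketch in Python, not server code: the function name, the use of plain integers as timestamps, and the `next_write_ts` parameter are all assumptions made for the example.

```python
from typing import Optional


def is_legal_ready_false_timestamp(
    candidate_ts: int,
    last_write_ts: int,
    next_write_ts: Optional[int] = None,
) -> bool:
    """Hypothetical check for the window described in the ticket.

    candidate_ts:  timestamp proposed for the `ready: false` catalog write.
    last_write_ts: timestamp of the last write to the same catalog document.
    next_write_ts: timestamp of the earliest later write to that document,
                   or None if no later write is known yet.
    """
    # Must not precede the last write to the document (>=).
    if candidate_ts < last_write_ts:
        return False
    # Must be strictly earlier than every future write to the document (<).
    if next_write_ts is not None and candidate_ts >= next_write_ts:
        return False
    return True
```

For example, with a last write at 5 and the `ready: true` write at 8, any of 5, 6, or 7 would pass under this reading, while 4 and 8 would not.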
| Comments |
| Comment by Daniel Gottlieb (Inactive) [ 20/Dec/17 ] |
|
This ticket resulted in an investigation in which an assertion was added to ensure that all updates to the [KV]Catalog that may be rolled back have a timestamp. Provisional changes were made to incrementally uncover all code paths that Evergreen exercises and that result in writes to the KVCatalog, for both primaries and secondaries. The individual work items that the investigation surfaced are linked to this ticket. In summary, the remaining writes to timestamp are:
|
| Comment by Daniel Gottlieb (Inactive) [ 25/Aug/17 ] |
|
I think readConcern: majority against the oplog should work as expected. |
| Comment by Spencer Brody (Inactive) [ 25/Aug/17 ] |
|
Does this mean it will no longer be possible to do a readConcern:majority read against the oplog and see the state of the oplog as of the replication commit point? |
| Comment by Daniel Gottlieb (Inactive) [ 25/Aug/17 ] |
|
Addendum: Writes to the `_mdb_catalog` on behalf of a logged collection (e.g., the oplog or user collections in `local`) unfortunately need to survive a call to `rollback_to_stable`. The conjectured fix: given that each (non-feature) document in the `_mdb_catalog` represents a single collection, a namespace should always be available when updating the catalog. The namespace can be passed to `WiredTigerUtil::useTableLogging` (which, for aesthetics, would presumably be pulled up from the WT layer to the KV layer) to determine whether a timestamp should be applied to the write. Writes for a logged table should not be timestamped. |
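The decision in the conjectured fix can be sketched as follows. This is a Python illustration only: `uses_table_logging` is a hypothetical stand-in for `WiredTigerUtil::useTableLogging`, and its rule (every namespace in the `local` database is logged) is a simplifying assumption taken from the examples above, not the server's actual logic.

```python
def uses_table_logging(namespace: str) -> bool:
    """Hypothetical stand-in for WiredTigerUtil::useTableLogging.

    Assumption for this sketch: a collection is logged exactly when it
    lives in the `local` database (e.g. the oplog, `local.oplog.rs`).
    """
    db_name, _, _collection = namespace.partition(".")
    return db_name == "local"


def should_timestamp_catalog_write(namespace: str) -> bool:
    """Decide whether a catalog write for `namespace` gets a timestamp.

    Writes for a logged table must survive rollback_to_stable, so they
    are left untimestamped; all other catalog writes are timestamped.
    """
    return not uses_table_logging(namespace)
```

Under these assumptions, a catalog update for `local.oplog.rs` would be written without a timestamp, while one for `test.foo` would be timestamped.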