[SERVER-48552] Attempt to synchronize ghost timestamps Created: 02/Jun/20  Updated: 09/Jul/20  Resolved: 09/Jul/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Eric Milkie
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Execution Team 2020-06-15, Execution Team 2020-07-13

 Description   

The goal here is to keep ghost timestamps in spirit, but to better control the timestamp used so that update chains do not go out of order (e.g., 10->20->100->40, where an update at timestamp 40 is applied on top of one at 100).
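
For illustration, an out-of-order chain like the one above can be detected by checking whether a key's timestamps, taken in the order the updates were applied, ever decrease. A minimal C++ sketch, assuming an update chain can be modeled as a vector of timestamps with the oldest update first (WiredTiger's real update lists are linked structures; `UpdateChain` and `firstOutOfOrder` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: timestamps of the updates to one key, in the order they
// were applied (oldest first).
using UpdateChain = std::vector<std::uint64_t>;

// Returns the index of the first update whose timestamp is earlier than the
// update applied just before it, or std::nullopt if the chain is in order.
// Equal timestamps count as in order in this sketch.
std::optional<std::size_t> firstOutOfOrder(const UpdateChain& chain) {
    auto it = std::is_sorted_until(chain.begin(), chain.end());
    if (it == chain.end()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(it - chain.begin());
}
```

For the chain 10->20->100->40, `firstOutOfOrder` reports index 3, the update at 40. Treating equal timestamps as acceptable is an assumption of the sketch.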

When a ghost timestamp is required, instead of reading the logical clock, attempt the following:

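  1. Acquire the PBWM (Parallel Batch Writer Mode lock) and the RSTL (Replication State Transition Lock), in whichever order is correct
  2. If primary, write a no-op oplog entry and use its timestamp
  3. If secondary, use the last-applied timestamp as the ghost timestamp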
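
The three steps above can be sketched as follows. This is a hedged illustration, not MongoDB code: `Role`, `OplogEntry`, `NodeState`, `chooseGhostTimestamp`, and the in-memory `oplog` vector are hypothetical, the two `std::mutex` members only model the PBWM and RSTL, and `lastApplied + 1` is a simplification of reserving an oplog timestamp. `std::scoped_lock` acquires both mutexes with a deadlock-avoidance algorithm, which sidesteps the open question of lock order; the real server locks have a defined acquisition order that this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

enum class Role { kPrimary, kSecondary };

// Hypothetical toy oplog entry: a timestamp plus an operation tag.
struct OplogEntry {
    std::uint64_t ts;
    std::string op;
};

// Hypothetical stand-in for one replica set member's replication state.
struct NodeState {
    Role role = Role::kSecondary;
    std::uint64_t lastApplied = 0;  // timestamp of the newest applied oplog entry
    std::vector<OplogEntry> oplog;
    std::mutex pbwm;  // models the PBWM
    std::mutex rstl;  // models the RSTL
};

// Chooses the timestamp for a write that would otherwise take a ghost
// timestamp from the logical clock.
std::uint64_t chooseGhostTimestamp(NodeState& node) {
    // Step 1: hold both locks while choosing. Assumed here: the RSTL keeps the
    // role from changing and the PBWM keeps a secondary from applying a batch.
    std::scoped_lock lock(node.pbwm, node.rstl);

    if (node.role == Role::kPrimary) {
        // Step 2: write a no-op and use its timestamp. lastApplied + 1 is a
        // simplification of reserving the next oplog timestamp.
        assert(node.oplog.empty() || node.oplog.back().ts <= node.lastApplied);
        const std::uint64_t ts = node.lastApplied + 1;
        node.oplog.push_back({ts, "noop"});
        node.lastApplied = ts;
        return ts;
    }

    // Step 3: on a secondary, reuse last-applied as the ghost timestamp.
    return node.lastApplied;
}
```

On a primary, each call appends one no-op and advances last-applied; on a secondary, the call returns the current last-applied and writes nothing.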

Claims that need to be verified:

  • All code paths that use ghost timestamps can acquire the PBWM
  • Using last-applied is sufficient to avoid out-of-order update chains
  • Choosing an earlier timestamp doesn't persist something that is intended to be rolled back. I heard a while back that this behavior was being relied upon, but I'm not very familiar with it and am unsure whether it still exists.
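
As a toy illustration of the second claim (not a proof), the sketch below replays oplog timestamps 10, 20, and 40 on a secondary and interleaves one ghost-timestamped write to the same key after the entry at 20. Taking that write's timestamp from a logical clock that has already advanced to 100 reproduces the chain 10->20->100->40 from the description, while taking it from last-applied yields 10->20->20->40. `replayWithGhostWrite` and `inOrder` are hypothetical names, and treating equal timestamps on one key as acceptable is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the update chain for one key: oplog entries are applied in order, and
// one ghost-timestamped write lands right after the entry at index `ghostAfter`.
// `ghostTs` receives the current last-applied and returns the ghost timestamp.
template <typename GhostTsFn>
std::vector<std::uint64_t> replayWithGhostWrite(const std::vector<std::uint64_t>& oplogTs,
                                                std::size_t ghostAfter,
                                                GhostTsFn ghostTs) {
    assert(ghostAfter < oplogTs.size());
    std::vector<std::uint64_t> chain;
    std::uint64_t lastApplied = 0;
    for (std::size_t i = 0; i < oplogTs.size(); ++i) {
        chain.push_back(oplogTs[i]);
        lastApplied = oplogTs[i];
        if (i == ghostAfter) {
            chain.push_back(ghostTs(lastApplied));
        }
    }
    return chain;
}

// True when no update is at an earlier timestamp than the one applied before it.
bool inOrder(const std::vector<std::uint64_t>& chain) {
    return std::is_sorted(chain.begin(), chain.end());
}
```

Passing `[](std::uint64_t la) { return la; }` as `ghostTs` models the last-applied policy; passing a lambda that ignores its argument and returns 100 models a logical clock that is ahead of replication.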


 Comments   
Comment by Eric Milkie [ 09/Jul/20 ]

For now this work seems to be unnecessary, as the underlying WiredTiger storage engine has been modified to support the ghost timestamping behavior.

Comment by Eric Milkie [ 01/Jul/20 ]

This work is blocked until we know whether WiredTiger can fix durable history to handle out-of-order update chains.

Comment by Eric Milkie [ 03/Jun/20 ]

Sorry, you're right. I'm developing this against the 4.4 branch.

Comment by Louis Williams [ 03/Jun/20 ]

milkie, this is marked "4.5 Required", but we are actively trying to remove ghost timestamps in this release. I think this should be marked 4.4-only, right?

Comment by Daniel Gottlieb (Inactive) [ 02/Jun/20 ]

A patch that should raise an error if an out-of-order update chain is observed on the WT tables likely to be affected:

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index 5b307679f9..40087cb04f 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -730,6 +730,10 @@ StatusWith<std::string> WiredTigerRecordStore::generateCreateString(
         ss << "prefix_compression,";
     }
 
+    if (ns == "_mdb_catalog" || ns == "local.replset.minvalid") {
+        ss << "assert=(durable_timestamp=key_consistent),";
+    }
+
     ss << "block_compressor=" << wiredTigerGlobalOptions.collectionBlockCompressor << ",";
 
     ss << WiredTigerCustomizationHooks::get(getGlobalServiceContext())->getTableCreateConfig(ns);
diff --git a/src/third_party/wiredtiger/src/txn/txn.c b/src/third_party/wiredtiger/src/txn/txn.c
index fd19865636..fa77f6c16e 100644
--- a/src/third_party/wiredtiger/src/txn/txn.c
+++ b/src/third_party/wiredtiger/src/txn/txn.c
@@ -1028,15 +1028,16 @@ err:
 static inline int
 __txn_commit_timestamps_assert(WT_SESSION_IMPL *session)
 {
     WT_CURSOR *cursor;
     WT_DECL_RET;
     WT_TXN *txn;
     WT_TXN_OP *op;
     WT_UPDATE *upd;
     wt_timestamp_t durable_op_timestamp, op_timestamp, prev_op_timestamp;
     u_int i;
-    bool op_zero_ts, upd_zero_ts;
+    bool op_zero_ts, upd_zero_ts, has_global_stable;
 
+    has_global_stable = S2C(session)->txn_global.has_stable_timestamp;
     txn = session->txn;
     cursor = NULL;
 
@@ -1064,6 +1065,10 @@ __txn_commit_timestamps_assert(WT_SESSION_IMPL *session)
     if (!F_ISSET(txn, WT_TXN_TS_COMMIT_KEYS | WT_TXN_TS_DURABLE_KEYS))
         return (0);
 
+    if (!has_global_stable) {
+        return (0);
+    }
+
     /*
      * Error on any valid update structures for the same key that are at a later timestamp or use
      * timestamps inconsistently.
@@ -1120,14 +1125,17 @@ __txn_commit_timestamps_assert(WT_SESSION_IMPL *session)
          * instantiated along with the prepared stop when the page is read into memory or appended
          * by a failed eviction which attempted to write a prepared update to the data store.
          */
+
         op_zero_ts = !F_ISSET(txn, WT_TXN_HAS_TS_COMMIT);
         upd_zero_ts = prev_op_timestamp == WT_TS_NONE;
-        if (op_zero_ts != upd_zero_ts &&
+        if (false && op_zero_ts != upd_zero_ts &&
           !F_ISSET(upd, WT_UPDATE_RESTORED_FROM_HS | WT_UPDATE_RESTORED_FROM_DS)) {
             WT_ERR(__wt_verbose_dump_update(session, upd));
+            WT_ERR(__wt_verbose_dump_txn(session));
             WT_ERR(__wt_verbose_dump_txn_one(session, session, EINVAL,
               "per-key timestamps used inconsistently, dumping relevant information"));
         }
+
         /*
          * If we aren't using timestamps for this transaction then we are done checking. Don't check
          * the timestamp because the one in the transaction is not cleared.
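
Read together, the hunks above appear to enable the check on the `_mdb_catalog` and `local.replset.minvalid` tables via `assert=(durable_timestamp=key_consistent)`, skip it until a global stable timestamp has been set, and disable the "per-key timestamps used inconsistently" error via `false &&`, leaving the error for an existing update on the same key that is at a later timestamp. The following is a simplified, hypothetical C++ model of that decision, not WiredTiger code: `Verdict`, `CommitCheckInput`, and `checkCommit` are invented names, and reducing the per-key scan to a single previous timestamp is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Verdict { kSkipped, kOk, kOutOfOrder };

// Hypothetical inputs, loosely mirroring the state the patched function reads.
struct CommitCheckInput {
    bool touchedKeyConsistentTable;  // stands in for WT_TXN_TS_*_KEYS being set
    bool hasGlobalStable;            // stands in for txn_global.has_stable_timestamp
    std::uint64_t opTimestamp;       // this transaction's timestamp for the key
    std::optional<std::uint64_t> prevTimestamp;  // existing update on the key, if any
};

Verdict checkCommit(const CommitCheckInput& in) {
    if (!in.touchedKeyConsistentTable) {
        return Verdict::kSkipped;  // no table in the transaction asked for the assertion
    }
    if (!in.hasGlobalStable) {
        return Verdict::kSkipped;  // models the early return the patch adds
    }
    // The "used inconsistently" check is disabled by `false &&` in the patch,
    // so it is not modeled here.
    if (in.prevTimestamp && *in.prevTimestamp > in.opTimestamp) {
        return Verdict::kOutOfOrder;  // an existing update is at a later timestamp
    }
    return Verdict::kOk;
}
```

In this model, committing at 40 on top of an update at 100 yields `kOutOfOrder`, the 100->40 step from the description, but only once a stable timestamp exists.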

Generated at Thu Feb 08 05:17:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.