[SERVER-39199] Committing or aborting a prepared transaction may not un-pin stable timestamp due to oplog hole Created: 25/Jan/19  Updated: 29/Oct/23  Resolved: 04/Feb/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.8

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: prepare_durability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-38302 Committing or aborting prepared trans... Closed
is related to SERVER-43978 Stable timestamp is not being recalcu... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

This can be reproduced by making this diff on bbe09aa5e0966ada5232ffbcb098efdd78b4f24e:

diff --git a/src/mongo/db/transaction_participant.cpp b/src/mongo/db/transaction_participant.cpp
index 2dfe999..a600f88 100644
--- a/src/mongo/db/transaction_participant.cpp
+++ b/src/mongo/db/transaction_participant.cpp
@@ -1113,6 +1113,7 @@ void TransactionParticipant::commitPreparedTransaction(OperationContext* opCtx,
         _finishOpTime = repl::ReplClientInfo::forClient(opCtx->getClient()).getLastOp();
 
         _finishCommitTransaction(lk, opCtx);
+        sleepmillis(5000);
     } catch (...) {
         // It is illegal for committing a prepared transaction to fail for any reason, other than an
         // invalid command, so we crash instead.

And running:

buildscripts/resmoke.py --suites=core_txns jstests/core/txns/timestamped_reads_wait_for_prepare_oplog_visibility.js

Sprint: Repl 2019-02-11
Participants:
Linked BF Score: 7

 Description   

If we recalculate the stable timestamp after we commit the transaction, write the commit oplog entry, and update the metrics, but before we close the oplog hole, then nothing may trigger another recalculation once the oplog hole actually closes, leaving the stable timestamp pinned.

This problem will not go away when we stop pinning the stable timestamp back, because the OplogSlotReserver holds an oplog hole open past the point where we recalculate the stable timestamp.
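
The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyReplState`, `commitBuggy`, and `commitFixed` are hypothetical names invented for illustration and are not the server's actual classes or functions. The model assumes the stable timestamp can never advance past the oldest open oplog hole, and that timestamps are positive integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

// Hypothetical toy model; none of these names exist in the MongoDB code base.
using Timestamp = std::uint64_t;

struct ToyReplState {
    std::set<Timestamp> openHoles;  // reserved oplog slots that are not yet closed
    Timestamp lastApplied = 0;      // newest oplog entry that has been written
    Timestamp stable = 0;           // most recently calculated stable timestamp

    // Assumption: the stable timestamp must stay behind the oldest open hole
    // and never moves backwards.
    void recalculateStable() {
        Timestamp candidate = lastApplied;
        if (!openHoles.empty()) {
            candidate = std::min(candidate, *openHoles.begin() - 1);
        }
        stable = std::max(stable, candidate);
    }
};

// Ordering from the description: recalculate while the hole is still open.
Timestamp commitBuggy(ToyReplState& s, Timestamp commitTs) {
    s.openHoles.insert(commitTs);  // oplog slot reserved (hole opens)
    s.lastApplied = commitTs;      // commit oplog entry written
    s.recalculateStable();         // hole still open, so stable lags behind
    s.openHoles.erase(commitTs);   // hole closes, but nothing recalculates
    return s.stable;
}

// Ordering suggested by the commit message: release the hole, then recalculate.
Timestamp commitFixed(ToyReplState& s, Timestamp commitTs) {
    s.openHoles.insert(commitTs);
    s.lastApplied = commitTs;
    s.openHoles.erase(commitTs);   // hole closes first
    s.recalculateStable();         // stable can now reach commitTs
    return s.stable;
}
```

In this model, the first ordering leaves the stable timestamp one tick behind the commit timestamp until some unrelated event triggers another recalculation, while the second ordering lets it reach the commit timestamp immediately.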



 Comments   
Comment by Githook User [ 04/Feb/19 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-39199 recalculate stable timestamp after releasing transaction oplog holes
Branch: master
https://github.com/mongodb/mongo/commit/82f36212ab8be35afebc66b92605071d9e4bbc5c

Comment by Judah Schvimer [ 25/Jan/19 ]

We are also not currently telling storage about the new stable timestamp when we recalculate it, which means we do not advance the majority committed snapshot.
