[SERVER-50486] invokeWithSessionCheckedOut being called on prepared transactions on secondaries Created: 24/Aug/20  Updated: 29/Oct/23  Resolved: 21/Jan/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.6, 4.2.16

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Samyukta Lanka
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File diff    
Issue Links:
Backports
Depends
Related
related to SERVER-59007 Ensure transactions not holding RSTL ... Closed
related to SERVER-59108 Resolve race with transaction operati... Closed
is related to SERVER-66351 Audit uses of OperationContext::setAl... Open
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Sprint: Repl 2020-09-21, Repl 2020-11-02, Repl 2020-11-16, Repl 2020-11-30, Repl 2020-12-14, Repl 2020-12-28, Repl 2021-01-11, Repl 2021-01-25
Participants:
Linked BF Score: 17

 Description   

I think there is a race condition between this preliminary check and the attempt to invokeWithSessionCheckedOut. If a primary node passes the preliminary check but steps down immediately afterward, it could end up calling invokeWithSessionCheckedOut and refreshFromStorageIfNeeded on a session that may already have started a new transaction on the new primary. If the new primary prepares a transaction in the same session and that prepared transaction replicates to the node that just stepped down (now a secondary), then when that node calls refreshFromStorageIfNeeded, it hits this MONGO_UNREACHABLE.
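The check-then-act race described above can be sketched as a small, single-threaded C++ model. Everything in it (ModelNode, Outcome, checkThenRefresh, stepDownThenReplicatePrepare) is hypothetical and only mirrors the order of events in the description; the `window` callback stands in for whatever happens between the preliminary check and the session checkout.

```cpp
#include <functional>

// Hypothetical model of the race; none of these names exist in the MongoDB source.
struct ModelNode {
    bool isPrimary = true;
    bool storedTxnIsPrepared = false;  // what config.transactions says for the session
};

enum class Outcome { kRejectedNotPrimary, kRefreshedOk, kUnreachable };

// Check-then-act with nothing held across the two steps. `window` runs between
// the preliminary check and the session checkout/refresh.
Outcome checkThenRefresh(ModelNode& node, const std::function<void(ModelNode&)>& window) {
    if (!node.isPrimary) {
        return Outcome::kRejectedNotPrimary;  // the preliminary check
    }
    window(node);  // a step down plus a replicated prepare may land here
    // Stand-in for refreshFromStorageIfNeeded: finding a prepared transaction
    // is treated as impossible, mirroring the MONGO_UNREACHABLE.
    return node.storedTxnIsPrepared ? Outcome::kUnreachable : Outcome::kRefreshedOk;
}

// The interleaving from the description: this node steps down, then the new
// primary's prepare on the same session replicates here.
void stepDownThenReplicatePrepare(ModelNode& node) {
    node.isPrimary = false;
    node.storedTxnIsPrepared = true;
}
```

With a no-op window, checkThenRefresh returns kRefreshedOk; passing stepDownThenReplicatePrepare as the window returns kUnreachable even though the primary check succeeded.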



 Comments   
Comment by Githook User [ 05/Aug/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-50486 Always interrupt multi-document transactions on step down or step up

(cherry picked from commit 5e9d3327d5d08288a932ee77db3be4eb0d45c9c8)
(cherry picked from commit a230371af696ff2eaf17c1937fb0ca62dab476d3)
Branch: v4.2
https://github.com/mongodb/mongo/commit/470efbe1f57c3aaaccb5c27a5bb0c07b2cbcbf13

Comment by Githook User [ 23/Apr/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-50486 Always interrupt multi-document transactions on step down or step up

(cherry picked from commit 5e9d3327d5d08288a932ee77db3be4eb0d45c9c8)
Branch: v4.4
https://github.com/mongodb/mongo/commit/a230371af696ff2eaf17c1937fb0ca62dab476d3

Comment by Githook User [ 21/Jan/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-50486 Always interrupt multi-document transactions on step down or step up
Branch: master
https://github.com/mongodb/mongo/commit/5e9d3327d5d08288a932ee77db3be4eb0d45c9c8
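A minimal sketch of the idea in the commit message, assuming a toy node in which one mutex protects the role, the stored session state, and a registry of in-flight transaction operations. ToyNode, TxnOp and TxnResult are hypothetical names, not MongoDB APIs, and the sketch does not reproduce the actual change. The point it illustrates: step down flips the role and interrupts every registered transaction in the same critical section, so a transaction that passed the primary check before the step down sees the interrupt when it later checks out the session.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical illustration only; not MongoDB types.
struct TxnOp {
    bool interrupted = false;  // guarded by ToyNode::mu_
};

enum class TxnResult { kInterrupted, kRefreshedOk, kUnreachable };

class ToyNode {
public:
    // Preliminary check: register a transaction op only while primary.
    std::shared_ptr<TxnOp> startTxnOp() {
        std::lock_guard<std::mutex> lk(mu_);
        if (!isPrimary_) return nullptr;
        auto op = std::make_shared<TxnOp>();
        activeTxnOps_.push_back(op);  // a real registry would also unregister finished ops
        return op;
    }

    // State transition: change the role and interrupt all registered
    // transaction ops atomically with respect to the other methods.
    void stepDown() {
        std::lock_guard<std::mutex> lk(mu_);
        isPrimary_ = false;
        for (auto& op : activeTxnOps_) op->interrupted = true;
        activeTxnOps_.clear();
    }

    // A prepare from the new primary replicating to this node.
    void applyReplicatedPrepare() {
        std::lock_guard<std::mutex> lk(mu_);
        storedTxnIsPrepared_ = true;
    }

    // Session checkout + refresh: check for interruption before reading storage.
    TxnResult checkOutAndRefresh(const std::shared_ptr<TxnOp>& op) {
        std::lock_guard<std::mutex> lk(mu_);
        if (op->interrupted) return TxnResult::kInterrupted;
        return storedTxnIsPrepared_ ? TxnResult::kUnreachable : TxnResult::kRefreshedOk;
    }

private:
    std::mutex mu_;
    bool isPrimary_ = true;
    bool storedTxnIsPrepared_ = false;
    std::vector<std::shared_ptr<TxnOp>> activeTxnOps_;
};
```

Under the model's assumption that a replicated prepare only arrives after the node is no longer primary, the kUnreachable branch cannot be reached through the interleaving from the description: the op either checks out before the step down (nothing prepared yet) or sees the interrupt.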

Comment by Lingzhi Deng [ 10/Nov/20 ]

I forgot that I had also tried to reproduce this a while ago, after Tess did. I had something locally that seemed to reproduce it back then, and I uploaded it as an attachment. (Hopefully it still reproduces now.)

Comment by Tess Avitabile (Inactive) [ 15/Sep/20 ]

Thanks, this is what I tried, but so far I'm not seeing the config.transactions write happening.

Comment by Lingzhi Deng [ 15/Sep/20 ]

Just an idea: can we add a failpoint here to hang the prepare application? The config.transactions write should happen on another writer thread and should be able to finish while the prepare application is blocked. We may still need to sleep for a bit to make sure the config.transactions write has actually gone through before resuming the first transaction.
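The failpoint idea can be modeled with two threads standing in for oplog writer threads applying the same batch. PauseFailPoint is a hypothetical gate built on std::condition_variable, not MongoDB's FailPoint class, and observeWhilePrepareHangs is likewise illustrative.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate mimicking "hang here until released" failpoint semantics.
// This is not MongoDB's FailPoint class.
class PauseFailPoint {
public:
    // Called by the thread that should hang: announce arrival, then block.
    void pauseWhileSet() {
        std::unique_lock<std::mutex> lk(mu_);
        reached_ = true;
        cv_.notify_all();
        cv_.wait(lk, [this] { return released_; });
    }
    // Called by the test driver: block until a thread is hanging here.
    void waitUntilReached() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return reached_; });
    }
    void release() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            released_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool reached_ = false;
    bool released_ = false;
};

struct BatchObservation {
    bool configWriteVisible;
    bool prepareApplied;
};

// Two threads stand in for oplog writer threads applying the same batch.
BatchObservation observeWhilePrepareHangs() {
    PauseFailPoint hangPrepareApplication;
    std::atomic<bool> configEntryWritten{false};
    std::atomic<bool> prepareApplied{false};

    std::thread prepareApplier([&] {
        hangPrepareApplication.pauseWhileSet();  // hang before applying the prepare
        prepareApplied = true;
    });
    std::thread configWriter([&] { configEntryWritten = true; });  // config.transactions write

    hangPrepareApplication.waitUntilReached();
    configWriter.join();  // once join() returns, the write has completed
    BatchObservation seen{configEntryWritten.load(), prepareApplied.load()};

    hangPrepareApplication.release();  // let the prepare application continue
    prepareApplier.join();
    return seen;
}
```

In this toy, joining the writer thread before observing replaces the sleep: once join() returns, the config.transactions write has definitely completed while the prepare application is still blocked.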

Comment by Tess Avitabile (Inactive) [ 15/Sep/20 ]

Thanks, lingzhi.deng! I'm having a hard time getting the update to config.transactions to happen before the transaction gets prepared on the secondary. I'll keep working on this.

Comment by Lingzhi Deng [ 15/Sep/20 ]

Interesting! I took another look at the BF and the coredump, and I don't think I actually got to the bottom of this when I was investigating the BF. Sorry. After a bit more digging, I think the session checkout at invokeWithSessionCheckedOut actually happened before the session checkout during prepare application, which is why the session was still invalid. Because the write to the transactions table was applied concurrently, in the same batch as the prepare, config.transactions may already have had an updated entry for the session in the prepared state.

Ideally, though, fetchActiveTransactionHistory should read at lastApplied and shouldn't see an entry written in the middle of a batch. In fetchActiveTransactionHistory, we do try to make the config.transactions lookup read at its own snapshot using a ReadSourceScope. But depending on canSwitchReadSource, we don't always convert secondary reads to use lastApplied. This means that if the first transaction has readConcern snapshot, the config.transactions lookup inside fetchActiveTransactionHistory may actually read without a timestamp and end up seeing the in-flight config.transactions write.
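The difference between a lookup at lastApplied and an untimestamped lookup can be illustrated with a toy multi-version document. ToyVersionedDoc and lookupReadTimestamp are hypothetical: the latter is a deliberate simplification of the canSwitchReadSource behavior described above, not its real logic. In the usage below, version 10 plays the role of lastApplied and version 20 is the config.transactions write from the batch that is still being applied.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Timestamp = std::uint64_t;

// Toy multi-version store for one config.transactions document; every version
// is keyed by its commit timestamp. Hypothetical model, not the storage engine.
class ToyVersionedDoc {
public:
    void write(Timestamp ts, std::string state) { versions_[ts] = std::move(state); }

    // Read as of readTs: the newest version with timestamp <= readTs.
    // std::nullopt models an untimestamped read, which sees the newest version,
    // including writes from a batch that is still being applied.
    std::optional<std::string> read(std::optional<Timestamp> readTs) const {
        if (versions_.empty()) return std::nullopt;
        if (!readTs) return versions_.rbegin()->second;
        auto it = versions_.upper_bound(*readTs);  // first version newer than readTs
        if (it == versions_.begin()) return std::nullopt;
        return std::prev(it)->second;
    }

private:
    std::map<Timestamp, std::string> versions_;
};

// Hypothetical simplification: a caller using readConcern "snapshot" is not
// switched to lastApplied, so its lookup ends up untimestamped.
std::optional<Timestamp> lookupReadTimestamp(bool callerUsesSnapshotReadConcern,
                                             Timestamp lastApplied) {
    if (callerUsesSnapshotReadConcern) return std::nullopt;
    return lastApplied;
}
```

Usage: after `doc.write(10, "committed")` and an in-flight `doc.write(20, "prepared")`, a read at lastApplied (10) returns "committed", while the untimestamped read used for the snapshot caller returns "prepared", the state refreshFromStorageIfNeeded does not expect.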

So I think we can reproduce it as follows. Start a transaction with readConcern snapshot on a node and hang it before it checks out the session. Then step up another node, prepare a second transaction, and wait for the prepare to replicate, but hang the prepare application before MongoDOperationContextSessionWithoutRefresh. Finally, resume the first transaction. It should be able to check out the session with isValid==false, call fetchActiveTransactionHistory, see the newly updated config.transactions entry in the prepared state, and crash.
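The proposed repro can be replayed as a deterministic sketch that keeps only the state the steps rely on. ReproState and resumeFirstTransactionCrashes are hypothetical; the session's validity is passed in as a precondition rather than derived, because the analysis above infers it from the order of the two session checkouts rather than from a modeled invalidation path.

```cpp
#include <optional>
#include <string>

// Hypothetical replay of the repro steps; not MongoDB types or functions.
struct ReproState {
    bool sessionIsValid;                              // precondition
    std::string configStateAtLastApplied = "none";    // visible at lastApplied
    std::optional<std::string> configStateInFlight;  // written mid-batch
};

// Returns true if resuming the first transaction would reach the modeled
// MONGO_UNREACHABLE.
bool resumeFirstTransactionCrashes(bool firstTxnUsesSnapshotReadConcern, bool sessionIsValid) {
    ReproState s{sessionIsValid};
    // Steps 1-2: the first transaction hangs before checkout; another node steps
    // up and prepares a second transaction on the same session.
    // Step 3: the prepare batch replicates. Its config.transactions write lands,
    // but prepare application hangs before checking out the session.
    s.configStateInFlight = "prepared";
    // Step 4: resume the first transaction and check out the session.
    if (s.sessionIsValid) return false;  // valid session: no refresh from storage
    // fetchActiveTransactionHistory analogue: snapshot callers read untimestamped.
    const std::string seen = (firstTxnUsesSnapshotReadConcern && s.configStateInFlight)
                                 ? *s.configStateInFlight
                                 : s.configStateAtLastApplied;
    return seen == "prepared";
}
```

Only the combination described in the comment (snapshot read concern, session still invalid) reports a crash; a valid session skips the refresh, and a non-snapshot caller reads at lastApplied.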

Comment by Tess Avitabile (Inactive) [ 14/Sep/20 ]

lingzhi.deng, I'm having trouble reproducing this failure, and I could use your help. My repro starts a transaction on a node, then hangs it before checking out the session. It then steps up a new node, prepares a second transaction, then waits for the prepare oplog entry to replicate. Finally, it allows the first transaction to finish, which should trigger the crash.

However, this sequence of events does not cause the session to be invalid on the node, so we skip refreshing from storage here. I'm looking for a way to invalidate the session, but it's tricky, since we refuse to invalidate a session with a prepared transaction.
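The two session rules that make this hard to reproduce can be captured in a toy class: checking out a valid session skips the refresh from storage, and invalidation is refused while a prepared transaction is attached. ToySession and TxnState are hypothetical and only model these two rules as described here.

```cpp
// Hypothetical model of the two session rules; not MongoDB types.
enum class TxnState { kNone, kInProgress, kPrepared };

class ToySession {
public:
    explicit ToySession(TxnState state) : state_(state) {}

    // Refuses (returns false, session stays valid) while a prepared
    // transaction is attached.
    bool invalidate() {
        if (state_ == TxnState::kPrepared) return false;
        isValid_ = false;
        return true;
    }

    // Returns true if this checkout had to refresh from storage.
    bool checkOut() {
        if (isValid_) return false;  // skip refreshing from storage
        ++refreshCount_;
        isValid_ = true;
        return true;
    }

    int refreshCount() const { return refreshCount_; }

private:
    TxnState state_;
    bool isValid_ = true;
    int refreshCount_ = 0;
};
```

With a prepared transaction attached, invalidate() returns false and the next checkout never reaches the refresh path; an in-progress session can be invalidated, and only its next checkout refreshes.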

I think it might be possible to hit the MONGO_UNREACHABLE for an in-progress transaction instead of a prepared transaction, but this involves using transactions larger than 16MB, and I don't think the BF involved larger transactions.

Do you have any advice on reproducing this issue for a prepared transaction? Or do you think it's okay to reproduce it for an in-progress transaction? I think the fix will be general enough to fix prepared transactions, if this bug exists for prepared transactions as well.

Generated at Thu Feb 08 05:22:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.