[SERVER-31566] Sessions with transaction history may become unusable if the oplog rolls over
Created: 13/Oct/17  Updated: 30/Oct/23  Resolved: 30/Oct/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.0-rc0
Fix Version/s: 3.6.0-rc2

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Kaloian Manassiev
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2017-11-13

 Description   

Sessions that have ever had retryable write operations run on them (i.e., operations with txnNumber/stmtId) require the complete oplog chain of their most recently executed transaction to be available, so that the executed statements can be loaded and cached.

If the oplog rolls over and part of that transaction's chain is lost, AND the session is then dropped from the cache for any reason (e.g., step-down, restart, or a direct write to config.transactions), the session becomes unusable from that point onward: every operation that uses it fails with error code IncompleteTransactionHistory (217), regardless of whether the operation contains retryable writes.

NOTE: The problem will go away once the session gets cleaned up as idle.
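The failure mode described above can be sketched as follows. This is a minimal, hypothetical model, not the server's code: the oplog is a dict keyed by an integer op time, each entry links back to the previous statement of the same transaction through a hypothetical `prev_op_time` field, and a plain dict stands in for config.transactions (session id to the op time of the last write). The names `OplogEntry`, `load_executed_statements` and `SessionCatalog` are illustrative assumptions. The point it shows is that the chain walk runs at check-out time, so a single missing link makes every check-out of an uncached session fail, while a session that is still cached keeps working.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Error code quoted in the description above.
INCOMPLETE_TRANSACTION_HISTORY = 217


class IncompleteTransactionHistory(Exception):
    code = INCOMPLETE_TRANSACTION_HISTORY


@dataclass
class OplogEntry:
    stmt_id: int
    prev_op_time: Optional[int]  # hypothetical back-link; None marks the first statement


def load_executed_statements(oplog: Dict[int, OplogEntry],
                             last_write_op_time: int) -> Dict[int, int]:
    """Hypothetical pre-fix loader: walk the chain backwards; any gap is fatal."""
    executed: Dict[int, int] = {}  # stmt_id -> op_time
    op_time: Optional[int] = last_write_op_time
    while op_time is not None:
        entry = oplog.get(op_time)
        if entry is None:
            # This link was truncated when the oplog rolled over.
            raise IncompleteTransactionHistory(f"oplog entry {op_time} is gone")
        executed[entry.stmt_id] = op_time
        op_time = entry.prev_op_time
    return executed


class SessionCatalog:
    """Hypothetical in-memory session cache backed by a config.transactions stand-in."""

    def __init__(self, oplog: Dict[int, OplogEntry], transactions: Dict[str, int]):
        self.oplog = oplog
        self.transactions = transactions  # session id -> last write op time
        self.cache: Dict[str, Dict[int, int]] = {}

    def check_out(self, session_id: str) -> Dict[int, int]:
        # The refresh, and therefore the full chain walk, happens at check-out
        # time, whether or not the incoming operation is a retryable write.
        if session_id not in self.cache:
            last = self.transactions.get(session_id)
            self.cache[session_id] = (
                {} if last is None
                else load_executed_statements(self.oplog, last))
        return self.cache[session_id]
```

Creating a fresh `SessionCatalog` models a restart or step-down that empties the cache. Once the session's entry is removed from the `transactions` dict (the idle cleanup in the NOTE above), check-out returns an empty history again.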



 Comments   
Comment by Githook User [ 30/Oct/17 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-31566 Handle truncated oplog at session load time
Branch: master
https://github.com/mongodb/mongo/commit/350ee88b33f32b179b636f33b7db5b0c03932d24

Comment by Githook User [ 30/Oct/17 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-31566 Pull the session transaction fetch logic into a separate function
Branch: master
https://github.com/mongodb/mongo/commit/23ddb651c92d0310e8eddb12e89116463fa4ca8b

Comment by Kaloian Manassiev [ 27/Oct/17 ]

After further discussion, it was decided that there is merit in opportunistically performing the retryability check against the statements that are still present in the oplog, and failing the operation only if it depends on entries that are missing.
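The opportunistic check decided on in this comment can be sketched like this. It is a hypothetical model using the same simplified oplog (a dict keyed by integer op time, with an assumed `prev_op_time` back-link), not the code from the linked commits. Loading no longer raises on a gap; it caches whatever statements survive and records that the history is incomplete. A retried `stmtId` that is still in the cache is answered from it, a `stmtId` that cannot be accounted for fails with IncompleteTransactionHistory, and with a complete chain an unknown `stmtId` is treated as not yet executed. `SessionTxnState`, `load_session` and `check_statement_executed` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class IncompleteTransactionHistory(Exception):
    code = 217  # error code quoted in the issue description


@dataclass
class OplogEntry:
    stmt_id: int
    prev_op_time: Optional[int]  # hypothetical back-link; None marks the first statement


@dataclass
class SessionTxnState:
    executed: Dict[int, int] = field(default_factory=dict)  # stmt_id -> op_time
    has_incomplete_history: bool = False

    def check_statement_executed(self, stmt_id: int) -> Optional[int]:
        if stmt_id in self.executed:
            return self.executed[stmt_id]  # retry answered from surviving entries
        if self.has_incomplete_history:
            # The statement may have run in a part of the chain that was truncated.
            raise IncompleteTransactionHistory(f"cannot tell whether stmtId {stmt_id} ran")
        return None  # complete history: the statement has not been executed


def load_session(oplog: Dict[int, OplogEntry],
                 last_write_op_time: Optional[int]) -> SessionTxnState:
    """Hypothetical tolerant loader: remember a gap instead of failing the check-out."""
    state = SessionTxnState()
    op_time = last_write_op_time
    while op_time is not None:
        entry = oplog.get(op_time)
        if entry is None:
            state.has_incomplete_history = True
            break
        state.executed[entry.stmt_id] = op_time
        op_time = entry.prev_op_time
    return state
```

The design choice is to defer the failure from load time to the moment a specific statement's history is actually needed, so operations that can still be decided from the surviving entries keep working.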

Comment by Kaloian Manassiev [ 15/Oct/17 ]

Of course, this stops being a problem once the session is cleaned up, because there will no longer be an entry for it in config.transactions. So the problem can only happen if the oplog rolls over before the idle session has been garbage collected.

This indeed seems like acceptable behaviour, and there is really no way to prevent it other than requiring customers to have an oplog that spans longer than the 30-minute session cleanup interval. If a session ever enters this state, it can be fixed by simply deleting the session so that it gets recreated cleanly.

Based on this I propose that we close it as 'Works as Designed'.
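The sizing rule from this comment amounts to a simple comparison: the time span covered by the oplog (newest entry minus oldest entry) should exceed the 30-minute session cleanup interval. The sketch below is a hypothetical helper, not a MongoDB tool; it assumes you already have the wall-clock times of the oldest and newest oplog entries, and the function names are illustrative. It is only a rule of thumb, since the window shrinks whenever the write rate rises.

```python
from datetime import datetime, timedelta

# Session cleanup interval quoted in the comment above.
SESSION_CLEANUP_INTERVAL = timedelta(minutes=30)


def oplog_window(first_entry_wall_time: datetime,
                 last_entry_wall_time: datetime) -> timedelta:
    """Time span currently covered by the oplog (newest minus oldest entry)."""
    return last_entry_wall_time - first_entry_wall_time


def oplog_outlasts_session_cleanup(
        first_entry_wall_time: datetime,
        last_entry_wall_time: datetime,
        cleanup_interval: timedelta = SESSION_CLEANUP_INTERVAL) -> bool:
    """Rule of thumb: the oplog must span strictly more than the cleanup interval."""
    return oplog_window(first_entry_wall_time, last_entry_wall_time) > cleanup_interval
```

For example, an oplog whose oldest and newest entries are 45 minutes apart passes the check, while one covering only 20 minutes does not.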

Comment by Kaloian Manassiev [ 15/Oct/17 ]

It is the former: the problem occurs only if the oplog rolls over the chain of the most recently executed transaction. It prevents new transactions from even starting on the session because we cache the transaction's statements early, when the session is refreshed, which in turn happens at check-out time. So sessions that drop out of the cache in this state cannot be checked out again.

Comment by Andy Schwerin [ 14/Oct/17 ]

Is the "complete transaction oplog chain" a chain of operations for only the most recent transaction on the session, or for all transactions that ever ran on the session? The former seems OK; the latter seems a little hard to work with.

Also, why does this stop new transactions from beginning on the session?

Generated at Thu Feb 08 04:27:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.