[SERVER-31566] Sessions with transaction history may become unusable if the oplog rolls over Created: 13/Oct/17 Updated: 30/Oct/23 Resolved: 30/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.0-rc0 |
| Fix Version/s: | 3.6.0-rc2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2017-11-13 |
| Participants: |
| Description |
|
Sessions that have ever had retryable write operations run on them (i.e. operations with txnNumber/stmtId) require the complete transaction oplog chain for the most recently executed transaction to be available, so that the executed statements can be loaded and cached. If the oplog rolls over and part of that transaction's chain is lost for such a session AND the session gets dropped from the cache for any reason (step-down, restart, direct write to config.transactions, etc.), then from that point onward the session becomes unusable and all operations using it fail with error code IncompleteTransactionHistory (217), regardless of whether they contain retryable writes. NOTE: The problem goes away once the session is cleaned up as idle. |
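The failure mode described above can be illustrated with a minimal sketch. This is a toy model, not the server's code: the `OplogEntry` type, the `prev_ts` back-link, and the `load_executed_statements` helper are hypothetical names introduced for illustration. Only the error name and code (IncompleteTransactionHistory, 217) come from the ticket.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


class IncompleteTransactionHistory(Exception):
    """Toy stand-in for the server error named in the ticket."""
    code = 217


@dataclass(frozen=True)
class OplogEntry:
    ts: int                 # position of this entry in the oplog (hypothetical)
    stmt_id: int            # statement executed by this entry
    prev_ts: Optional[int]  # hypothetical back-link to the previous entry; None at chain start


def load_executed_statements(oplog: Dict[int, OplogEntry], last_write_ts: int) -> Set[int]:
    """Walk the chain backwards and collect every executed statement id.

    Models the strict behaviour: any missing link makes the whole load fail,
    which is what leaves the session unusable after it drops out of the cache.
    """
    executed: Set[int] = set()
    ts: Optional[int] = last_write_ts
    while ts is not None:
        entry = oplog.get(ts)
        if entry is None:
            # The entry has rolled off the oplog, so the chain is incomplete.
            raise IncompleteTransactionHistory(f"oplog entry at ts={ts} is missing")
        executed.add(entry.stmt_id)
        ts = entry.prev_ts
    return executed
```

In this model, removing the oldest entry of a three-statement chain from `oplog` and calling `load_executed_statements` again raises the error, even for callers that never ask about the lost statement.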
| Comments |
| Comment by Githook User [ 30/Oct/17 ] |
|
Author: {'email': 'kaloian.manassiev@mongodb.com', 'name': 'Kaloian Manassiev', 'username': 'kaloianm'}Message: |
| Comment by Githook User [ 30/Oct/17 ] |
|
Author: {'email': 'kaloian.manassiev@mongodb.com', 'name': 'Kaloian Manassiev', 'username': 'kaloianm'}Message: |
| Comment by Kaloian Manassiev [ 27/Oct/17 ] |
|
After further discussion, it was decided that there is merit in opportunistically performing the retryability check based on the statements that are still present in the oplog, and failing the operation only if the check needs entries that are missing. |
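One way to read the decision above is sketched below. It is an interpretation under stated assumptions, not the committed fix: the `Entry` tuple, its `prev_ts` back-link, and `was_statement_executed` are hypothetical names. The idea is to answer from the surviving part of the chain when possible and to raise only when the walk reaches a missing entry without having found the statement.

```python
from typing import Dict, NamedTuple, Optional


class IncompleteTransactionHistory(Exception):
    """Toy stand-in for the server error named in the ticket."""
    code = 217


class Entry(NamedTuple):
    stmt_id: int
    prev_ts: Optional[int]  # hypothetical back-link; None at the start of the chain


def was_statement_executed(oplog: Dict[int, Entry], last_write_ts: int, stmt_id: int) -> bool:
    """Opportunistic check: walk newest to oldest and stop at the first answer."""
    ts: Optional[int] = last_write_ts
    while ts is not None:
        entry = oplog.get(ts)
        if entry is None:
            # The rest of the chain rolled over and the statement was not
            # found in what survives, so neither answer can be given safely.
            raise IncompleteTransactionHistory(f"oplog entry at ts={ts} is missing")
        if entry.stmt_id == stmt_id:
            return True  # found in the surviving part of the chain
        ts = entry.prev_ts
    return False  # complete chain walked; the statement never ran
```

With this shape, a truncated chain still serves retries of the statements it retains, and only a query about a statement that may have been in the lost part fails.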
| Comment by Kaloian Manassiev [ 15/Oct/17 ] |
|
This would of course stop being a problem once the session gets cleaned up, because there won't be an entry for it in config.transactions anymore. So essentially the problem will happen only if the oplog rolls over before the session has been garbage collected as idle. This seems like acceptable behaviour, and there's really no way to prevent it other than requiring customers to have an oplog that covers more than the session cleanup interval of 30 minutes. If a session ever enters this state, it can be fixed by just deleting the session, so that it gets recreated cleanly. Based on this I propose that we close it as 'Works as Designed'. |
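The sizing condition in this comment amounts to a simple comparison, sketched here under assumptions: the function name and the idea of passing in the timestamps of the oldest and newest oplog entries are hypothetical, and only the 30-minute interval comes from the comment.

```python
from datetime import datetime, timedelta

# Session cleanup interval quoted in the comment above.
SESSION_CLEANUP_INTERVAL = timedelta(minutes=30)


def oplog_outlives_idle_sessions(
    oldest_entry: datetime,
    newest_entry: datetime,
    cleanup_interval: timedelta = SESSION_CLEANUP_INTERVAL,
) -> bool:
    """Return True if the time span covered by the oplog exceeds the cleanup interval.

    When this holds, an idle session should be garbage collected before the
    oplog entries for its last transaction roll over.
    """
    return (newest_entry - oldest_entry) > cleanup_interval
```

For example, an oplog whose entries span 10 minutes fails this check, while one spanning two hours passes it.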
| Comment by Kaloian Manassiev [ 15/Oct/17 ] |
|
It is the former - the chain needs to be complete only for the most recently executed transaction. A missing entry will prevent new transactions from even starting on the session, because we cache the transaction's statements early (when the session is refreshed), which in turn happens at check-out time. So sessions that drop out of the cache in this state cannot be checked out again. |
| Comment by Andy Schwerin [ 14/Oct/17 ] |
|
Is the "complete transaction oplog chain" a chain of operations for only the most recent transaction on the session, or for all transactions that ever ran on the session? The former seems ok; the latter seems a little hard to work with. Also, why does this stop new transactions from beginning on the session? |