[SERVER-39762] Fix fastcount after rollback recovery of prepared transactions. Created: 22/Feb/19 Updated: 29/Oct/23 Resolved: 27/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Pavithra Vetriselvan | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability, txn_storage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Storage NYC 2019-03-25, Storage NYC 2019-04-08 |
| Participants: |
| Description |
|
When we recover a prepared transaction, we re-apply the oplog entry and replay the operations in the transaction to get back to the prepare state. This line prevents us from adding a NumRecordsChange on the new recovery unit.

During rollback, say we commit a prepared transaction on the rollback node. This entry will get rolled back, the transaction will get invalidated (which releases our txnResources, including the recovery unit), and we will replay oplog entries from the stable timestamp. This involves putting the corresponding prepared transaction back into prepare. Since the new recovery unit does not register the operations of the prepared transaction, when we later commit or abort it, there is nothing to add to or subtract from numRecords.

Previously, we have accounted for counts during rollback in _findRecordStoreCounts, where we calculate the diffs for each UUID. For prepared transactions, however, we do not know whether the transaction will eventually commit or abort, so it might not make sense to change the counts preemptively. Ideally, we would figure out a way for the new recovery unit to make an exception for prepared transactions that were recovered during rollback and make sure the changes get recorded when reapplying operations.
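
The mechanism described above can be illustrated with a toy model. This is only a sketch: `RecoveryUnit`, `record_change`, `replay_prepared_inserts`, and the `track_counts` flag are hypothetical names for illustration, not the server's classes, and the sketch does not say which resulting count is the right one for a given rollback scenario.

```python
class RecoveryUnit:
    """Toy stand-in for a storage recovery unit (hypothetical, not the server class)."""

    def __init__(self, track_counts=True):
        # track_counts=False models a recovery unit that does not register
        # record-count changes while replaying a prepared transaction.
        self.track_counts = track_counts
        self.pending = {}  # collection UUID -> pending record-count delta

    def record_change(self, uuid, delta):
        if self.track_counts:
            self.pending[uuid] = self.pending.get(uuid, 0) + delta

    def commit(self, num_records):
        # On commit, fold the pending deltas into the in-memory counts.
        for uuid, delta in self.pending.items():
            num_records[uuid] = num_records.get(uuid, 0) + delta
        self.pending.clear()

    def abort(self):
        # On abort, the pending deltas are simply dropped.
        self.pending.clear()


def replay_prepared_inserts(uuid, n_inserts, track_counts):
    """Replay n_inserts inserts of a prepared transaction on a fresh recovery unit."""
    ru = RecoveryUnit(track_counts=track_counts)
    for _ in range(n_inserts):
        ru.record_change(uuid, +1)
    return ru


# Replay without count tracking: committing has nothing to add.
num_records = {"coll-A": 10}
replay_prepared_inserts("coll-A", 3, track_counts=False).commit(num_records)
untracked_count = num_records["coll-A"]

# Replay with count tracking: committing adds the three inserts.
num_records_tracked = {"coll-A": 10}
replay_prepared_inserts("coll-A", 3, track_counts=True).commit(num_records_tracked)
tracked_count = num_records_tracked["coll-A"]
```

In the untracked case the count stays at its starting value after the commit, which is the "nothing to add or subtract" situation the description refers to.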
|
| Comments |
| Comment by Githook User [ 27/Mar/19 ] |
|
Author: {'name': 'Louis Williams', 'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com'}Message: |
| Comment by Max Hirschhorn [ 21/Mar/19 ] |
|
Thanks for the confirmation both of you. I put together in order to keep the valid transitions straight in my mind. |
| Comment by Louis Williams [ 21/Mar/19 ] |
|
max.hirschhorn Yes, that is correct. If a node rolls back a "commit" or "abort" back into a prepared state, the sharded transaction protocol guarantees the same command will be sent again; it will not change its mind. If that were not the case, we would need to do some amount of work to make the fastcount logic work differently for rollback. |
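
One way to keep these transitions straight is a small state model. The following is a minimal sketch under the assumption stated in this thread (a rolled-back "commit" or "abort" is always followed by the same outcome); `TxnState` and `PreparedTxnModel` are hypothetical names and are not part of the server or the rollback fuzzer.

```python
from enum import Enum


class TxnState(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class PreparedTxnModel:
    """Toy model of a prepared transaction on one node (hypothetical)."""

    def __init__(self):
        self.state = TxnState.PREPARED
        self.decided = None  # outcome chosen by the coordinator, once seen

    def apply_outcome(self, outcome):
        if outcome not in (TxnState.COMMITTED, TxnState.ABORTED):
            raise ValueError("outcome must be COMMITTED or ABORTED")
        if self.state is not TxnState.PREPARED:
            raise ValueError("transaction is not in the prepared state")
        # Assumed guarantee: the outcome never differs across branches of history.
        if self.decided is not None and outcome is not self.decided:
            raise ValueError("coordinator may not change its decision")
        self.decided = outcome
        self.state = outcome

    def roll_back_outcome(self):
        # Rollback of a "commit" or "abort" returns the node to prepared.
        if self.state is TxnState.PREPARED:
            raise ValueError("no outcome to roll back")
        self.state = TxnState.PREPARED


txn = PreparedTxnModel()
txn.apply_outcome(TxnState.COMMITTED)
txn.roll_back_outcome()
txn.apply_outcome(TxnState.COMMITTED)  # the same outcome is applied again
```

Applying `TxnState.ABORTED` after the rollback in this example would raise, which encodes the "will not change its mind" assumption.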
| Comment by Judah Schvimer [ 21/Mar/19 ] |
Exactly. |
| Comment by Max Hirschhorn [ 21/Mar/19 ] |
Where does the guarantee that a different outcome won't ever occur on the two branches of history come from? Is it because of how prepared transactions are used by the distributed commit protocol, where the coordinator must not change its mind as to whether to commit or abort a transaction once it is prepared on all of the shards? Modeling these semantics is important to me for implementing prepared transaction support in the rollback fuzzer. |
| Comment by Louis Williams [ 20/Mar/19 ] |
|
After some discussion, there seems to be agreement about the desired behavior of fastcount during rollback. The desire is to do no additional work and only confirm that we have testing to cover all cases. Keep in mind that we update the fastcount in memory, then roll it back if the storage transaction aborts. There are 5 cases to consider:
|
| Comment by Louis Williams [ 18/Mar/19 ] |
|
I don't think this is going to be as simple as just recording op counts while reconstructing prepare oplog entries. I think we'll need to first locate the prepare oplog entries, calculate the operation counts per namespace, and then subtract those from each collection's in-memory count. |
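
The three steps proposed here can be sketched as follows. This is only an illustration: the oplog entries are simplified dicts whose shape (an `applyOps` command carrying `prepare: true`) is an assumption, and `prepared_op_counts` and `subtract_prepared_counts` are hypothetical helpers, not server functions.

```python
from collections import Counter

# Effect of each operation type on a collection's record count.
# Updates ("u") and anything else leave the count unchanged.
COUNT_DELTA = {"i": 1, "d": -1}


def prepared_op_counts(oplog_entries):
    """Locate prepare entries and sum their count deltas per namespace."""
    deltas = Counter()
    for entry in oplog_entries:
        cmd = entry.get("o", {})
        # Only command entries flagged as prepare are of interest here.
        if entry.get("op") != "c" or not cmd.get("prepare"):
            continue
        for op in cmd.get("applyOps", []):
            deltas[op["ns"]] += COUNT_DELTA.get(op["op"], 0)
    return dict(deltas)


def subtract_prepared_counts(in_memory_counts, deltas):
    """Return new counts with the prepared operations' deltas subtracted."""
    return {ns: count - deltas.get(ns, 0) for ns, count in in_memory_counts.items()}


oplog = [
    {"op": "c", "ns": "admin.$cmd", "o": {
        "applyOps": [
            {"op": "i", "ns": "test.a"},
            {"op": "i", "ns": "test.a"},
            {"op": "d", "ns": "test.b"},
        ],
        "prepare": True,
    }},
    {"op": "i", "ns": "test.a"},  # ordinary insert, not part of a prepare entry
]

deltas = prepared_op_counts(oplog)
adjusted = subtract_prepared_counts({"test.a": 5, "test.b": 4, "test.c": 7}, deltas)
```

Collections that no prepared operation touched keep their count unchanged, since their delta defaults to zero.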
| Comment by Judah Schvimer [ 25/Feb/19 ] |
|
Due to |