[SERVER-42801] Create test coverage to verify data consistency if node crashes while in different stages of committing single-RS transaction Created: 13/Aug/19 Updated: 27/Oct/23 Resolved: 14/Aug/19 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Backlog - Replication Team |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Participants: |
| Description |
This came out of a discussion with siyuan.zhou about what happens to transaction data if a node crashes after the WriteUnitOfWork commits but before the oplog entry is written. If this coverage already exists, feel free to close this ticket as a duplicate. |
| Comments |
| Comment by Siyuan Zhou [ 14/Aug/19 ] |
Agreed, they should. |
| Comment by Judah Schvimer [ 14/Aug/19 ] |
Do you agree that the kill_primary, kill_secondary, and rollback_fuzzer unclean shutdown suites cover this? |
| Comment by Siyuan Zhou [ 14/Aug/19 ] |
The original crash was caused by an exception thrown after the WriteUnitOfWork commits but before the oplog entry is written. Because the code does not allow exceptions at that point, the node crashes. blake.oler wanted to make sure that even if the node crashes at that moment, the data isn't corrupted. I believe that's guaranteed because we are still holding the oplog slot, so the stable timestamp has not yet advanced to include the newly committed data. During startup recovery, the "committed" data will therefore not be reflected in the stable checkpoint. This is essentially a test of the atomicity of writing the data and writing the oplog entry, which holding the oplog slot provides. If we already have tests that cover this property indirectly, we can close the ticket. |
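The atomicity argument in the comment above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the server's implementation: `ToyNode`, `reserve_slot`, `commit_data`, `write_oplog`, `take_checkpoint`, and `crash_and_recover` are hypothetical names, timestamps are plain integers, the oplog is assumed to be durably journaled, and the stable timestamp is modeled only as "just before the oldest still-open oplog slot" (ignoring the majority commit point and other inputs the real server uses).

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of a replica set node. All names are hypothetical,
    not MongoDB server APIs."""
    next_ts: int = 1
    open_slots: set = field(default_factory=set)  # reserved oplog timestamps not yet written
    data: dict = field(default_factory=dict)      # key -> (value, commit_ts), in memory
    oplog: dict = field(default_factory=dict)     # ts -> (key, value); assumed journaled
    ckpt_data: dict = field(default_factory=dict) # data as of the last stable checkpoint
    ckpt_ts: int = 0                              # timestamp of that checkpoint

    def reserve_slot(self) -> int:
        ts = self.next_ts
        self.next_ts += 1
        self.open_slots.add(ts)
        return ts

    def commit_data(self, key, value, ts):
        # Models the WriteUnitOfWork commit: the data write is done, but the
        # oplog entry has not been written yet, so the slot stays open.
        self.data[key] = (value, ts)

    def write_oplog(self, key, value, ts):
        self.oplog[ts] = (key, value)
        self.open_slots.discard(ts)  # releasing the slot closes the oplog hole

    def stable_timestamp(self) -> int:
        # Simplified rule: the stable timestamp may not pass the oldest open slot.
        if self.open_slots:
            return min(self.open_slots) - 1
        return self.next_ts - 1

    def take_checkpoint(self):
        stable = self.stable_timestamp()
        self.ckpt_data = {k: v for k, v in self.data.items() if v[1] <= stable}
        self.ckpt_ts = stable

    def crash_and_recover(self):
        # Unclean shutdown: in-memory state is lost. Recovery starts from the
        # stable checkpoint and replays journaled oplog entries after it.
        self.data = dict(self.ckpt_data)
        for ts in sorted(t for t in self.oplog if t > self.ckpt_ts):
            key, value = self.oplog[ts]
            self.data[key] = (value, ts)
        self.open_slots.clear()
        self.next_ts = max([self.ckpt_ts, *self.oplog]) + 1


# Scenario from the ticket: transaction B's data commits, then the node
# crashes before B's oplog entry is written.
node = ToyNode()
ts_a = node.reserve_slot()
node.commit_data("a", 1, ts_a)
node.write_oplog("a", 1, ts_a)  # transaction A is fully committed

ts_b = node.reserve_slot()
node.commit_data("b", 2, ts_b)  # WriteUnitOfWork for B committed, slot still held
node.take_checkpoint()          # stable timestamp is held back behind B's slot
node.crash_and_recover()        # crash before B's oplog entry is written

# B appears in neither the data nor the oplog, so the two stay consistent.
assert "b" not in node.data and ts_b not in node.oplog
```

In this model, the property being tested is that recovered data and the recovered oplog agree. If the slot were released before the oplog entry was written, the checkpoint could include B's data with no matching oplog entry, which is the inconsistency the held slot prevents.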
| Comment by Judah Schvimer [ 13/Aug/19 ] |
I think this is covered by our kill_primary, kill_secondary, and rollback_fuzzer unclean shutdown suites. siyuan.zhou, am I missing something? |