[SERVER-48010] Substitute ghost timestamp with no-op write in multi-statement txn multikey sidetxn write
Created: 07/May/20 | Updated: 29/Oct/23 | Resolved: 12/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc7, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Repl 2020-05-18 |
| Participants: | |
| Linked BF Score: | 50 |
| Description |
|
Because (I believe) we only perform this ghost timestamp on primaries, it's preferable to instead write a no-op oplog entry. That way the stable timestamp never races with the transaction being committed. Durable history may have different consequences due to these races. |
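To illustrate the race described above, here is a toy sketch in Python. It is not MongoDB or WiredTiger code: `ToyOplog` and all of its methods are hypothetical names. It assumes that a ghost timestamp is a value borrowed from the clock without reserving an oplog slot, and that the stable timestamp may not advance past the oldest reserved-but-uncommitted slot.

```python
# Toy model only. Every name here is hypothetical; none of it is MongoDB or WiredTiger code.
class ToyOplog:
    def __init__(self):
        self.clock = 0           # last timestamp handed out
        self.open_slots = set()  # reserved but uncommitted timestamps ("oplog holes")

    def reserve_slot(self):
        """Reserve a timestamp for an oplog entry (for example a no-op); this opens a hole."""
        self.clock += 1
        self.open_slots.add(self.clock)
        return self.clock

    def commit_slot(self, ts):
        """Commit the write that owns the reserved timestamp; this closes the hole."""
        self.open_slots.remove(ts)

    def ghost_timestamp(self):
        """Assumption: a ghost timestamp borrows the current clock value and reserves nothing."""
        return self.clock

    def stable_timestamp(self):
        """Assumption: stable may advance only up to just before the oldest open hole."""
        if self.open_slots:
            return min(self.open_slots) - 1
        return self.clock


if __name__ == "__main__":
    log = ToyOplog()
    log.commit_slot(log.reserve_slot())  # an unrelated write that has already committed

    # Ghost timestamp: nothing is reserved, so stable can already cover the
    # side transaction's timestamp while that side transaction is still uncommitted.
    ghost = log.ghost_timestamp()
    print("ghost can race:", log.stable_timestamp() >= ghost)

    # No-op oplog entry: the reserved slot holds stable back until the commit.
    noop = log.reserve_slot()
    print("no-op can race:", log.stable_timestamp() >= noop)
    log.commit_slot(noop)
```

In this model the no-op write costs one oplog slot, and in return the stable timestamp cannot pass the side transaction's timestamp before it commits.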
| Comments |
| Comment by Githook User [ 18/May/20 ] |
|
Author: {'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}
Message: (cherry picked from commit 1417eee440b4132e24d1388011d681e2c9fcec41) |
| Comment by Githook User [ 12/May/20 ] |
|
Author: {'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}
Message: |
| Comment by Judah Schvimer [ 07/May/20 ] |
|
daniel.gottlieb pointed me to this area of code as the problematic section. |
| Comment by Daniel Gottlieb (Inactive) [ 07/May/20 ] |
|
tess.avitabile, a reliable reproducer for the linked test failure (assuming this even is what caused the test failure) is difficult because the conditions for a failure:
I've filed |
| Comment by Daniel Gottlieb (Inactive) [ 07/May/20 ] |
|
I linked a BF that I believe is due to this, though I haven't added to its comments yet. Like most cache overflow bugs, there's no great advice for reproducing it. We do have other BFs where similar writes (e.g., capped collections deleting in a non-timestamped side-transaction) cause data inconsistency (deleted records coming back to life). WT is aware that these timestamp orderings are problematic (our usage breaks some of their assumptions) and should have them all fixed for the release. However, I believe it's preferable to depend on the WT contract (and not the pre-durable history implementation), particularly for code changes that are anticipated to be simple. |
| Comment by Tess Avitabile (Inactive) [ 07/May/20 ] |
|
daniel.gottlieb, can you give us any advice on how to reproduce this? |
| Comment by Daniel Gottlieb (Inactive) [ 07/May/20 ] |
|
For 4.2, I would classify this as an improvement; for 4.4, I would classify it as a bug. The impact of this in master is data loss. Obviously there's an effort for that not to be the case, but the usage (particularly in this primary-only case) adds a lot more complexity for WT than the simplicity of doing a ghost timestamp vs. a no-op oplog entry saves us. |
| Comment by Judah Schvimer [ 07/May/20 ] |
|
daniel.gottlieb, is this a bug or an improvement? If it's a bug, what's the impact of this? |