[SERVER-43060] CheckReplDBHashInBackground should retry the "dbHash" command on WriteConflicts
Created: 27/Aug/19 Updated: 29/Oct/23 Resolved: 09/Jul/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc14, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Execution Team 2020-07-13 |
| Participants: | |
| Linked BF Score: | 18 |
| Description |

The "dbHash" command uses a long-running snapshot read with $_internalReadAtClusterTime, so it tends to induce cache pressure. When application threads are performing eviction and the WiredTiger cache gets stuck, WiredTiger aborts the oldest transaction so that it can make progress. This surfaces as a WT_ROLLBACK error code, which MongoDB converts into a WriteConflictException. dbHash should therefore be expected to fail with WriteConflict errors occasionally, and we should have logic to retry or ignore failures in this case.
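The retry behaviour the description asks for can be sketched as a wrapper that retries only on WriteConflict (MongoDB error code 112) and lets every other error through. This is a minimal Python sketch under stated assumptions, not the hook's actual implementation (CheckReplDBHashInBackground runs in the mongo shell). `CommandError`, `run_dbhash_with_retry`, `max_attempts`, and `backoff_seconds` are hypothetical names, and the linear backoff is an illustrative choice.

```python
import time
from typing import Any, Callable, Dict

# MongoDB's numeric error code for WriteConflict.
WRITE_CONFLICT_CODE = 112


class CommandError(Exception):
    """Hypothetical stand-in for a driver's command-failure exception."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message or f"command failed with code {code}")
        self.code = code


def run_dbhash_with_retry(
    run_command: Callable[[], Dict[str, Any]],
    max_attempts: int = 5,
    backoff_seconds: float = 0.1,
) -> Dict[str, Any]:
    """Run a dbHash-style command, retrying only on WriteConflict.

    Any other error is re-raised immediately; the last WriteConflict is
    re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_command()
        except CommandError as err:
            if err.code != WRITE_CONFLICT_CODE or attempt == max_attempts:
                raise
            # Assumed cause: WiredTiger rolled back our snapshot under cache
            # pressure. Back off (longer each attempt) before opening a new one.
            time.sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

Retrying only on code 112 keeps genuine failures (for example, a real hash mismatch or an unrelated server error) visible. The description also allows simply ignoring the failure; that would amount to catching the final WriteConflict and skipping that round of checks instead of re-raising.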
| Comments |
| Comment by Githook User [ 17/Jul/20 ] |
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}
Message: (cherry picked from commit ab4d803f1f2f57cf9dbec89175f8eb52cb4761f2) |
| Comment by Githook User [ 09/Jul/20 ] |
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}
Message: |
| Comment by Louis Williams [ 15/Jun/20 ] |
Reassigning to Execution because we have the most context on how to fix this. |
| Comment by Brooke Miller [ 15/Jun/20 ] |
louis.williams, is this something that the Execution team could do, since you have a better understanding of the best retry strategy for avoiding WriteConflict exceptions?