[SERVER-41625] RollbackTest fixture runs data consistency checks twice when it stops Created: 10/Jun/19 Updated: 29/Oct/23 Resolved: 27/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.4 |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | William Schultz (Inactive) | Assignee: | Ryan Timmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Repl 2020-01-27, Repl 2020-02-10 | ||||||||
| Participants: | |||||||||
| Description |
|
The RollbackTest fixture runs replica set data consistency checks explicitly inside its stop() method. We subsequently call stopSet on the underlying ReplSetTest used by the RollbackTest. Since the completion of |
| Comments |
| Comment by Githook User [ 27/Jan/20 ] |
|
Author: {'email': 'ryan.timmons@mongodb.com', 'username': 'rtimmons', 'name': 'Ryan Timmons'}Message: |
| Comment by William Schultz (Inactive) [ 14/Jan/20 ] |
|
ryan.timmons In a standard usage of RollbackTest, where we go through all the normal state transitions, we will do consistency checks 3 times: (1) when we call transitionToSteadyStateOperations(), (2) when we call RollbackTest.stop, and (3) when we call ReplSetTest.stopSet from inside RollbackTest.stop. Ideally, we want to only have 1 set of consistency checks. I think that the consistency checks inside 'ReplSetTest.stopSet' are always redundant, since we directly call 'checkDataConsistency' immediately before. So, we can always skip the stopSet consistency checks. You can do this by setting jsTest.options().skipCheckDBHashes, but it might be simpler to just add another argument to the stopSet function that allows you to bypass consistency checks, so that we don't have to change those global settings for a single function call. To combine checks (1) and (2), I think it would make sense to record whether we already checked data consistency inside 'transitionToSteadyStateOperations()'. We should always call RollbackTest.stop when ending a test, but, as you pointed out, we may not always call transitionToSteadyStateOperations(), and even if we do, we may skip consistency checks there. Once we reach RollbackTest.stop, we can see if consistency checks were already executed previously, inside transitionToSteadyStateOperations(). If they were, then we don't need to do any consistency checks. If they weren't, then we can do one set of consistency checks. To summarize, we would always skip the ReplSetTest.stopSet consistency checks, and only do consistency checks inside RollbackTest.stop if we didn't already do them inside transitionToSteadyStateOperations(). Also note that the 'checkDataConsistency' function is slightly more thorough than the consistency checks inside ReplSetTest.stopSet, since we also explicitly check collection counts there. |
| Comment by William Schultz (Inactive) [ 11/Jun/19 ] |
|
To amend the original description, I believe that we actually do 3 data consistency checks in tests that use the RollbackTest in the standard way. This is because there is another call to RollbackTest.checkDataConsistency() inside transitionToSteadyStateOperations, which is always called when going through the normal rollback test sequence. |
| Comment by William Schultz (Inactive) [ 11/Jun/19 ] |
|
Note that the RollbackTest.checkDataConsistency appears to do one more check than ReplSetTest.stopSet, which is validateNodes. ReplSetTest only calls checkOplogs and checkReplicatedDataHashes. We may want to remove the extra calls to checkOplogs and checkReplicatedDataHashes while leaving the call to validateNodes present in the RollbackTest consistency checks. |
| Comment by William Schultz (Inactive) [ 10/Jun/19 ] |
|
As one example of the slowdown incurred by these extra checks, running the rollback_rename_collection_on_sync_source.js test with the redundant checks in place took 58.75 seconds on my local machine, and 46.83 seconds after removing the redundant check. |