[SERVER-45830] Add failpoint to allow InitialSyncTest fixture to pause initial syncing node after cloning some documents
Created: 28/Jan/20 Updated: 06/Dec/22
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement |
| Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) |
| Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved |
| Votes: | 0 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Replication |
| Description |
The initial sync fuzzer currently pauses initial sync before running the 'listDatabases', 'listCollections', and 'listIndexes' commands for each database and collection being cloned. It does not, however, pause the syncing node at any point while the CollectionCloner is actually fetching documents. As a result, the fuzzer cannot deterministically reproduce certain bugs that occur during collection cloning. For example, suppose the sync source contains a document {_id: 1} that the initial syncing node has already cloned. If the sync source then deletes {_id: 1} and re-inserts it before cloning of that collection has finished, the syncing node may clone the document a second time. Being able to deterministically reproduce cases like this would be a helpful improvement to our initial sync test infrastructure.
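The duplicate-clone scenario above can be illustrated with a toy model. This is a minimal sketch, not MongoDB code, and it rests on two stated assumptions: the cloner scans documents in record-id (natural) order, and a re-inserted document receives a fresh, larger record id, so it lands after the cloner's current scan position. `ToyCollection`, `clone`, `pause_after`, and `on_pause` are hypothetical names; `on_pause` merely stands in for the kind of failpoint this ticket proposes, whose actual name and parameters the ticket does not specify.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyCollection:
    """Hypothetical collection storing documents by increasing record id."""

    def __init__(self) -> None:
        self._records: Dict[int, dict] = {}
        self._next_record_id = 1

    def insert(self, doc: dict) -> None:
        # Assumption: every insert, including a re-insert of a deleted _id,
        # gets a fresh, larger record id (end of the natural order).
        self._records[self._next_record_id] = doc
        self._next_record_id += 1

    def delete(self, _id: int) -> None:
        # Remove every record whose document has this _id.
        for rid, doc in list(self._records.items()):
            if doc["_id"] == _id:
                del self._records[rid]

    def next_after(self, record_id: int) -> Optional[Tuple[int, dict]]:
        # Return the first live record with a record id greater than record_id.
        later = [rid for rid in self._records if rid > record_id]
        if not later:
            return None
        rid = min(later)
        return rid, self._records[rid]


def clone(source: ToyCollection, pause_after: int,
          on_pause: Callable[[], None]) -> List[dict]:
    """Copy documents in record-id order, calling on_pause (a stand-in for a
    hypothetical failpoint) once pause_after documents have been cloned."""
    cloned: List[dict] = []
    position = 0
    while True:
        nxt = source.next_after(position)
        if nxt is None:
            return cloned
        position, doc = nxt
        cloned.append(dict(doc))
        if len(cloned) == pause_after:
            on_pause()


source = ToyCollection()
for i in (1, 2, 3):
    source.insert({"_id": i})


def delete_and_reinsert() -> None:
    # While the cloner is paused, the sync source deletes {_id: 1} and re-inserts it.
    source.delete(1)
    source.insert({"_id": 1})


result = clone(source, pause_after=1, on_pause=delete_and_reinsert)
print([d["_id"] for d in result])  # [1, 2, 3, 1]
```

Because the pause point is fixed (after exactly one document), the interleaving is deterministic, which is the property the ticket is after: without a pause hook, whether the delete and re-insert land between the two reads of the scan depends on timing.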
| Comments |
| Comment by William Schultz (Inactive) [ 31/Jan/20 ] |
judah.schvimer It's not strictly required for us to be able to catch the bug described in |
| Comment by Judah Schvimer [ 31/Jan/20 ] |
william.schultz, is this ticket required for the initial sync fuzzer to catch bugs like |
| Comment by William Schultz (Inactive) [ 28/Jan/20 ] |
Implementing both this change and SERVER-45827 would hopefully give the initial sync fuzzer both the operational diversity of the existing jstestfuzz_replication_initsync suites and a high degree of deterministic reproducibility. samy.lanka noted that the index building process in initial sync could be another source of non-determinism that we may need to control, but I believe that for bugs like the one described in
| Comment by William Schultz (Inactive) [ 28/Jan/20 ] |
This improvement should allow the initial sync fuzzer to deterministically reproduce bugs like number (2) mentioned here in |