[SERVER-45827] Expand initial sync fuzzer grammar to include all CRUD document shapes and index DDL ops Created: 28/Jan/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: |
Replication
|
| Participants: |
| Description |
|
The initial sync fuzzer suites run a series of randomized operations against the sync source of an initial syncing node in a way that is highly deterministic and reproducible. However, the diversity of the operations they run is low, because they rely on a simplified grammar for generating both CRUD and DDL operations. To cover initial sync bugs that require more complex operations, we should extend the initial sync fuzzer grammar to include every document shape we reasonably can, as well as create/drop index operations covering all index shapes. This would give us initial sync coverage that is both broader and reproducible. Our existing jstestfuzz_replication_initsync suites do exercise initial sync with good operation diversity, but their failures can be much harder to reproduce because of the inherent non-determinism of the BackgroundInitialSync test hook. |
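As an illustration only, here is a sketch of what a seeded grammar covering richer document shapes and index DDL might look like. It is written in Python rather than the fuzzer's own language, and every name in it (FIELDS, COLL, gen_document, gen_index_spec, generate_ops) is hypothetical, not taken from the real initial sync fuzzer. Operations are emitted as dicts shaped like the server's insert, update, delete, createIndexes and dropIndexes commands; the property that matters here is that the same seed always yields the same operation list.

```python
import random
from typing import Any, Dict, List

# Hypothetical field and collection names used only for this illustration.
FIELDS = ["a", "b", "c", "d"]
COLL = "coll"


def gen_scalar(r: random.Random) -> Any:
    """Scalar rule: int, double, string (including non-ASCII), bool, or null."""
    kind = r.randrange(5)
    if kind == 0:
        return r.randint(-1000, 1000)
    if kind == 1:
        return round(r.uniform(-1e6, 1e6), 3)
    if kind == 2:
        return r.choice(["", "a", "abc", "caf\u00e9"])
    if kind == 3:
        return r.choice([True, False])
    return None


def gen_value(r: random.Random, depth: int) -> Any:
    """Value rule: a scalar, a nested document, or an array, nested up to `depth` levels."""
    kind = r.randrange(3) if depth > 0 else 0
    if kind == 0:
        return gen_scalar(r)
    if kind == 1:
        return gen_document(r, depth - 1)
    return [gen_value(r, depth - 1) for _ in range(r.randint(0, 3))]


def gen_document(r: random.Random, depth: int = 2) -> Dict[str, Any]:
    """Document rule: a random subset of FIELDS, each with a random value."""
    keys = r.sample(FIELDS, r.randint(0, len(FIELDS)))
    return {k: gen_value(r, depth) for k in keys}


def gen_index_spec(r: random.Random, name: str) -> Dict[str, Any]:
    """Index rule: single-field, compound, hashed, or wildcard key patterns."""
    kind = r.randrange(4)
    if kind == 0:
        key = {r.choice(FIELDS): r.choice([1, -1])}
    elif kind == 1:
        f1, f2 = r.sample(FIELDS, 2)
        key = {f1: r.choice([1, -1]), f2: r.choice([1, -1])}
    elif kind == 2:
        key = {r.choice(FIELDS): "hashed"}
    else:
        key = {"$**": 1}
    spec: Dict[str, Any] = {"key": key, "name": name}
    # Occasionally make single-field or compound indexes partial.
    if kind < 2 and r.random() < 0.3:
        spec["partialFilterExpression"] = {next(iter(key)): {"$exists": True}}
    return spec


def generate_ops(seed: int, n: int) -> List[Dict[str, Any]]:
    """Top-level rule: n CRUD or index DDL commands, fully determined by `seed`."""
    r = random.Random(seed)
    live_indexes: List[str] = []  # index names this generator believes exist
    ops: List[Dict[str, Any]] = []
    for i in range(n):
        kind = r.choice(["insert", "update", "delete", "createIndexes", "dropIndexes"])
        if kind == "dropIndexes" and not live_indexes:
            kind = "createIndexes"  # nothing to drop yet
        if kind == "insert":
            docs = [gen_document(r) for _ in range(r.randint(1, 3))]
            ops.append({"insert": COLL, "documents": docs})
        elif kind == "update":
            q = {r.choice(FIELDS): gen_scalar(r)}
            u = {"$set": {r.choice(FIELDS): gen_value(r, 2)}}
            ops.append({"update": COLL, "updates": [{"q": q, "u": u, "multi": r.choice([True, False])}]})
        elif kind == "delete":
            q = {r.choice(FIELDS): gen_scalar(r)}
            ops.append({"delete": COLL, "deletes": [{"q": q, "limit": r.choice([0, 1])}]})
        elif kind == "createIndexes":
            name = f"idx_{i}"
            live_indexes.append(name)
            ops.append({"createIndexes": COLL, "indexes": [gen_index_spec(r, name)]})
        else:
            name = live_indexes.pop(r.randrange(len(live_indexes)))
            ops.append({"dropIndexes": COLL, "index": name})
    return ops
```

Some generated commands (for example a compound index over fields that hold parallel arrays, or a key pattern that duplicates an existing index under a new name) may be rejected by the server; the sketch assumes the harness tolerates such command errors rather than trying to generate only valid operations.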
| Comments |
| Comment by William Schultz (Inactive) [ 07/Feb/20 ] |
|
After discussion with robert.guo and judah.schvimer, we have determined that running the mutational fuzzer against the initial sync fuzzer test fixture will require too much work for a quick win. Instead, we've decided to extend the existing initial sync fuzzer grammar to include complex document and index shapes. It may also be possible to combine the initial sync and rollback fuzzer grammars. We still think that running mutational fuzzer ops against the initial sync fuzzer suite is valuable, so I have created SERVER-46044 to keep track of that future work. |
| Comment by Robert Guo (Inactive) [ 03/Feb/20 ] |
|
william.schultz I'm not optimistic that the grammar fuzzers will work out well. The grammar fuzzers are for enumerating known states that a human can think of; the mutational fuzzer is for generating states people do not think of, like … One possible grammar-based solution is to write code that mutates the grammar rules to generate new rules. We could then gather coverage data to decide whether a newly generated rule is any good. This would let us incrementally increase the complexity of the grammar over time, as you said. Without coverage information, we won't know whether a grammar change that knowingly introduces invalid queries does more harm than good. |
| Comment by William Schultz (Inactive) [ 03/Feb/20 ] |
|
robert.guo Even though this ticket's title says that we want to run the mutational fuzzer operations in the initial sync fuzzer, the underlying goal is to sufficiently increase the diversity of operations run in the initial sync fuzzer. Do you think that this could be achieved more easily by taking advantage of our other existing, more advanced grammars? Extending a single grammar file or utilizing other existing grammars seems much easier than integrating the mutational fuzzer with the initial sync fuzzer (based on the difficulties you outlined above). I imagine the complexity of those grammars can also be incrementally increased over time. |
| Comment by Judah Schvimer [ 03/Feb/20 ] |
|
Thanks for the input. I think this proposed work (and maybe similar work for the rollback fuzzer) is significantly more valuable than making the background init sync hook more deterministic. I think a lot of the value from this work comes from the initial sync fuzzer's ability to run commands in narrow windows to expose difficult race conditions, which is unrelated to the additional determinism. I propose moving this from a replication quick win to a dev-prod QP request. Does that seem reasonable? |
| Comment by Robert Guo (Inactive) [ 03/Feb/20 ] |
|
judah.schvimer While the idea in this ticket is sound and valuable, I expect it to be a lot of work. The complexity comes from the need to "control" what the mutational fuzzer does so that initsync can happen "inline" instead of "in the background". The rough breakdown of work items is as follows: 1. Parse the fuzzer-generated file and split it into sub-ASTs that can each be run in a single initsync fixture step. Since the mongo shell has no proper module support, we need to somehow prevent variables used by the initsync fixture from being overridden. Before trying to make a decision on the above work, I'm curious whether we've thought about making the background init sync hook more deterministic. Could we maybe use change streams and the failpoint from SERVER-45830 to have the background hook execute a deterministic number of statements between each initsync step? I suspect this approach would be significantly less work, if it's feasible. |
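The deterministic-interleaving question above could be prototyped roughly as follows. This is a minimal sketch that assumes the initial syncing node can be held before each initsync step (for example by a failpoint) and released afterward; it only computes the schedule, and the driver that would run statements against the sync source and release the node is not shown. The names build_schedule, Event and the step labels are hypothetical, and the SERVER-45830 failpoint itself is not modeled. Because batch sizes come from a seeded RNG, replaying with the same seed reproduces the same interleaving.

```python
import random
from typing import List, Tuple

# Hypothetical event type: ("stmt", statement) runs on the sync source;
# ("step", name) releases the paused node so it performs that initsync step.
Event = Tuple[str, str]


def build_schedule(statements: List[str], steps: List[str], seed: int) -> List[Event]:
    """Interleave fuzzer statements with initsync steps, determined entirely by `seed`.

    Each step is preceded by a contiguous (possibly empty) batch of statements;
    any statements left over after the last step run at the end.
    """
    r = random.Random(seed)
    # One sorted cut point per step splits the statement list into ordered batches.
    cuts = sorted(r.randint(0, len(statements)) for _ in steps)
    schedule: List[Event] = []
    start = 0
    for step, cut in zip(steps, cuts):
        schedule.extend(("stmt", s) for s in statements[start:cut])
        schedule.append(("step", step))
        start = cut
    schedule.extend(("stmt", s) for s in statements[start:])
    return schedule
```

Statement order is never changed, only where the step boundaries fall, so a failing seed pins down exactly which statements ran in each window.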
| Comment by Tess Avitabile (Inactive) [ 03/Feb/20 ] |
|
After we do this and SERVER-45830, we would attempt to remove jstestfuzz_replication_initsync. We could let them both run for a month to make sure they're catching the same bugs. |
| Comment by William Schultz (Inactive) [ 31/Jan/20 ] |
|
Another thought about improving the initial sync fuzzer incrementally: when we fix a particular bug whose reproduction requires certain shapes of operations, we could consider adding these new operation types into the initial sync fuzzer grammar. For example, we worked for a day or two to reproduce the original bug in |
| Comment by William Schultz (Inactive) [ 28/Jan/20 ] |
|
This type of coverage would presumably allow bugs like |