[SERVER-33589] Create an initial sync fuzzer suite Created: 01/Mar/18  Updated: 29/Oct/23  Resolved: 10/Jan/19

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.1.7, 4.0.11

Type: Improvement Priority: Minor - P4
Reporter: William Schultz (Inactive) Assignee: Samyukta Lanka
Resolution: Fixed Votes: 0
Labels: prepare_testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Backport Requested: v4.0, v3.6
Sprint: Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17, Repl 2019-01-14
Participants:

 Description   

We could create an initial sync fuzzer suite that is similar in nature to the rollback fuzzer suite. The rollback fuzzer suite has done a good job of exposing subtle bugs in rollback in a relatively deterministic and reproducible way. We do have the jstestfuzz_replication_initsync and initial sync passthrough suites, but a more targeted fuzzer may expose initial sync bugs more quickly and more reproducibly.

We already have a grammar defined that is able to generate operations of all types to run against the database. We could add a suite for initial sync that verifies we are able to successfully clone data and reach data consistency for any sequence of operations executed against the sync source. Using fail points in the initial sync process would allow us to parameterize each test execution on which ops to apply, which documents/collections to clone, etc. If we end up investing time in making initial sync more robust, this could be a good addition to our test infrastructure, and one that shouldn't require too much additional work, since a fair amount of the test infrastructure already exists.
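The consistency property described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the fuzzer itself: the "database" is a plain dict, every operation is an idempotent whole-document set or delete, cloning is a single atomic snapshot, and all names (`apply_op`, `random_op`, `run_trial`, `data_hash`) are hypothetical. The seed stands in for the parameterization that fail points would provide, namely where in the operation sequence cloning happens and from which point operations are re-applied.

```python
import hashlib
import json
import random


def apply_op(data, op):
    """Apply one idempotent operation to a dict of _id -> value."""
    kind, key, value = op
    if kind == "set":
        data[key] = value
    elif kind == "delete":
        data.pop(key, None)


def data_hash(data):
    """Order-independent hash of the data, standing in for dbhash."""
    payload = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def random_op(rng):
    """Generate a random set or delete on a small key space."""
    key = str(rng.randrange(5))
    if rng.random() < 0.3:
        return ("delete", key, None)
    return ("set", key, rng.randrange(100))


def run_trial(seed, num_ops=20):
    """Run one seeded trial; return True if source and target end up equal."""
    rng = random.Random(seed)
    ops = [random_op(rng) for _ in range(num_ops)]
    clone_at = rng.randrange(num_ops + 1)  # ops applied before the snapshot
    begin = rng.randrange(clone_at + 1)    # replay start, never after the snapshot

    source, target = {}, {}
    for i, op in enumerate(ops):
        if i == clone_at:
            target = dict(source)          # "clone" the source mid-stream
        apply_op(source, op)
    if clone_at == num_ops:
        target = dict(source)

    # Re-apply everything from the recorded start point; some of these
    # operations are already reflected in the snapshot, so they must be idempotent.
    for op in ops[begin:]:
        apply_op(target, op)
    return data_hash(source) == data_hash(target)
```

Because each trial is fully determined by its seed, a failing seed can be re-run to reproduce the same interleaving, which is the reproducibility property the description is after.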

We'll initially only have document modifications and transactions, and look to expand upon this as part of future initial sync work.



 Comments   
Comment by Githook User [ 08/Jul/19 ]

Author: Samy Lanka (lankas, samy.lanka@mongodb.com)

Message: SERVER-33589 Create an initial sync test fixture

(cherry picked from commit dc252fd5e493581a58c80c5875503aa7ad147614)
Branch: v4.0
https://github.com/mongodb/mongo/commit/649a497253bb9d2c81a67209319a0846de1a98f6

Comment by Githook User [ 10/Jan/19 ]

Author: Samy Lanka (lankas, samy.lanka@mongodb.com)

Message: SERVER-33589 Create an initial sync test fixture
Branch: master
https://github.com/mongodb/mongo/commit/dc252fd5e493581a58c80c5875503aa7ad147614

Comment by Judah Schvimer [ 26/Oct/18 ]

Replication will add two fail points to scheduleRemoteCommand. The "initial sync fixture" will turn on the first one; the fuzzer will notice when it's hit, run some commands on the sync source, turn on a second fail point that comes immediately after the first in the code, and then turn off the first. The fixture will then see that the second fail point has been hit, turn on the first, and turn off the second. We'll have to clear our ramLog between each occurrence. We will whitelist "find, getMore, listCollections, listIndexes, listDatabases" so that we don't change data non-deterministically on heartbeats. We'll need to make sure that all of the remote calls in initial sync do in fact hit this fail point.
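The hand-off between the two fail points can be sketched as a single-threaded simulation. This is an illustration only: `FailPoint`, `SyncingNode`, and `step` are hypothetical names, the node is a toy that pauses at whichever fail point is enabled, and no real server or fail point API is involved. The point it shows is that alternating the two fail points lets exactly one remote command through per step.

```python
class FailPoint:
    """Toy fail point: just a named on/off switch."""

    def __init__(self, name):
        self.name = name
        self.enabled = False


class SyncingNode:
    """Toy syncing node that checks two fail points before each remote command."""

    def __init__(self, commands, first, second):
        self.pending = list(commands)
        self.first, self.second = first, second
        self.sent = []
        self.past_first = False   # True once the node has passed the first fail point
        self.blocked_on = None

    def advance(self):
        """Run until blocked at an enabled fail point or out of commands."""
        self.blocked_on = None
        while self.pending:
            if not self.past_first:
                if self.first.enabled:
                    self.blocked_on = self.first.name
                    return
                self.past_first = True
            if self.second.enabled:
                self.blocked_on = self.second.name
                return
            self.sent.append(self.pending.pop(0))
            self.past_first = False


def step(node, first, second, run_fuzzer_ops):
    """One hand-off; returns False once the node has no more commands to send."""
    node.advance()
    if node.blocked_on != first.name:
        return False
    run_fuzzer_ops()          # the sync source changes while the node is paused
    second.enabled = True     # arm the second fail point before releasing the first
    first.enabled = False
    node.advance()            # the node moves from the first to the second
    assert node.blocked_on == second.name
    first.enabled = True      # re-arm the first for the next remote command
    second.enabled = False
    return True


commands = ["listDatabases", "listCollections", "find", "getMore"]
first, second = FailPoint("first"), FailPoint("second")
first.enabled = True
node = SyncingNode(commands, first, second)

sent_before_each_batch = []
while step(node, first, second, lambda: sent_before_each_batch.append(len(node.sent))):
    pass
```

In this toy run, each batch of fuzzer operations happens after exactly one more remote command than the previous batch, so `sent_before_each_batch` ends up as `[0, 1, 2, 3]` and all four commands are sent in order.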

Replication will make an "initial sync fixture" that has a 2 node set. It'll have one function that restarts the initial syncing node without its data and one "step" function that does the failpoint dance above. "step" will check if initial sync is done, and if so validate and dbhash the data. We'll have to abort any active prepared transactions or change the validate command to not conflict. If "step" is called and an initial sync is not active, it'll call restart. We want to make sure that we give the fuzzer a chance to populate the source node with an initial set of data before we begin a new initial sync.
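The fixture lifecycle described above (restart, step, validate when done) can be sketched as a small state machine. This is a toy under stated assumptions, not the actual fixture: `InitialSyncFixtureSketch` and its methods are hypothetical, data lives in plain dicts, writes are whole-value sets only, one "step" clones one collection, and a SHA-256 over the JSON form stands in for dbhash. Prepared transactions and the validate command are not modeled.

```python
import hashlib
import json


class InitialSyncFixtureSketch:
    """Toy two-node fixture: a source and a target that initial syncs from it."""

    def __init__(self):
        self.source = {}       # collection name -> {_id: value}
        self.target = {}
        self.oplog = []        # writes made since the current sync began
        self.to_clone = None   # None means no initial sync is active

    def write(self, coll, key, value):
        """A fuzzer write against the sync source."""
        self.source.setdefault(coll, {})[key] = value
        self.oplog.append((coll, key, value))

    def restart(self):
        """Restart the syncing node without its data and begin a new sync."""
        self.target = {}
        self.oplog = []
        self.to_clone = sorted(self.source)

    @staticmethod
    def dbhash(data):
        payload = json.dumps(data, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def step(self):
        """Advance the sync by one unit of work and report what happened."""
        if self.to_clone is None:
            self.restart()
            return "restarted"
        if self.to_clone:
            coll = self.to_clone.pop(0)
            self.target[coll] = dict(self.source[coll])
            return "cloning"
        # Cloning finished: apply buffered writes, then check consistency.
        for coll, key, value in self.oplog:
            self.target.setdefault(coll, {})[key] = value
        self.to_clone = None
        assert self.dbhash(self.source) == self.dbhash(self.target)
        return "done"


fixture = InitialSyncFixtureSketch()
fixture.write("a", "1", 1)     # initial data, populated before any sync starts
fixture.write("b", "1", 1)
statuses = [fixture.step()]    # no sync active yet, so this restarts
fixture.write("a", "2", 2)
statuses.append(fixture.step())
fixture.write("b", "1", 9)
fixture.write("c", "x", 0)     # collection created after the sync began
statuses.append(fixture.step())
statuses.append(fixture.step())
```

Calling `step` when no sync is active restarts the syncing node, which is why the fuzzer gets a chance to populate the source with `write` calls before the first `step`.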

STM will make the actual fuzzer that coordinates with this fixture.

Later on we can consider adding a variant that's a 3 node set and the initial sync happens off of a secondary.

max.hirschhorn, anything to add?

Comment by Gregory McKeon (Inactive) [ 06/Mar/18 ]

max.hirschhorn We likely won't get to this for a while - if the TIG team has interest, feel free to grab this!
