[SERVER-31370] Add failpoints to rs_rollback.cpp to allow testing of concurrent ops on sync source Created: 03/Oct/17  Updated: 27/Oct/23  Resolved: 29/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: William Schultz (Inactive)
Assignee: Backlog - Replication Team
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-31371 Add rollback Javascript test that per... Closed
Assigned Teams:
Replication
Participants:

 Description   

We should add failpoints to rs_rollback.cpp so that we can more deterministically test rollback scenarios in which operations are happening on the sync source node during rollback. Two failpoints should provide a good level of control:

  1. Provide the ability to hang at a specified point in document refetch progress. This failpoint would take as an argument a fraction between 0 and 1 that determines when in the document refetching process rollback hangs. For example, with a hang point of 0.5 and 100 documents to refetch, rollback would hang after the 50th refetch, allowing us to apply operations on the sync source while rollback is paused.
  2. Provide the ability to hang before re-syncing collection metadata. This would let us apply more operations to the sync source after rollback has already updated minValid once and has already refetched docs.

These two failpoints should provide a suitable level of control and help make tests more reproducible.
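
As a rough illustration of the first failpoint, the sketch below models the hang position in Python. It is not the server's failpoint API: `hang_index`, `HangPoint`, and `refetch_docs` are hypothetical names invented for this example, and rounding down when the fraction does not land on a whole document is an assumption the ticket does not specify.

```python
import math
import threading


def hang_index(total_docs: int, fraction: float) -> int:
    """Number of refetches to complete before pausing (assumption: round down)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return math.floor(total_docs * fraction)


class HangPoint:
    """Hypothetical stand-in for a failpoint: blocks the caller until released."""

    def __init__(self):
        self.reached = threading.Event()    # set once the refetch loop is paused
        self._released = threading.Event()  # set by the test to resume

    def pause(self):
        self.reached.set()
        self._released.wait()

    def release(self):
        self._released.set()


def refetch_docs(doc_ids, fetch, hang_fraction=None, hang_point=None):
    """Refetch each document, pausing once at the configured progress point."""
    pause_after = None
    if hang_fraction is not None and hang_point is not None:
        pause_after = hang_index(len(doc_ids), hang_fraction)
        if pause_after == 0:
            hang_point.pause()  # a fraction of 0 hangs before the first refetch

    fetched = []
    for count, doc_id in enumerate(doc_ids, start=1):
        fetched.append(fetch(doc_id))
        if count == pause_after:
            hang_point.pause()  # test code can now write to the sync source
    return fetched
```

With 100 documents and a hang point of 0.5, `hang_index(100, 0.5)` is 50, matching the example in item 1. A test would run `refetch_docs` on one thread, wait on `reached`, apply its operations to the sync source, and then call `release()`. The second failpoint needs no progress argument; in this model it would be a single `pause()` call placed before the metadata re-sync step.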



 Comments   
Comment by Spencer Brody (Inactive) [ 29/May/18 ]

This isn't a concern for rollback via WT checkpoint, only for rollback via refetch, which is going away.
