[SERVER-33042] Add test coverage for tailing oplog on secondary failing with CappedPositionLost Created: 31/Jan/18  Updated: 06/Dec/22  Resolved: 17/Jan/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-32606 Tailing oplog on secondary fails with... Closed
is related to SERVER-32883 Enhanced FSM testing for reading from... Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

SERVER-32606 was committed without test coverage. A js repro attached to that ticket reproduces the issue most (but not all) of the time. We should get that repro cleaned up and committed so it can run in Evergreen.



 Comments   
Comment by Bruce Lucas (Inactive) [ 31/Jan/18 ]

There is an original repro on that ticket that is reliable and quick, possibly because it does things in a different order from the js repro (as noted). It might be worth seeing whether the js repro can be made reliable by emulating the original repro.

Comment by Spencer Brody (Inactive) [ 31/Jan/18 ]

The reason why SERVER-32606 was committed without coverage is that the repro attached is non-deterministic. It triggers the issue most of the time, but not every time, and it needs to run for a while before triggering it. geert.bosch spent some time trying to use failpoints to make the test deterministic, but was not successful. I spoke with max.hirschhorn about this, and he's okay with the idea of pushing tests that only trigger the issue some of the time. He suggests creating a version of the repro that runs for a fixed number of operations (which has been demonstrated to repro the issue with some regularity) then terminates, and adding it to the noPassthrough suite. It would be important to ensure that the test doesn't fail for unrelated reasons (such as oplog rollover).
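
A rough sketch of the bounded-run idea described above, in plain JavaScript. This is not the repro from SERVER-32606 and does not use the server's test libraries: assertNoRollover, runBoundedRepro, approxBytesPerOp and headroom are hypothetical names, the step function is a stand-in for the real workload, and matching on an error's codeName is an assumption about how the failure would be detected.

```javascript
// Sketch only: every name here is hypothetical and not part of the MongoDB test libraries.

// Throws if the planned workload could wrap the oplog, since rollover would
// invalidate a tailing cursor for reasons unrelated to the bug under test.
function assertNoRollover({maxOps, approxBytesPerOp, oplogSizeBytes, headroom = 0.5}) {
    const planned = maxOps * approxBytesPerOp;
    const budget = oplogSizeBytes * headroom;
    if (planned > budget) {
        throw new Error(`planned ${planned} bytes exceeds oplog budget of ${budget} bytes`);
    }
}

// Runs `step` exactly maxOps times, then stops. Errors whose codeName matches
// the target are counted; any other error is rethrown so it is not masked.
function runBoundedRepro({maxOps, step, targetCodeName = "CappedPositionLost"}) {
    let targetHits = 0;
    for (let i = 0; i < maxOps; i++) {
        try {
            step(i);
        } catch (err) {
            if (err.codeName !== targetCodeName) {
                throw err;
            }
            targetHits++;
        }
    }
    return {opsRun: maxOps, targetHits};
}

// Usage with a stand-in step that simulates the failure on every 100th iteration.
assertNoRollover({maxOps: 1000, approxBytesPerOp: 200, oplogSizeBytes: 16 * 1024 * 1024});
const outcome = runBoundedRepro({
    maxOps: 1000,
    step: (i) => {
        if (i % 100 === 99) {
            const err = new Error("simulated");
            err.codeName = "CappedPositionLost";
            throw err;
        }
    },
});
console.log(outcome);
```

In a real test the step would issue the writes and tailing reads from the repro, and a nonzero targetHits would fail the test. Checking the rollover budget up front keeps a legitimate rollover from being mistaken for the bug.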
