Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.8.0, 4.4.3
Affects Version/s: None
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.7, v4.4
Sprint: Repl 2020-10-05
Linked BF Score: 22
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In server_status_metrics.js, we test that the number of batches received by the secondary equals the number of getMore commands processed by the primary. On both the primary and the secondary, we record a start count and an end count, and assert that the number of batches processed between start and end is the same on the two nodes. We record getMores on the primary in curop and on the secondary in OplogFetcher.
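The comparison described above can be sketched in plain JavaScript. This is only an illustration, not the code in server_status_metrics.js: the snapshot objects and the `delta` and `assertSameBatchCount` helpers are hypothetical, and the counter values are made up.

```javascript
// Hypothetical counter snapshots; in the real test these would be read from
// the primary and the secondary at the start and end of the measured window.
function delta(start, end) {
    return end - start;
}

// Compares the number of getMores the primary processed with the number of
// batches the secondary received over the same window.
function assertSameBatchCount(primary, secondary) {
    const getMores = delta(primary.start, primary.end);
    const batches = delta(secondary.start, secondary.end);
    if (getMores !== batches) {
        throw new Error(
            `primary processed ${getMores} getMores, secondary received ${batches} batches`);
    }
    return getMores;
}

// The absolute counts differ between the nodes; only the deltas are compared.
const processed = assertSameBatchCount({start: 10, end: 14}, {start: 7, end: 11});
console.log(processed);
```

Comparing deltas rather than absolute counts is what makes the start and end cut points matter: both nodes must be sampled at the same logical position.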

When recording the start and end counts, we can make a clean cut on the primary by using the planExecutorHangBeforeShouldWaitForInserts failpoint. However, it seems there could be a race on the secondary when exhaust cursors are in use. Since exhaust cursors don't wait for a batch to be received by the secondary, it is difficult to know when to make a clean cut on the secondary. Unfortunately, the JS test cannot simply wait for the number of getMores on the secondary to reach the number on the primary, because the actual counts on the primary and the secondary might differ significantly.
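The race can be illustrated with a small deterministic model. It is a toy sketch under assumptions, not server code: `sent` stands in for the primary's count, `inFlight` models batches an exhaust cursor has pushed that the secondary has not yet counted, and all names and values are hypothetical.

```javascript
// Toy model: with an exhaust cursor the primary keeps sending batches
// without waiting for the secondary to receive each one.
function snapshot(sent, inFlight) {
    // The secondary has only counted batches that are no longer in flight.
    return {primary: sent, secondary: sent - inFlight};
}

// True when both nodes saw the same number of batches between the two cuts.
function deltasMatch(start, end) {
    return (end.primary - start.primary) === (end.secondary - start.secondary);
}

// Start cut: everything sent so far has been received.
const startCut = snapshot(5, 0);
// End cut taken while one exhaust batch is still in flight.
const racyEnd = snapshot(9, 1);
// End cut taken after the secondary has caught up.
const cleanEnd = snapshot(9, 0);

console.log(deltasMatch(startCut, racyEnd));  // false
console.log(deltasMatch(startCut, cleanEnd)); // true
```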

Some potential solutions:

Delete the test
  - This test case checks very strictly that the metrics are correct. In practice, we probably don't need the metrics to be this precise.
  - However, these metrics could be broken in the future, and that might not be noticed without this test.
Modify the test to not use exhaust cursors
  - Not using exhaust cursors should fix this test, since the primary would not be able to reach the clean cut point until the secondary has received the batch preceding it.
  - However, this will lead to less coverage of exhaust cursors. This is probably fine, since these metrics aren't directly related to exhaust cursors.
Add a hard sleep in the JS test
  - This will not be robust on slow machines, so it is probably not the best solution.
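If the second option were chosen, one way to express it is through node start-up options. The sketch below assumes that the `oplogFetcherUsesExhaust` server parameter is what controls exhaust cursor use in the oplog fetcher; `nonExhaustNodeOptions` is a hypothetical helper, and the ticket does not say how the test was ultimately changed.

```javascript
// Hypothetical helper: builds start-up options for a test node with exhaust
// cursors turned off in the oplog fetcher. The parameter name is an
// assumption, not something stated in this ticket.
function nonExhaustNodeOptions(extraParams = {}) {
    return {
        setParameter: Object.assign({}, extraParams, {oplogFetcherUsesExhaust: false}),
    };
}

const nodeOptions = nonExhaustNodeOptions();
console.log(JSON.stringify(nodeOptions));
```

In a jstest, options like these might be passed when constructing the replica set; that wiring is left out because it depends on the test harness.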

Assignee: Xuerui Fa
Reporter: Xuerui Fa
Participants: Githook User, Xuerui Fa
Votes: 0
Watchers: 3

Created: Sep 18 2020 07:08:17 PM UTC
Updated: Oct 29 2023 10:02:57 PM UTC
Resolved: Sep 28 2020 07:11:40 PM UTC
Confidence Status Last Update: 25/Sep/20 5:19 PM
