In server_status_metrics.js, we test that the number of batches the secondary has received equals the number of getMore commands processed by the primary. For both the primary and the secondary, we record a start count and an end count, and assert that the number of batches processed between the start and the end is the same on both nodes. We record getMores on the primary in curop and on the secondary in OplogFetcher.
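As a rough illustration of the check described above, here is a minimal sketch of the start/end comparison. The helper `batchDelta` and the snapshot values are hypothetical and are not taken from the actual test; the sketch only shows that the assertion compares deltas, not absolute counts.

```javascript
// Hypothetical helper (not from the real test): number of batches processed
// between a start snapshot and an end snapshot of a monotonically
// increasing counter.
function batchDelta(startCount, endCount) {
    if (endCount < startCount) {
        throw new Error("counter went backwards: " + startCount + " -> " + endCount);
    }
    return endCount - startCount;
}

// Made-up snapshot values for illustration. The absolute counts differ
// between the nodes; only the deltas are expected to match.
const primaryCounts = {start: 120, end: 125};
const secondaryCounts = {start: 40, end: 45};

const primaryDelta = batchDelta(primaryCounts.start, primaryCounts.end);
const secondaryDelta = batchDelta(secondaryCounts.start, secondaryCounts.end);

// The test's assertion amounts to this equality holding.
console.log(primaryDelta === secondaryDelta);
```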
When recording the start/end numbers, we are able to make a clean cut on the primary by using the planExecutorHangBeforeShouldWaitForInserts failpoint. However, it seems like there could be a race on the secondary if we are using exhaust cursors. Since exhaust cursors don't wait for a batch to be received by the secondary, it is difficult to know when to make a clean cut on the secondary. Unfortunately, it is not possible to wait in the JS test for the number of getMores on the secondary to reach the same number as on the primary, because the absolute counts might differ significantly between the primary and the secondary.
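The suspected race can be pictured with a toy model. This is not MongoDB code: `ToyExhaustStream` and its methods are hypothetical, and the model simply assumes that a batch counts on the primary as soon as it is sent and on the secondary only once it is received.

```javascript
// Toy model, not MongoDB code: batches sent by the primary stay "in flight"
// until the secondary receives them.
class ToyExhaustStream {
    constructor() {
        this.primarySent = 0;        // batches the primary has produced
        this.secondaryReceived = 0;  // batches the secondary has processed
    }

    // The primary pushes a batch without waiting for the secondary.
    send() {
        this.primarySent++;
    }

    // The secondary receives one in-flight batch, if there is one.
    receive() {
        if (this.secondaryReceived < this.primarySent) {
            this.secondaryReceived++;
        }
    }

    snapshot() {
        return {primary: this.primarySent, secondary: this.secondaryReceived};
    }
}

const stream = new ToyExhaustStream();
const startSnapshot = stream.snapshot();

// The primary sends three batches; the secondary has received only one
// of them when the end snapshot is taken.
stream.send();
stream.send();
stream.send();
stream.receive();
const endSnapshot = stream.snapshot();

const toyPrimaryDelta = endSnapshot.primary - startSnapshot.primary;
const toySecondaryDelta = endSnapshot.secondary - startSnapshot.secondary;

console.log(toyPrimaryDelta, toySecondaryDelta); // prints "3 1"
```

In this model the two deltas disagree only because the end snapshot was taken while batches were still in flight, which is the situation a clean cut on the secondary would need to rule out.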
Some potential solutions:
- Delete the test
  - This test case checks very strictly that the metrics are correct. In practice, we probably don't need the metrics to be this precise.
  - However, these metrics could break in the future, and the breakage might go unnoticed without this test.
- Modify the test to not use exhaust cursors
  - Not using exhaust cursors should fix this test, since the primary would not be able to reach the clean cut point until the secondary has received the batch preceding it.
  - However, this will reduce coverage of exhaust cursors. This is probably fine, since these metrics aren't directly related to exhaust cursors.
- Add a hard sleep in the JS test
  - This will not be robust on slow machines, so it is probably not the best solution.