[SERVER-51057] Test that getMore metrics are incremented in server_status_metrics.js
Created: 18/Sep/20
Updated: 29/Oct/23
Resolved: 28/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.8.0, 4.4.3

Type: Bug
Priority: Major - P3
Reporter: Xuerui Fa
Assignee: Xuerui Fa
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.7, v4.4
Sprint: Repl 2020-10-05
Participants:
Linked BF Score: 22

 Description   

In server_status_metrics.js, we test that the number of batches the secondary has received equals the number of getMore commands processed by the primary. On both the primary and the secondary, we record the count at the start and at the end, and assert that the number of batches processed between those two points is the same on both nodes. We record getMores on the primary in curop and on the secondary in OplogFetcher.

When recording the start and end counts, we can make a clean cut on the primary by using the planExecutorHangBeforeShouldWaitForInserts failpoint. However, there could be a race on the secondary when exhaust cursors are in use. Since an exhaust cursor does not wait for the secondary to receive a batch, it is difficult to know when to make a clean cut on the secondary. Unfortunately, the JS test cannot wait for the number of getMores on the secondary to reach the number on the primary, because the actual numbers on the two nodes might differ significantly.
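The race described above can be illustrated with a small, self-contained simulation. This is only a sketch under assumptions: the event timeline, the `countsAt` and `deltas` helpers, and the cut positions are all hypothetical and do not come from the actual test or server code.

```javascript
// Hypothetical timeline: "sent" is when the primary counts a batch,
// "received" is when the secondary counts it. With an exhaust cursor the
// primary can send further batches before earlier ones are received.
const events = ["sent", "received", "sent", "sent", "received", "received"];

// Counter values on each node after the first `cut` events have happened.
function countsAt(cut) {
    const seen = events.slice(0, cut);
    return {
        primary: seen.filter((e) => e === "sent").length,
        secondary: seen.filter((e) => e === "received").length,
    };
}

// Number of batches each node counted between two cut points.
function deltas(startCut, endCut) {
    const start = countsAt(startCut);
    const end = countsAt(endCut);
    return {
        primary: end.primary - start.primary,
        secondary: end.secondary - start.secondary,
    };
}

// End cut taken while batches are still in flight: the deltas disagree.
const racy = deltas(0, 4);
// End cut taken after every sent batch has been received: the deltas agree.
const quiesced = deltas(0, events.length);

console.log(racy, quiesced);
```

In this sketch a strict equality assertion passes only for the second cut, which is the point a test cannot reliably identify on the secondary.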

Some potential solutions:

  1. Delete the test
    • This test case checks very strictly that the metrics are correct. In practice, we probably don't need the metrics to be this precise
    • However, it is possible for these metrics to be broken in the future, and it might not be noticed without this test
  2. Modify the test to not use exhaust cursors
    • Not using exhaust cursors should fix this test, since the primary would not be able to reach a clean cut until the secondary has received the batch before the clean cut point.
    • However, this will lead to less coverage of exhaust cursors. This is probably fine, since these metrics aren't directly related to exhaust cursors.
  3. Add a hard sleep to the JS test
    • This would not be robust on slow machines, so it is probably not the best solution


 Comments   
Comment by Githook User [ 30/Nov/20 ]

Author:

{'name': 'XueruiFa', 'email': 'xuerui.fa@mongodb.com', 'username': 'XueruiFa'}

Message: SERVER-51057: Test that getMore metrics are incremented in server_status_metrics.js

(cherry picked from commit 2cb22c8caaf2be4025d93e5fb75afc8e4be3287e)
Branch: v4.4
https://github.com/mongodb/mongo/commit/b40d4b7eaf6239f83b0d6fbcf27de520bfd89df4

Comment by Githook User [ 28/Sep/20 ]

Author:

{'name': 'XueruiFa', 'email': 'xuerui.fa@mongodb.com', 'username': 'XueruiFa'}

Message: SERVER-51057: Test that getMore metrics are incremented in server_status_metrics.js
Branch: master
https://github.com/mongodb/mongo/commit/2cb22c8caaf2be4025d93e5fb75afc8e4be3287e

Comment by Xuerui Fa [ 24/Sep/20 ]

In triage, we decided to assert that the metrics are increasing on the primary and secondary, instead of asserting that the numbers of batches processed are strictly equal.

CC lingzhi.deng
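
The relaxed check described in this comment could look roughly like the sketch below. It is not the committed test code: the `assertIncreased` helper and the snapshot values are hypothetical, and a real test would read the counters from each node rather than hard-coding them.

```javascript
// Hypothetical helper: throws unless the counter grew between two snapshots.
function assertIncreased(before, after, label) {
    if (!(after > before)) {
        throw new Error(label + " did not increase: before=" + before + ", after=" + after);
    }
}

// Hypothetical start/end snapshots of the getMore counters on each node.
const primary = {start: 10, end: 14};
const secondary = {start: 7, end: 12};

// Relaxed assertions: each node only has to show progress.
assertIncreased(primary.start, primary.end, "primary getMores");
assertIncreased(secondary.start, secondary.end, "secondary batches");

// The earlier strict comparison, kept here only to show that it would not
// hold for these snapshots (the deltas are 4 and 5).
const deltasEqual =
    (primary.end - primary.start) === (secondary.end - secondary.start);
console.log("strict equality would hold:", deltasEqual);
```

The trade-off is the one noted in the description: the relaxed check still detects a counter that stops incrementing, but it no longer verifies that the two nodes agree on the exact number of batches.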

Generated at Thu Feb 08 05:24:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.