[SERVER-36988] awaitdata_getmore_cmd.js times out when run concurrently with the LogicalSessionCache refresh suite Created: 04/Sep/18  Updated: 29/Oct/23  Resolved: 26/Sep/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.1.2
Fix Version/s: 3.6.9, 4.0.4, 4.1.4

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Sharding 2018-10-08
Participants:

 Description   

awaitdata_getmore_cmd.js consistently stalls when run concurrently with a LogicalSessionCache refresh interval of 100ms.

Further investigation to follow. This placeholder ticket tracks the fix, which is expected to be a blacklist of this test from the LogicalSessionCache refresh suites.

Investigation

Several things are happening here:

  1. awaitdata_getmore_cmd.js tails the oplog and issues getMores against it, looping until a getMore returns an empty batch.
  2. CheckReplDBHashInBackground continually runs and creates sessions to check db hashes.
  3. The logical session cache refresh will flush these sessions to disk, creating new oplog entries.
  4. In the more aggressive logical session cache suites (faster refresh; 100ms is the fastest), the getMore batch is never empty.
  5. The while loop will run indefinitely until the rare condition where the cursor is able to pull an empty batch before CheckReplDBHashInBackground can add a new session.
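
The cycle above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the test or server code: `drainOplog`, `writesPerGetMore`, and `maxGetMores` are hypothetical names, an in-memory array stands in for the oplog, and the model assumes background session flushes append a fixed number of entries before each getMore returns.

```javascript
// Toy model of the race described above; not the real test or server code.
// `oplog` is an in-memory stand-in for the oplog. `writesPerGetMore` is the
// assumed number of entries that background session flushes append between
// two consecutive getMores. `maxGetMores` caps the loop so the model halts.
function drainOplog(initialEntries, writesPerGetMore, batchSize, maxGetMores) {
    const oplog = [];
    for (let i = 0; i < initialEntries; i++) {
        oplog.push(i);
    }

    let position = 0;  // index of the next entry the tailing cursor will read
    let getMores = 0;
    let batch;
    do {
        // Background activity lands before the getMore returns.
        for (let i = 0; i < writesPerGetMore; i++) {
            oplog.push(oplog.length);
        }
        // The "getMore": read up to batchSize entries past the cursor position.
        batch = oplog.slice(position, position + batchSize);
        position += batch.length;
        getMores++;
    } while (batch.length > 0 && getMores < maxGetMores);

    return {getMores: getMores, drained: batch.length === 0};
}

// Quiet oplog: the cursor catches up and eventually sees an empty batch.
console.log(drainOplog(10, 0, 2, 1000));
// One new entry per getMore: the batch is never empty, so only the cap stops the loop.
console.log(drainOplog(10, 1, 2, 1000));
```

In this model, a single new entry per getMore is enough to keep every batch non-empty, even when the batch size exceeds the write rate. The cursor catches up to the tail but never reads past it into an empty batch.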

Proposed Fix

The test expects an exact number. Relaxing the constraints would risk correctness on other suites. Unless it makes sense to run the while loop only when the logical session cache refresh isn't running, we should blacklist this test from the logical session cache suites.
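
For comparison, the conditional alternative mentioned above might look roughly like the following sketch. Every name in it is hypothetical: `getNextBatch` stands in for issuing a getMore and returning its batch, and `sessionCacheRefreshActive` is an assumed flag, not an existing test or suite option.

```javascript
// Sketch of the conditional alternative; all names are hypothetical.
// `getNextBatch` stands in for issuing a getMore and returning its batch.
// `sessionCacheRefreshActive` is an assumed flag, not a real test option.
function drainIfQuiescent(getNextBatch, sessionCacheRefreshActive) {
    if (sessionCacheRefreshActive) {
        // Background session flushes keep appending oplog entries, so the
        // drain loop might never see an empty batch; skip it entirely.
        return {skipped: true, getMores: 0};
    }

    let getMores = 0;
    let batch;
    do {
        batch = getNextBatch();
        getMores++;
    } while (batch.length > 0);

    return {skipped: false, getMores: getMores};
}

// Usage with canned batches in place of a real cursor.
const cannedBatches = [[1, 2], [3], []];
console.log(drainIfQuiescent(() => cannedBatches.shift(), false));
console.log(drainIfQuiescent(() => [42], true));
```

A guard like this would require the test to know which suite it is running in. The commit recorded on this ticket blacklists the test from the logical session cache suites instead.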



 Comments   
Comment by Githook User [ 26/Sep/18 ]

Author: Blake Oler (email: blake.oler@mongodb.com, username: BlakeIsBlake)

Message: SERVER-36988 Blacklist awaitdata_getmore_cmd.js from logical session cache suites
Branch: master
https://github.com/mongodb/mongo/commit/e983b5097788f02d3ef0b187d48d086d853c8769

Comment by Misha Tyulenev [ 25/Sep/18 ]

blake.oler Agreed. Interesting cycle!

Comment by Blake Oler [ 25/Sep/18 ]

misha.tyulenev ack?
