[SERVER-69676] Improve serverless_reject_multiple_ops and reenable test Created: 14/Sep/22  Updated: 29/Sep/22  Resolved: 29/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Matt Broadstone Assignee: Didier Nadeau
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Server Serverless 2022-10-03
Participants:

 Description   

jstests/serverless/serverless_reject_multiple_ops.js has an average runtime of around 7 minutes, and causes timeouts when run within the serverless suite. We should investigate how to reduce the runtime either by splitting the test cases into multiple files, or trying to reuse test fixtures rather than creating new ones for each case.



 Comments   
Comment by Didier Nadeau [ 29/Sep/22 ]

As SERVER-65315 was reverted and the PR is being reworked before merging it again, improvements to the serverless tests are going to be part of SERVER-65315.

Comment by Didier Nadeau [ 20/Sep/22 ]

I'm writing a test for the following case:

  • Operation starts and fails due to serverless lock
  • Lock is released
  • Operation 2 is started, expected to fail as operation 1 isn't garbage collectable yet
  • Calling forget on operation 1. Expect it to succeed.
  • Operation 3 is started, expected to succeed as operation 1 is aborted and garbage collectable.
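
The expected sequence above can be sketched as a small in-memory model. This is only an illustration under assumed names: `OperationLockModel`, `externalLockHeld`, and `runScenario` are hypothetical and are not part of the server or its jstests. It shows the outcomes the test expects, not the actual behavior of TenantMigrationDonorService or shard split.

```javascript
// Hypothetical in-memory model of the expected lock semantics.
// Not the server's implementation; all names here are assumptions.
class OperationLockModel {
  constructor() {
    // Stands in for the "serverless lock" that makes operation 1 fail.
    this.externalLockHeld = false;
    // Maps operation id -> { state, garbageCollectable }.
    this.ops = new Map();
  }

  // Start an operation. It is rejected while any earlier operation
  // has a record that is not yet garbage collectable.
  start(id) {
    for (const [otherId, op] of this.ops) {
      if (!op.garbageCollectable) {
        return { ok: false, reason: `conflicts with ${otherId}` };
      }
    }
    if (this.externalLockHeld) {
      // The operation fails, but its record remains until forget() is called.
      this.ops.set(id, { state: "aborted", garbageCollectable: false });
      return { ok: false, reason: "serverless lock held" };
    }
    this.ops.set(id, { state: "started", garbageCollectable: false });
    return { ok: true };
  }

  // Simplification: forgetting an operation just marks it garbage collectable.
  forget(id) {
    const op = this.ops.get(id);
    if (!op) {
      return { ok: false, reason: "unknown operation" };
    }
    op.garbageCollectable = true;
    return { ok: true };
  }
}

// Walk through the five steps of the test case in order.
function runScenario() {
  const lock = new OperationLockModel();
  lock.externalLockHeld = true;
  const op1 = lock.start("op1");      // fails: serverless lock is held
  lock.externalLockHeld = false;      // lock is released
  const op2 = lock.start("op2");      // fails: op1 not garbage collectable yet
  const forget1 = lock.forget("op1"); // expected to succeed
  const op3 = lock.start("op3");      // succeeds: op1 is now garbage collectable
  return { op1, op2, forget1, op3 };
}

console.log(runScenario());
```

In this model, operation 2 is rejected only because operation 1 left a record behind; once `forget` marks that record garbage collectable, operation 3 is free to start.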

This does not work for TenantMigrationDonorService (it's the error highlighted above). It does not work for split either, for the following reasons:

  • forgetShardSplit succeeds and marks the future
  • checkIfConflictsWithOtherInstances uses `isGarbageCollectable` and `getStateDocState()`
    • As the shard split document is never created, these methods return uninitialized values (they query _stateDoc).

Comment by Matt Broadstone [ 19/Sep/22 ]

Thanks for the update. The first issue makes sense to me; I think we probably just need to avoid reusing these replica sets for these tests for the time being. The second issue sounds like the ServerlessOperationLock is not working correctly for TMDS (and maybe also split, given the BFs we were seeing last week). Let's discuss more on the open PR, and see if we can add tests around this behavior.
