[SERVER-76358] Retry drop database cleanup attempts in tenant migration test hook Created: 20/Apr/23  Updated: 29/Oct/23  Resolved: 26/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc1

Type: Bug Priority: Major - P3
Reporter: Christopher Caplinger Assignee: Christopher Caplinger
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0
Participants:
Linked BF Score: 35

 Description   

Before the hook starts a new tenant migration, it drops the recipient databases. In this particular case, a failover happened during the drop, and the test failed with InterruptedDueToReplStateChange. We should retry these operations to avoid such failures in the future.
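
A minimal sketch of the kind of retry described above, not the code from the actual fix. It assumes the failure surfaces as an exception carrying a numeric `code` attribute equal to 11602 (the server error code for InterruptedDueToReplStateChange). How the error is actually surfaced depends on the driver and its version. The helper name, `max_attempts`, and `delay_secs` are hypothetical.

```python
import time

# Server error code for InterruptedDueToReplStateChange.
INTERRUPTED_DUE_TO_REPL_STATE_CHANGE = 11602


def retry_on_repl_state_change(operation, max_attempts=5, delay_secs=0.1):
    """Call operation() and retry only on InterruptedDueToReplStateChange.

    Hypothetical helper: assumes the raised exception exposes the server
    error code as a `code` attribute. Any other error, or the error from
    the final attempt, is re-raised to the caller unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as err:
            code = getattr(err, "code", None)
            if code != INTERRUPTED_DUE_TO_REPL_STATE_CHANGE:
                raise  # not a failover interruption, do not mask it
            if attempt == max_attempts:
                raise  # out of attempts
            time.sleep(delay_secs)  # give the replica set time to elect a primary
```

In the hook, the drop would then be wrapped as something like `retry_on_repl_state_change(lambda: client.drop_database(db_name))`, where `client` and `db_name` stand in for whatever the hook already holds. Dropping a database that is already gone succeeds, so repeating the command after an interrupted attempt is safe.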



 Comments   
Comment by Githook User [ 27/Apr/23 ]

Author:

{'name': 'Christopher Caplinger', 'email': 'christopher.caplinger@mongodb.com', 'username': 'UnicodeSnowman'}

Message: SERVER-76358: Retry drop_database cmds in Tenant Migration hook
Branch: v7.0
https://github.com/mongodb/mongo/commit/67cd7b2230cdfec0ff94bf91af71f85ef4483395

Comment by Githook User [ 26/Apr/23 ]

Author:

{'name': 'Christopher Caplinger', 'email': 'christopher.caplinger@mongodb.com', 'username': 'UnicodeSnowman'}

Message: SERVER-76358: Retry drop_database cmds in Tenant Migration hook
Branch: master
https://github.com/mongodb/mongo/commit/518f3df1276fa9c396b1384554e69dd96e633b6c

Comment by Matt Broadstone [ 21/Apr/23 ]

Sorry for leading you astray. I think that means you can still use with_naive_retry, though; you'll just need to check whether any error dropDatabase produces will escape the loop prematurely. (Remember that one of the errors with_naive_retry retries on is InterruptedDueToReplStateChange.)

Comment by Christopher Caplinger [ 21/Apr/23 ]

matt.broadstone@mongodb.com hmm, actually it looks like we'll need to go in a different direction for this (at least for dropDatabase). I had to try this out myself to confirm, but it doesn't appear that dropDatabase is actually a retryable write: "txnNumber may only be provided for multi-document transactions and retryable write commands. autocommit:false was not provided, and dropDatabase is not a retryable write command."

We might have to just explicitly retry here for InterruptedDueToReplStateChange errors.

Comment by Matt Broadstone [ 21/Apr/23 ]

christopher.caplinger@mongodb.com I think it might make sense, as part of this ticket, to consider using a pymongo.timeout scope in all the shard split/tenant migration Python fixture commands that I didn't wrap with with_naive_retry in SERVER-73129.
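
For illustration, a pymongo.timeout scope around a single fixture command might look like the sketch below. `pymongo.timeout` is a context manager available in PyMongo 4.2 and later. The function name, its parameters, and the 30-second default are assumptions made for the example and are not taken from the fixture code.

```python
def drop_database_with_deadline(client, db_name, seconds=30.0):
    """Drop db_name, bounding the whole operation with a pymongo.timeout scope.

    Hypothetical helper: `client` is expected to be a pymongo.MongoClient.
    """
    # Imported lazily so the sketch can be read without PyMongo installed.
    import pymongo

    # Every PyMongo operation inside this block shares one overall deadline.
    with pymongo.timeout(seconds):
        client.drop_database(db_name)
```

A scope like this bounds how long a command can block. It does not retry anything by itself, so it complements an explicit retry and does not replace one.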

Generated at Thu Feb 08 06:32:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.