[SERVER-76358] Retry drop database cleanup attempts in tenant migration test hook Created: 20/Apr/23 Updated: 29/Oct/23 Resolved: 26/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 7.0.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Christopher Caplinger | Assignee: | Christopher Caplinger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v7.0 |
| Participants: | |||||||||
| Linked BF Score: | 35 | ||||||||
| Description |
|
Before the hook starts a new tenant migration, we drop the recipient databases. In this particular case, a failover happened while doing so, and the test failed with InterruptedDueToReplStateChange. We should retry these operations to avoid such failures in the future. |
| Comments |
| Comment by Githook User [ 27/Apr/23 ] |
|
Author: {'name': 'Christopher Caplinger', 'email': 'christopher.caplinger@mongodb.com', 'username': 'UnicodeSnowman'} Message: |
| Comment by Githook User [ 26/Apr/23 ] |
|
Author: {'name': 'Christopher Caplinger', 'email': 'christopher.caplinger@mongodb.com', 'username': 'UnicodeSnowman'} Message: |
| Comment by Matt Broadstone [ 21/Apr/23 ] |
|
Sorry for leading you astray. I think that means you can still use with_naive_retry, though; you'll just need to check whether any error dropDatabase produces would escape the loop prematurely. (Remember that one of the errors with_naive_retry retries on is InterruptedDueToReplStateChange.) |
| Comment by Christopher Caplinger [ 21/Apr/23 ] |
|
matt.broadstone@mongodb.com Hmm, actually it looks like we'll need to go in a different direction for this (at least for dropDatabase). I had to try this out myself to confirm, but dropDatabase doesn't appear to be a retryable write: "txnNumber may only be provided for multi-document transactions and retryable write commands. autocommit:false was not provided, and dropDatabase is not a retryable write command." We might just have to retry explicitly on InterruptedDueToReplStateChange errors here. |
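A minimal sketch of such an explicit retry, in Python like the test fixtures. It assumes the driver error exposes the server error code either as a `code` attribute (as `pymongo.errors.OperationFailure` does) or as a `"code"` entry in a `details` dict. Which exception class a given PyMongo version raises for InterruptedDueToReplStateChange (code 11602) may vary, so the check reads the code rather than the exception type. It also assumes a repeated drop is harmless, i.e. dropping a database that an interrupted earlier attempt already removed still succeeds. `drop_databases_with_retry`, `error_code`, and `FakeOperationFailure` are hypothetical names, not the hook's actual code.

```python
import time
from typing import Callable, Iterable, Optional

# Server error code for InterruptedDueToReplStateChange.
INTERRUPTED_DUE_TO_REPL_STATE_CHANGE = 11602


def error_code(exc: BaseException) -> Optional[int]:
    """Hypothetical helper: best-effort server error code of a driver error.

    Assumption: the code is either a `code` attribute (as on
    pymongo.errors.OperationFailure) or a "code" key in a `details` dict.
    """
    code = getattr(exc, "code", None)
    if code is None:
        details = getattr(exc, "details", None)
        if isinstance(details, dict):
            code = details.get("code")
    return code


def drop_databases_with_retry(
    drop: Callable[[str], object],
    db_names: Iterable[str],
    max_attempts: int = 5,
    delay_secs: float = 0.5,
) -> None:
    """Hypothetical helper: drop each database, retrying only on code 11602.

    `drop` is any callable taking a database name, e.g. the bound
    `drop_database` method of a pymongo MongoClient.
    """
    for name in db_names:
        for attempt in range(1, max_attempts + 1):
            try:
                drop(name)
                break  # this database is gone; move on to the next one
            except Exception as exc:
                retryable = error_code(exc) == INTERRUPTED_DUE_TO_REPL_STATE_CHANGE
                if not retryable or attempt == max_attempts:
                    raise  # other errors, or the final attempt, escape the loop
                time.sleep(delay_secs)  # give the replica set time to elect a primary


class FakeOperationFailure(Exception):
    """Test double standing in for a driver error that carries a `code`."""

    def __init__(self, code: int) -> None:
        super().__init__(f"command failed with code {code}")
        self.code = code


if __name__ == "__main__":
    interrupted = set()

    def flaky_drop(name: str) -> None:
        # Fail the first drop of each database as if a failover interrupted it.
        if name not in interrupted:
            interrupted.add(name)
            raise FakeOperationFailure(INTERRUPTED_DUE_TO_REPL_STATE_CHANGE)
        print(f"dropped {name}")

    drop_databases_with_retry(flaky_drop, ["tenant1_db", "tenant2_db"], delay_secs=0)
```

Retrying on a single named code, rather than on any exception, keeps genuine failures visible: anything other than InterruptedDueToReplStateChange is re-raised on the first attempt, which is the "escape the loop prematurely" behavior worth checking for.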
| Comment by Matt Broadstone [ 21/Apr/23 ] |
|
christopher.caplinger@mongodb.com I think it might make sense, as part of this ticket, to consider using a pymongo.timeout scope in all the shard split/tenant migration Python fixture commands that I didn't wrap with with_naive_retry in
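PyMongo's `pymongo.timeout(seconds)` context manager opens a scope in which operations share one client-side deadline. Fixture code that retries by hand can follow the same shape. The sketch below is a hypothetical pure-Python helper, `retry_until_deadline`, and not pymongo or resmoke code. It bounds retries by a single deadline instead of a fixed attempt count. The clock and sleep functions are injectable, so the loop can be exercised without a server; `FakeClock` is a test double, also hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_deadline(
    op: Callable[[], T],
    is_retryable: Callable[[BaseException], bool],
    timeout_secs: float,
    delay_secs: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `op` until it succeeds, fails hard, or time runs out.

    All attempts share one deadline, similar in spirit to a pymongo.timeout
    scope; `clock` and `sleep` are injectable so the loop can be tested.
    """
    deadline = clock() + timeout_secs
    while True:
        try:
            return op()
        except Exception as exc:
            # Give up if the error is not retryable, or if waiting once more
            # would reach or pass the shared deadline.
            if not is_retryable(exc) or clock() + delay_secs >= deadline:
                raise
            sleep(delay_secs)


class FakeClock:
    """Deterministic clock for demos: sleeping just advances `now`."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, secs: float) -> None:
        self.now += secs
```

In a real fixture, `op` might wrap a `MongoClient.drop_database` call, and `is_retryable` might check for InterruptedDueToReplStateChange. A deadline rather than an attempt count keeps the total wait bounded no matter how long each failed attempt takes.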