[SERVER-36751] Prevent concurrent dropDatabase commands in the concurrency_simultaneous_replication suite Created: 17/Aug/18 Updated: 29/Oct/23 Resolved: 26/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.7 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Max Hirschhorn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tig-bfday-eligible, tig-concurrency | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Requested: | v4.0, v3.6 |
||||||||
| Sprint: | STM 2018-12-31 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Story Points: | 1 | ||||||||
| Description |
|
Problem
Our tests will retry an operation for up to 10 minutes if DatabaseDropPending errors are encountered. After seeing the error, a getLastError command is used to wait for the dropDatabase command to be committed. The order in which getLastError returns to different workload clients varies, which may cause certain workload clients to always be stuck behind other clients that are running more dropDatabase commands. When this happens, the client will receive another DatabaseDropPending error. But the client is unable to distinguish whether the error is caused by the same dropDatabase command or a new one, so the new wait continues to eat into the 10-minute timeout. There is a small probability that this cycle will happen a handful of times in a row, which, when combined with slow multi-minute dropDatabase commands, will exceed the 10-minute timeout.

Solution
When the database is finally dropped, it's guaranteed that none of the clients waiting on it will be issuing another dropDatabase, so they should all be able to proceed. There might be edge cases where one client is able to execute multiple commands and one of those commands is another dropDatabase, but the likelihood of this happening 5 times in a row should be much smaller, if not negligible. From a correctness perspective, this change will implicitly turn some dropDatabase commands into no-ops, which should not cause loss of test coverage, as databases can't be dropped in parallel in the first place. The tests that run parallel dropDatabase commands are also all randomized tests and don't expect these operations to all succeed when there are parallel clients operating on the same database.
The |
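The retry behavior described above can be sketched in plain JavaScript. This is only an illustration of the idea, not the actual patch: `runWithDropPendingRetry`, `runCommand`, `waitForDrop`, and `makeFakeServer` are hypothetical names, and the fake server stands in for a real connection. A command that hits DatabaseDropPending waits and retries until a deadline, except dropDatabase, which is treated as a no-op because another client's drop is already in progress.

```javascript
"use strict";

const kDropPending = "DatabaseDropPending";

// Hypothetical helper (not the real override code).
//   runCommand(cmdObj) -> response object such as {ok: 1} or {ok: 0, codeName: "..."}
//   waitForDrop()      -> blocks until the pending drop is committed
//   now()              -> current time in milliseconds (injectable for testing)
function runWithDropPendingRetry(runCommand, waitForDrop, cmdObj, timeoutMs, now = Date.now) {
    const deadline = now() + timeoutMs;
    const cmdName = Object.keys(cmdObj)[0];

    while (true) {
        const res = runCommand(cmdObj);

        // Success, or an unrelated error: hand the response back unchanged.
        if (res.ok === 1 || res.codeName !== kDropPending) {
            return res;
        }

        // Assumed behavior: a dropDatabase that finds a drop already pending
        // is skipped instead of queued behind the drop that is in progress.
        if (cmdName === "dropDatabase") {
            return {ok: 1, skipped: true};
        }

        if (now() >= deadline) {
            throw new Error("timed out after " + timeoutMs + " ms waiting for a pending drop");
        }

        waitForDrop();
    }
}

// Fake server for illustration: the first `pendingResponses` commands fail
// with DatabaseDropPending, and every later command succeeds.
function makeFakeServer(pendingResponses) {
    let remaining = pendingResponses;
    const calls = [];
    return {
        calls,
        runCommand(cmdObj) {
            calls.push(Object.keys(cmdObj)[0]);
            if (remaining > 0) {
                remaining--;
                return {ok: 0, codeName: kDropPending};
            }
            return {ok: 1};
        },
    };
}
```

With this shape, an insert that sees two pending-drop responses waits twice and succeeds on the third attempt, while a dropDatabase that sees one returns immediately without waiting, so it cannot extend the queue that other clients are stuck behind.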
| Comments |
| Comment by Githook User [ 26/Dec/18 ] |
|
Author: {'username': 'visemet', 'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn'}Message: |
| Comment by Max Hirschhorn [ 24/Dec/18 ] |
I don't think such a test case is going to be practical to run in Evergreen on a continuous basis. |
| Comment by Max Hirschhorn [ 30/Aug/18 ] |
|
We can put the new behavior behind a TestData option that's only enabled for the concurrency framework if there are other users of the implicitly_retry_on_database_drop_pending.js override file outside of the concurrency framework. |
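A minimal sketch of how such a gate might look, assuming the option is a boolean property on the test data object. The option name `skipDropDatabaseOnDatabaseDropPending` and the helper `shouldSkipDropDatabase` are assumptions made for illustration; this ticket does not confirm either.

```javascript
"use strict";

// Hypothetical gate: skip a dropDatabase that hits DatabaseDropPending only
// when the test data object explicitly opts in. The option name is an assumption.
function shouldSkipDropDatabase(testData, commandName) {
    return commandName === "dropDatabase" &&
        Boolean(testData && testData.skipDropDatabaseOnDatabaseDropPending);
}
```

Other users of the override file would leave the option unset and keep the existing retry behavior, since the helper returns false when the object or the property is missing.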