[SERVER-60595] Resmoke hooks such as ContinuousTenantMigration may not pause even after being paused Created: 11/Oct/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vishnu Kaushik | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: |
| Description |
|
The implementation of ContinuousTenantMigration suggests that when the hook is paused after a test, we expect it to be in a state in which no migrations are in progress. This expectation can be violated. Suppose this sequence of events takes place:
One such sequence involves stop() being called after all tests have completed, which prevents the tenant migration thread from ever terminating. |
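The invariant described above is that pause() should only return once no migration is in flight, and that stop() should be able to end the thread even while it is paused. A minimal sketch of that contract, assuming a generic Python background loop rather than the actual resmoke hook (the class `PausableLoop`, its `do_work` callback and the `_busy` flag are hypothetical names), looks like this:

```python
import threading
from typing import Callable


class PausableLoop:
    """Hypothetical background loop (not the real resmoke hook) whose pause()
    returns only once no unit of work, e.g. a tenant migration, is in flight."""

    def __init__(self, do_work: Callable[[], None]) -> None:
        self._do_work = do_work            # hypothetical: runs one migration
        self._cond = threading.Condition()
        self._paused = False
        self._stopping = False
        self._busy = False                 # True while do_work() is running
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def is_alive(self) -> bool:
        return self._thread.is_alive()

    def _run(self) -> None:
        while True:
            with self._cond:
                # Sleep while paused, but wake up if stop() is requested.
                while self._paused and not self._stopping:
                    self._cond.wait()
                if self._stopping:
                    return
                # Mark the loop busy under the same lock pause() takes, so
                # pause() cannot observe "idle" and then have work start anyway.
                self._busy = True
            try:
                self._do_work()
            finally:
                with self._cond:
                    self._busy = False
                    self._cond.notify_all()

    def pause(self) -> None:
        with self._cond:
            self._paused = True
            # Block until any in-flight unit of work has finished.
            while self._busy:
                self._cond.wait()

    def resume(self) -> None:
        with self._cond:
            self._paused = False
            self._cond.notify_all()

    def stop(self) -> None:
        with self._cond:
            self._stopping = True
            self._cond.notify_all()        # wakes the loop even if it is paused
        self._thread.join()
```

The design choice that matters is that the worker checks the pause flag and marks itself busy while holding the same lock that pause() uses, which closes the window between "check" and "act"; stop() notifies the condition so a paused loop still exits.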
| Comments |
| Comment by Vishnu Kaushik [ 12/Oct/21 ] |
|
I attached the reproducer (the current tenant_migrations.py with some sleeps/waits added where necessary) as well as a log file from a run that failed in the most severe way, in which the tenant migrations thread is never able to complete. It may take 5-6 runs to reproduce the issue in its most severe form. However, the reproducer should reliably cause the call to pause() to return and then have a tenant migration start immediately afterwards. |
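Sleeps or waits make such a race reproducible by holding the hook thread inside the window between checking whether it is paused and starting a migration. A hypothetical illustration, assuming a naive pause() that only sets a flag and returns without waiting for the worker (this is not the attached reproducer), uses `threading.Event` waits instead of sleeps so the interleaving is forced deterministically:

```python
import threading


def run_race_scenario() -> bool:
    """Return True if the 'migration' started after the naive pause() returned.
    Hypothetical check-then-act example, not the real hook or reproducer."""
    paused = threading.Event()          # naive pause flag
    flag_checked = threading.Event()    # worker has just read the flag
    pause_returned = threading.Event()  # caller's pause() has returned
    started_after_pause = []

    def worker() -> None:
        # Check: the flag is not set yet, so the worker decides to proceed.
        if not paused.is_set():
            flag_checked.set()
            # Injected wait: hold the worker between the check and the act.
            pause_returned.wait()
            # Act: start a "migration" even though pause() already returned.
            started_after_pause.append(pause_returned.is_set())

    t = threading.Thread(target=worker)
    t.start()
    flag_checked.wait()      # let the worker pass the check first
    paused.set()             # naive pause(): set the flag ...
    pause_returned.set()     # ... and return without waiting for the worker
    t.join()
    return started_after_pause == [True]


print(run_race_scenario())   # prints True: work began after pause() returned
```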
| Comment by Judah Schvimer [ 11/Oct/21 ] |
|
Vishnu found this by code inspection. |