Core Server / SERVER-60595

Resmoke hooks such as ContinuousTenantMigration may not pause even after being paused

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication
    • ALL

      The implementation of ContinuousTenantMigration implies that once the hook is paused after a test, no tenant migrations are in progress. This expectation can be violated.

      Suppose this sequence of events takes place:

      • The tenant migrations thread is started and is descheduled just before self._is_idle_evt.clear(). It has already checked that a tenant migration is permitted.
      • The main resmoke thread finishes the test and calls pause() on the tenant migrations thread. Marking the test as finished in pause() has no effect at this point, because the tenant migrations thread has already passed wait_for_tenant_migration_permitted().
      • Because the tenant migrations thread has not yet called self._is_idle_evt.clear(), the idle check in pause() succeeds, and the main thread concludes that the tenant migrations thread has been paused.
      • However, the tenant migrations thread is free to proceed and has no way of knowing that it should pause.
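      The interleaving above can be reproduced deterministically with a simplified model. This is a hypothetical sketch, not the resmoke hook itself: only _is_idle_evt, wait_for_tenant_migration_permitted(), pause(), and the idea of marking the test finished come from the report. HypotheticalLifecycle, RacyMigrationThread, and the proceed event (used to hold the worker inside the race window) are illustrative assumptions, and the worker runs a single loop iteration instead of looping forever.

```python
import threading


class HypotheticalLifecycle:
    """Simplified stand-in for the hook's lifecycle object (assumed, not the real class)."""

    def __init__(self):
        self._test_running = threading.Event()
        self._test_running.set()  # a test is running, so migrations are permitted

    def wait_for_tenant_migration_permitted(self):
        # Blocks until a test is running; in this demo it returns immediately.
        self._test_running.wait()
        return True

    def mark_test_finished(self):
        self._test_running.clear()


class RacyMigrationThread(threading.Thread):
    """Hypothetical model of the tenant migrations thread (one loop iteration only)."""

    def __init__(self, lifecycle, proceed_evt):
        super().__init__(daemon=True)
        self._lifecycle = lifecycle
        self._is_idle_evt = threading.Event()
        self._is_idle_evt.set()
        self._proceed_evt = proceed_evt  # holds the thread inside the race window
        self.in_race_window = threading.Event()
        self.pause_returned = threading.Event()
        self.migrated_after_pause = None

    def run(self):
        self._lifecycle.wait_for_tenant_migration_permitted()
        # Race window: permission already checked, idle flag not yet cleared.
        self.in_race_window.set()
        self._proceed_evt.wait()
        self._is_idle_evt.clear()
        # Stand-in for running a migration: record whether pause() already returned.
        self.migrated_after_pause = self.pause_returned.is_set()
        self._is_idle_evt.set()

    def pause(self):
        self._lifecycle.mark_test_finished()  # too late: the thread is past the check
        self._is_idle_evt.wait()  # succeeds at once because idle is still set
        self.pause_returned.set()


# Force the interleaving described in the report.
lifecycle = HypotheticalLifecycle()
proceed = threading.Event()
worker = RacyMigrationThread(lifecycle, proceed)
worker.start()
worker.in_race_window.wait()  # worker has passed wait_for_tenant_migration_permitted()
worker.pause()                # returns immediately although a migration is about to start
proceed.set()                 # let the worker continue
worker.join()
print(worker.migrated_after_pause)  # True: a migration ran after pause() returned
```

      The root cause in this model is that the permission check and the transition out of the idle state are two separate steps, so pause() can observe the state between them.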

      A related sequence of steps involving stop(), which is called once all tests have completed, can prevent the tenant migrations thread from ever terminating.
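      One possible way to close the window, offered here as an assumption rather than something taken from the report or the resmoke code base, is to perform the permission check and the "busy" transition under a single lock that pause() also takes. The class below is hypothetical (HypotheticalSafeMigrationThread, migrate_fn, and resume() are illustrative names). It uses a threading.Condition so that pause() waits for any in-flight migration and no new one can start afterwards, and stop() notifies the condition so that a paused thread wakes up and exits instead of blocking forever.

```python
import threading
import time


class HypotheticalSafeMigrationThread(threading.Thread):
    """Sketch of a race-free pause/resume/stop handshake (not the real hook)."""

    def __init__(self, migrate_fn):
        super().__init__(daemon=True)
        self._state_cond = threading.Condition()
        self._migrations_permitted = True  # a test is running
        self._migration_running = False
        self._stop_requested = False
        self._migrate_fn = migrate_fn  # hypothetical callable performing one migration

    def run(self):
        while True:
            with self._state_cond:
                # Sleep while paused, unless stop() asks the thread to exit.
                while not self._migrations_permitted and not self._stop_requested:
                    self._state_cond.wait()
                if self._stop_requested:
                    return
                # Permission check and "busy" flag happen under the same lock,
                # so pause() can never observe a gap between them.
                self._migration_running = True
            try:
                self._migrate_fn()
            finally:
                with self._state_cond:
                    self._migration_running = False
                    self._state_cond.notify_all()

    def pause(self):
        with self._state_cond:
            self._migrations_permitted = False
            # Wait for an in-flight migration; none can start after this point.
            while self._migration_running:
                self._state_cond.wait()

    def resume(self):
        with self._state_cond:
            self._migrations_permitted = True
            self._state_cond.notify_all()

    def stop(self):
        with self._state_cond:
            self._stop_requested = True
            self._state_cond.notify_all()  # wakes the thread even if it is paused
        self.join()


state = {"active": False, "count": 0}


def fake_migration():
    # Hypothetical migration body: only records that it ran.
    state["active"] = True
    state["count"] += 1
    time.sleep(0.001)
    state["active"] = False


hook_thread = HypotheticalSafeMigrationThread(fake_migration)
hook_thread.start()
time.sleep(0.05)                   # a "test" runs while migrations proceed
hook_thread.pause()                # returns only when no migration is running
active_at_pause = state["active"]
count_at_pause = state["count"]
time.sleep(0.05)
count_while_paused = state["count"]
hook_thread.stop()                 # exits even though the thread is paused
```

      In this sketch count_while_paused equals count_at_pause, because the worker cannot mark itself busy without holding the lock that pause() has already used to revoke permission.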

        Attachments:
        1. server50959repro.log (2.08 MB)
        2. tmrepro.py (32 kB)

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            vishnu.kaushik@mongodb.com Vishnu Kaushik
            Votes:
            0
            Watchers:
            11
