Core Server / SERVER-72254

Chunk Migration should fail immediately when session migration fails.

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Cluster Scalability
    • 3

      The MigrationDestinationManager on the recipient starts fetching session information at the beginning of the move chunk process. This fetch runs on a separate thread. If SessionCatalogMigrationDestination fails for any reason (e.g. Operation Interrupted), we record the failure but do not abort the chunk migration.

      MigrationDestinationManager does eventually check the status of session migration and fails if the status is ErrorOccurred, but this check does not happen until the very end of the chunk migration. As a result, the chunk migration does not fail immediately even when session migration has already failed.

      This can leave a chunk migration stuck for 6 hours (the timeout). One of the conditions for the donor to engage the critical section is that session migration has succeeded, so the donor keeps waiting for the recipient to finish session migration while the recipient waits for the donor to engage the critical section. The donor keeps retrying until it times out after 6 hours.
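
      The fail-fast behavior requested here can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the server's actual code: `SessionMigration`, `wait_for_critical_section`, and the returned strings are hypothetical names invented for illustration, and only the `ErrorOccurred` status comes from the description above. The point is that the recipient's wait loop checks the session migration status on every iteration rather than once at the end.

```python
import enum
import threading
import time


class SessionMigrationState(enum.Enum):
    MIGRATING = "Migrating"
    DONE = "Done"
    ERROR_OCCURRED = "ErrorOccurred"


class SessionMigration:
    """Hypothetical stand-in for the background session fetch thread's status."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = SessionMigrationState.MIGRATING

    def set_state(self, state):
        with self._lock:
            self._state = state

    def state(self):
        with self._lock:
            return self._state


def wait_for_critical_section(session_migration, critical_section_engaged,
                              timeout_s, poll_interval_s=0.01):
    """Wait for the donor to engage the critical section, failing fast.

    Returns "committed", "aborted: session migration failed", or "timed out".
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Check the session migration status on every iteration,
        # instead of only once at the very end of the migration.
        if session_migration.state() is SessionMigrationState.ERROR_OCCURRED:
            return "aborted: session migration failed"
        # Event.wait returns True as soon as the event is set.
        if critical_section_engaged.wait(poll_interval_s):
            return "committed"
    return "timed out"


# Usage: a failed session migration aborts the wait long before the timeout.
failed = SessionMigration()
failed.set_state(SessionMigrationState.ERROR_OCCURRED)
outcome = wait_for_critical_section(failed, threading.Event(), timeout_s=5.0)
```

      In this sketch the wait ends on the first poll after the failure is recorded, so neither side has to sit out the full timeout.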

            Assignee:
            backlog-server-cluster-scalability [DO NOT USE] Backlog - Cluster Scalability
            Reporter:
            kshitij.gupta@mongodb.com Kshitij Gupta
            Votes:
            0
            Watchers:
            7

              Created:
              Updated: