During a migration, if waiting for replication times out, abort the migration without entering the critical section

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.2.2, 2.3.1
    • Component/s: Sharding
    • 3

      Currently, after the main data transfer for a migration, we block until all of the migration's writes have been replicated to a majority of nodes before entering the critical section. We wait for up to 10 hours, but if the writes still haven't been replicated after 10 hours, we continue anyway and enter the critical section. In that case, the migration is very likely to abort shortly afterwards, when the critical section writes happen and we wait 30 seconds for those writes to be replicated. Entering the critical section can block all read and write operations for up to 30 seconds, so we should avoid entering it at all when an abort is so likely.
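
      The requested control flow can be sketched as follows. This is a minimal illustration in Python, not the server's actual implementation: every name here (finish_migration, wait_until, MigrationOutcome, and the callback parameters) is hypothetical, and the replication check is stubbed out as a caller-supplied predicate. Only the 10-hour and 30-second timeouts come from the description above.

```python
import time
from enum import Enum
from typing import Callable


class MigrationOutcome(Enum):
    """Hypothetical outcomes, named for this sketch only."""
    COMMITTED = "committed"
    ABORTED_BEFORE_CRITICAL_SECTION = "aborted_before_critical_section"
    ABORTED_IN_CRITICAL_SECTION = "aborted_in_critical_section"


def wait_until(predicate: Callable[[], bool], timeout_secs: float,
               poll_secs: float = 0.01) -> bool:
    """Poll predicate until it returns True or timeout_secs elapses."""
    deadline = time.monotonic() + timeout_secs
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_secs)


def finish_migration(
    is_majority_replicated: Callable[[], bool],
    pre_commit_timeout_secs: float = 10 * 60 * 60,   # 10 hours, per the ticket
    critical_section_timeout_secs: float = 30,       # 30 seconds, per the ticket
    enter_critical_section: Callable[[], None] = lambda: None,
    exit_critical_section: Callable[[], None] = lambda: None,
) -> MigrationOutcome:
    """Hypothetical tail of a migration, after the main data transfer."""
    # Wait for the transferred writes to reach a majority of nodes.
    if not wait_until(is_majority_replicated, pre_commit_timeout_secs):
        # Proposed change: abort here instead of continuing, so reads and
        # writes are never blocked by a migration that is likely to fail.
        return MigrationOutcome.ABORTED_BEFORE_CRITICAL_SECTION

    enter_critical_section()
    try:
        # The critical section writes must also replicate within the
        # shorter timeout; otherwise the migration aborts.
        if not wait_until(is_majority_replicated, critical_section_timeout_secs):
            return MigrationOutcome.ABORTED_IN_CRITICAL_SECTION
        return MigrationOutcome.COMMITTED
    finally:
        exit_critical_section()
```

      The point of the early return is that the critical section callbacks are never invoked when the first wait times out, so the up-to-30-second block on reads and writes is only paid when replication is keeping up.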

            Assignee:
            Randolph Tan
            Reporter:
            Spencer Brody (Inactive)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: