[SERVER-8080] During a migration, if waiting for replication times out, abort the migration without entering the critical section Created: 04/Jan/13  Updated: 10/Dec/14  Resolved: 29/Aug/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.2.2, 2.3.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: revisit
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13456 MigrateStatus::_go is unkillable when... Closed
Related
Participants:

 Description   

Currently, after the main data transfer for a migration, we block until all the writes from the migration have been replicated to a majority of nodes before we enter the critical section. We wait up to 10 hours, but if the writes still haven't been replicated after 10 hours, we continue anyway and enter the critical section. In this case, it is very likely that the migration will abort shortly afterwards, because once the critical section writes happen we wait only 30 seconds for those writes to be replicated. Entering the critical section can block all read and write operations for up to 30 seconds, so we should avoid entering it at all when it's so likely that we'll abort.
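
A minimal sketch of the proposed control flow, written in Python for illustration only and not taken from the server code. All names here (`wait_for_majority`, `run_migration`, `MigrationAborted`, and the `is_replicated` / `enter_critical_section` callbacks) are hypothetical. The point is only that a timed-out replication wait raises an abort instead of falling through into the critical section.

```python
import time
from typing import Callable


class MigrationAborted(Exception):
    """Hypothetical error: migration gave up before the critical section."""


def wait_for_majority(is_replicated: Callable[[], bool],
                      timeout_secs: float,
                      poll_secs: float = 0.01) -> bool:
    """Poll until is_replicated() returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_secs
    while True:
        if is_replicated():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_secs)


def run_migration(is_replicated: Callable[[], bool],
                  enter_critical_section: Callable[[], object],
                  timeout_secs: float):
    # Proposed behavior: a replication timeout aborts the migration
    # rather than continuing into the critical section.
    if not wait_for_majority(is_replicated, timeout_secs):
        raise MigrationAborted("timed out waiting for replication")
    return enter_critical_section()
```

Keeping the timeout as a parameter in this sketch also leaves room to choose a value shorter than 10 hours.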



 Comments   
Comment by Spencer Brody (Inactive) [ 04/Jan/13 ]

We should also consider lowering the timeout from 10 hours.
