[SERVER-8080] During a migration, if waiting for replication times out, abort the migration without entering the critical section Created: 04/Jan/13 Updated: 10/Dec/14 Resolved: 29/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.2, 2.3.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Randolph Tan |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | revisit | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Currently, after the main data transfer for a migration, we block until all writes from the migration have been replicated to a majority of nodes before entering the critical section. We wait up to 10 hours, but if the writes still haven't been replicated by then, we continue anyway and enter the critical section. In this case, the migration is very likely to abort shortly afterward, once the critical section writes happen and we wait 30 seconds for those writes to be replicated. Entering the critical section can block all read and write operations for up to 30 seconds, so we should avoid entering it at all when an abort is so likely. |
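The difference between the current and the proposed behavior can be illustrated with a minimal sketch. This is not the server's actual code: the names `wait_for_majority`, `pre_critical_section_step`, `MigrationResult`, and the `abort_on_timeout` flag are all hypothetical, and the replication check is reduced to a caller-supplied callback.

```python
import time
from enum import Enum
from typing import Callable


class MigrationResult(Enum):
    """Hypothetical outcomes of the step that precedes the critical section."""
    ENTER_CRITICAL_SECTION = "enter_critical_section"
    ABORTED = "aborted"


def wait_for_majority(
    is_replicated: Callable[[], bool],
    timeout_secs: float,
    poll_secs: float = 1.0,
) -> bool:
    """Poll until is_replicated() returns True or the timeout elapses.

    Returns True if replication was confirmed, False on timeout.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        if is_replicated():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_secs)


def pre_critical_section_step(
    is_replicated: Callable[[], bool],
    timeout_secs: float,
    abort_on_timeout: bool = True,
    poll_secs: float = 1.0,
) -> MigrationResult:
    """Decide whether the migration may enter the critical section.

    abort_on_timeout=False models the behavior described in this ticket
    (continue anyway after the timeout). abort_on_timeout=True models the
    proposed behavior (abort without entering the critical section).
    """
    if wait_for_majority(is_replicated, timeout_secs, poll_secs):
        return MigrationResult.ENTER_CRITICAL_SECTION
    if abort_on_timeout:
        return MigrationResult.ABORTED
    return MigrationResult.ENTER_CRITICAL_SECTION
```

With `abort_on_timeout=True`, a replication wait that times out never reaches the critical section, so reads and writes are not blocked for a migration that would most likely fail anyway. The timeout itself stays a parameter here, since the appropriate value is a separate question.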
| Comments |
| Comment by Spencer Brody (Inactive) [ 04/Jan/13 ] |
|
We should also consider lowering the timeout from 10 hours. |