[SERVER-4143] Replication should pause during fsync+lock Created: 25/Oct/11 Updated: 06/Dec/22 Resolved: 22/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | sync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Participants: |
| Description |
|
While a secondary is locked by fsync+lock, replication still attempts to get the write lock on the secondary. Because a pending write lock takes precedence, all subsequent readers block, even though they ought to be able to proceed: the writer will not be unblocked soon. A potential workaround: instead of blocking on the lock, fsync+lock on a secondary could simply pause the replication process (probably by pausing the oplog reader rather than taking out the lock), wait for in-progress writes to finish, and then flush all database files to get a consistent file system state. More likely workaround: wait for |
| Comments |
| Comment by Gregory McKeon (Inactive) [ 22/Feb/18 ] |
|
We believe this was resolved by |
| Comment by Dwight Merriman [ 03/Feb/14 ] |
|
I see this code. Does it not work? I wouldn't be shocked, as there are perhaps other places in repl that do things involving lock acquisition.
| Comment by Scott Hernandez (Inactive) [ 25/Jun/13 ] |
|
The current plan is to replace the fsync+lock code on a replica (secondary) so that the lock case simply pauses replication and then fsyncs, and the unlock case resumes replication once the count of outstanding lock requests goes to 0. In the initial implementation the local database will not be locked for writes, so the caller must not manually write data into the local database while the node is locked. If this is problematic, we can use a condition variable (condvar) to prevent writes to the local database during this state. Once the locking system supports what is needed, the implementation can be reevaluated. |
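
The plan above (pause on lock, fsync, resume once the lock request count reaches 0) can be illustrated with a minimal sketch. This is not the server's code: `ReplicationGate`, `fsync_lock`, `fsync_unlock`, `apply_batch`, and the `flush` callback are all hypothetical names chosen for illustration, and Python's `threading.Condition` stands in for the condvar mentioned in the comment.

```python
import threading


class ReplicationGate:
    """Hypothetical gate that pauses an oplog applier while fsync+lock is held."""

    def __init__(self, flush):
        self._cond = threading.Condition()
        self._lock_count = 0    # outstanding fsync+lock requests
        self._applying = False  # True while the applier is mid-batch
        self._flush = flush     # stand-in for flushing database files

    def fsync_lock(self):
        with self._cond:
            # Raising the count first stops new batches from starting.
            self._lock_count += 1
            # Let any in-flight batch finish before flushing.
            while self._applying:
                self._cond.wait()
            self._flush()

    def fsync_unlock(self):
        with self._cond:
            if self._lock_count == 0:
                raise RuntimeError("not locked")
            self._lock_count -= 1
            # Replication resumes only when every lock request is released.
            if self._lock_count == 0:
                self._cond.notify_all()

    def apply_batch(self, apply):
        with self._cond:
            # The applier waits here instead of queueing for a write lock,
            # so it never blocks readers while the node is locked.
            while self._lock_count > 0:
                self._cond.wait()
            self._applying = True
        try:
            apply()
        finally:
            with self._cond:
                self._applying = False
                self._cond.notify_all()
```

In this sketch the applier thread calls `apply_batch` for each batch of oplog entries, and nested `fsync_lock` calls must each be matched by an `fsync_unlock` before replication resumes. It does not guard against manual writes to the local database, matching the limitation noted for the initial implementation.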
| Comment by Dwight Merriman [ 04/Feb/12 ] |
|
when |