[SERVER-4143] Replication should pause during fsync+lock Created: 25/Oct/11  Updated: 06/Dec/22  Resolved: 22/Feb/18

Status: Closed
Project: Core Server
Component/s: Concurrency, Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-3950 Add a command to stop/restart replica... Closed
depends on SERVER-1423 reads often aren't possible while in ... Closed
is depended on by SERVER-4123 fsync/lock blocks the progress of mon... Closed
Duplicate
is duplicated by SERVER-5454 backups using fsync from secondaries ... Closed
Assigned Teams:
Replication
Participants:

 Description   

While a secondary is fsync-locked, replication still tries to acquire the write lock on that secondary. Because a pending write lock request takes precedence over new read requests, every subsequent reader blocks behind it. Readers ought to be able to proceed, since the writer cannot be granted the lock until the fsync lock is released.

A potential workaround: instead of blocking on the lock, fsync+lock on a secondary could simply pause the replication process (probably by pausing the oplog reader rather than taking out the lock) and, once in-flight writes have finished, flush all database files to get a consistent file system state.

A more likely workaround: wait for SERVER-1423 to be completed, in which case we get the correct behavior for free: readers will proceed even while writers are queued waiting for fsyncUnlock.



 Comments   
Comment by Gregory McKeon (Inactive) [ 22/Feb/18 ]

We believe this was resolved by SERVER-1423, please open a new ticket if it was not.

Comment by Dwight Merriman [ 03/Feb/14 ]

I see this code. Does it not work? I wouldn't be shocked, as there may be other places in repl that involve lock acquisition.

    void SyncTail::multiApply( std::deque<BSONObj>& ops, MultiSyncApplyFunc applyFunc ) {
 
        // Use a ThreadPool to prefetch all the operations in a batch.
        prefetchOps(ops);
        
        std::vector< std::vector<BSONObj> > writerVectors(theReplSet->replWriterThreadCount);
        fillWriterVectors(ops, &writerVectors);
        LOG(2) << "replication batch size is " << ops.size() << endl;
        // We must grab this because we're going to grab write locks later.
        // We hold this mutex the entire time we're writing; it doesn't matter
        // because all readers are blocked anyway.
        SimpleMutex::scoped_lock fsynclk(filesLockedFsync);
 
        // stop all readers until we're done
        Lock::ParallelBatchWriterMode pbwm;
 
        applyOps(writerVectors, applyFunc);
    }

Comment by Scott Hernandez (Inactive) [ 25/Jun/13 ]

The current plan is to change the fsync+lock code on a replica (secondary) so that the lock case simply pauses replication and fsyncs, and the unlock case resumes replication once the lock request count drops to 0.

In the initial implementation, the local database will not be locked for writes, so the caller must not manually write data into the local database. If this is problematic, we can use a condition variable (condvar) to prevent writes to the local database while in this state. Once the locking system supports what is needed, the implementation can be reevaluated.

Comment by Dwight Merriman [ 04/Feb/12 ]

When SERVER-1423 is done, I think this will start happening automatically.

Generated at Thu Feb 08 03:05:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.