[SERVER-6672] slaveDelay Setting Causes Replica Ops to be Applied in Batches at Approximately the slaveDelay Interval
Created: 31/Jul/12  Updated: 11/Jul/16  Resolved: 27/Aug/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.2.0-rc0
Fix Version/s: 2.2.1, 2.3.0

Type: Bug
Priority: Major - P3
Reporter: Adam Comerford
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux, 2.2.0-rc0 - Sharded, 11 clusters


Attachments: delay.js, load_gen.js
Operating System: ALL

 Description   

When a slave delay is specified ("slaveDelay": 7200 in the original case), replication ops are applied in batches at approximately the 7200-second interval. This causes massive write spikes (inserts/updates), lock percentage spikes, disk I/O spikes, and DR102 errors.
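
For reference, a minimal sketch of a replica set member document carrying the setting described above. The helper name `delayed_member` and the host `db3.example.net:27017` are hypothetical and not taken from this ticket; only the `slaveDelay` value of 7200 comes from the report. A delayed member must have priority 0, which the sketch sets explicitly.

```python
from typing import Any, Dict


def delayed_member(member_id: int, host: str, delay_secs: int) -> Dict[str, Any]:
    """Build a replica set member document with a slave delay (hypothetical helper)."""
    if delay_secs < 0:
        raise ValueError("delay_secs must be non-negative")
    return {
        "_id": member_id,
        "host": host,               # hypothetical host name, for illustration only
        "priority": 0,              # a delayed member must not be electable as primary
        "slaveDelay": delay_secs,   # delay in seconds, 7200 in the original case
    }


member = delayed_member(2, "db3.example.net:27017", 7200)
print(member["slaveDelay"])
```

A document like this would go in the `members` array of the replica set configuration.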



 Comments   
Comment by auto [ 12/Sep/12 ]

Author: Randolph Tan <randolph@10gen.com>
Date: 2012-08-23T13:45:30-07:00

Message: Fix for SERVER-6672:

Added logic in the oplog application batching algorithm to end the batch early if we see an op that is too new to be applied with respect to the slaveDelay.
Branch: v2.2
https://github.com/mongodb/mongo/commit/2f80a7b181b124a544549758e182988df932a542
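
The idea in the commit message can be illustrated with a small sketch. This is not the server's actual C++ code; it is a simplified model that assumes each op carries a timestamp in seconds. The names `Op`, `next_batch`, and `max_batch`, and the default batch cap of 5000, are hypothetical. The batch stops at the first op that is newer than `now - slave_delay`, so ops that are already old enough get applied instead of waiting for the rest of the batch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Op:
    ts: int        # assumed: time in seconds at which the primary wrote the op
    payload: str   # placeholder for the op body


def next_batch(oplog: Sequence[Op], start: int, now: int,
               slave_delay: int, max_batch: int = 5000) -> List[Op]:
    """Collect ops to apply, ending the batch early at the first op that is too new."""
    batch: List[Op] = []
    for op in oplog[start:]:
        if len(batch) >= max_batch:
            break
        # An op may be applied only once it is at least slave_delay seconds old.
        if slave_delay > 0 and op.ts > now - slave_delay:
            break  # too new: stop here and apply what has been collected so far
        batch.append(op)
    return batch


oplog = [Op(ts=t, payload=f"op{t}") for t in (100, 101, 150, 160)]
ready = next_batch(oplog, start=0, now=200, slave_delay=60)
print([op.ts for op in ready])  # prints [100, 101]
```

With `now=200` and a 60-second delay, only ops written at or before second 140 qualify, so the batch ends before the op at second 150.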

Comment by auto [ 27/Aug/12 ]

Author: Randolph Tan <randolph@10gen.com>
Date: 2012-08-23T13:45:30-07:00

Message: Fix for SERVER-6672:

Added logic in the oplog application batching algorithm to end the batch early if we see an op that is too new to be applied with respect to the slaveDelay.
Branch: master
https://github.com/mongodb/mongo/commit/a18b9d4cddee21e5df2ea65c4e0d4215e61228fe

Comment by Randolph Tan [ 23/Aug/12 ]

Results from running the test scripts on my local machine before the fix:

Delay hovers around 26~47 sec
Lots of "[rsSync] warning: DR102 too much data written uncommitted" in the log

Results from running the test scripts on my local machine after the fix:

Delay hovers around 12~19 sec
No DR102 warning
