[SERVER-17055] secondary cannot keep up once oplog hit cap with wiredTiger Created: 26/Jan/15  Updated: 26/Jan/15  Resolved: 26/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.0.0-rc6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Rui Zhang (Inactive) Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-16921 WT oplog bottleneck on secondary Closed
Operating System: ALL
Participants:

 Description   

During a long concurrent write workload, the secondary cannot keep up once the oplog hits its cap. Eventually the secondary falls into recovery mode and stops replicating.

Some log output from before the oplog hit its cap:

+++++++
ts: Sun Jan 25 2015 17:54:01 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 17:54:01 GMT+0000 (UTC)
        0 secs (0 hrs) behind the primary      <=== secondary is keeping up
configured oplog size:   6515.767382621765MB
log length start to end: 358secs (0.1hrs)     <==== oplog
oplog first event time:  Sun Jan 25 2015 17:48:03 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 17:54:01 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 17:54:01 GMT+0000 (UTC)
 
...
 
+++++++
ts: Sun Jan 25 2015 18:06:14 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:06:13 GMT+0000 (UTC)
        1 secs (0 hrs) behind the primary       <=== secondary lag still minimal
configured oplog size:   6515.767382621765MB
log length start to end: 1091secs (0.3hrs)
oplog first event time:  Sun Jan 25 2015 17:48:03 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 18:06:14 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:06:14 GMT+0000 (UTC)
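
As a rough sanity check on the write load, the figures in the sample above can be turned into an approximate oplog fill rate. This is only a sketch: it assumes the oplog was close to full at this sample (the next sample, 40 seconds later, already shows the first event time advancing) and that the oplog filled at a steady rate. The function name is illustrative and not part of any MongoDB API.

```python
# Values copied from the 18:06:14 sample above.
CONFIGURED_OPLOG_MB = 6515.767382621765
OPLOG_WINDOW_SECS = 1091  # "log length start to end"


def approx_oplog_fill_rate(size_mb: float, window_secs: float) -> float:
    """Approximate MB of oplog written per second.

    Assumes the oplog is (nearly) full, so that the configured size
    corresponds to the time window the oplog currently covers.
    """
    if window_secs <= 0:
        raise ValueError("window_secs must be positive")
    return size_mb / window_secs


rate = approx_oplog_fill_rate(CONFIGURED_OPLOG_MB, OPLOG_WINDOW_SECS)
print(f"approx. oplog fill rate: {rate:.2f} MB/s")
```

Under those assumptions the primary was generating on the order of 6 MB of oplog per second when the cap was reached.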
 

Once the oplog hits its cap, the secondary cannot keep up and eventually enters recovery mode.

+++++++
ts: Sun Jan 25 2015 18:06:54 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:06:52 GMT+0000 (UTC)
        2 secs (0 hrs) behind the primary
configured oplog size:   6515.767382621765MB
log length start to end: 1073secs (0.3hrs)
oplog first event time:  Sun Jan 25 2015 17:49:01 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 18:06:54 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:06:54 GMT+0000 (UTC)
 
+++++++
ts: Sun Jan 25 2015 18:07:04 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:06:54 GMT+0000 (UTC)
        10 secs (0 hrs) behind the primary   <=== secondary starts to fall behind
configured oplog size:   6515.767382621765MB
log length start to end: 1081secs (0.3hrs)
oplog first event time:  Sun Jan 25 2015 17:49:16 GMT+0000 (UTC)  <<== oplog hit cap, starts to roll over
oplog last event time:   Sun Jan 25 2015 18:07:17 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:07:17 GMT+0000 (UTC)
 
+++++++
ts: Sun Jan 25 2015 18:07:27 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:06:58 GMT+0000 (UTC)
        29 secs (0.01 hrs) behind the primary
configured oplog size:   6515.767382621765MB
log length start to end: 1082secs (0.3hrs)
oplog first event time:  Sun Jan 25 2015 17:49:25 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 18:07:27 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:07:27 GMT+0000 (UTC)
 
...
 
+++++++
ts: Sun Jan 25 2015 18:17:29 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:08:22 GMT+0000 (UTC)
        547 secs (0.15 hrs) behind the primary  <<=== secondary lag grows fast
configured oplog size:   6515.767382621765MB
log length start to end: 1131secs (0.31hrs)
oplog first event time:  Sun Jan 25 2015 17:58:53 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 18:17:44 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:17:44 GMT+0000 (UTC)
 
+++++++
ts: Sun Jan 25 2015 18:17:54 GMT+0000 (UTC)
source: 172.31.35.229:27017
        syncedTo: Sun Jan 25 2015 18:08:24 GMT+0000 (UTC)
        570 secs (0.16 hrs) behind the primary
configured oplog size:   6515.767382621765MB
log length start to end: 1133secs (0.31hrs)
oplog first event time:  Sun Jan 25 2015 17:59:01 GMT+0000 (UTC)
oplog last event time:   Sun Jan 25 2015 18:17:54 GMT+0000 (UTC)
now:                     Sun Jan 25 2015 18:17:54 GMT+0000 (UTC)
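
To quantify how quickly the secondary falls behind, the `ts` and "behind the primary" values from the post-cap samples above can be compared pairwise. The following is a minimal sketch, not something taken from the original report: the helper names are illustrative, and it assumes all samples fall on the same day so that only the time of day is needed.

```python
from datetime import datetime

# (ts time of day, seconds behind the primary), copied from the samples above.
SAMPLES = [
    ("18:06:54", 2),
    ("18:07:04", 10),
    ("18:07:27", 29),
    ("18:17:29", 547),
    ("18:17:54", 570),
]

TIME_FORMAT = "%H:%M:%S"


def elapsed_secs(start: str, end: str) -> float:
    """Wall-clock seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


def lag_growth_rates(samples):
    """Seconds of lag gained per wall-clock second, for each consecutive pair."""
    rates = []
    for (t0, lag0), (t1, lag1) in zip(samples, samples[1:]):
        rates.append((t1, (lag1 - lag0) / elapsed_secs(t0, t1)))
    return rates


for ts, growth in lag_growth_rates(SAMPLES):
    print(f"{ts}: lag grew {growth:.2f} s per second")

# Overall growth between the 18:07:04 and 18:17:54 samples.
overall = (570 - 10) / elapsed_secs("18:07:04", "18:17:54")
print(f"overall: {overall:.2f} s of lag per second")
```

A growth rate close to 1 means the secondary is applying almost none of the incoming writes. This matches the `syncedTo` values, which advance only from 18:06:54 to 18:08:24 (90 seconds of oplog) over 650 seconds of wall-clock time.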



 Comments   
Comment by Rui Zhang (Inactive) [ 26/Jan/15 ]

My mistake: I thought SERVER-16921 had already been fixed.

Comment by Scott Hernandez (Inactive) [ 26/Jan/15 ]

Is this not a dup of SERVER-16921?

Generated at Thu Feb 08 03:43:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.