[SERVER-19668] Waiting time for write concern increase as time goes by Created: 30/Jul/15  Updated: 04/Sep/15  Resolved: 04/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Shankar Karuppiah Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: RF
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-18908 Secondaries unable to keep up with pr... Closed
Operating System: Linux
Participants:

 Description   

The waiting time for write concern increases as time goes by, usually within 24-36 hours after a restart. The issue disappears when the secondary instances are restarted.

db.currentOp().inprog
    .filter(function(op) { return op.msg === "waiting for write concern"; })
    .map(function(op) { op.ms = op.microsecs_running ? op.microsecs_running / 1000 : 0; return op; })
    .filter(function(op) { return op.ms > 50; })
    .map(function(op) { return { ms : op.ms, col : op.ns }; })
    .reduce(function(e, c) {
        if (e[c.col]) { e[c.col].push(c.ms); } else { e[c.col] = [c.ms]; }
        return e;
    }, {})
{
	"talk-visitor.$cmd" : [
		63.167,
		53.842,
		58.851,
		83.493,
		84.996,
		56.482,
		73.045,
		72.069,
		61.626,
		52.806,
		94.394,
		69.88
	],
	"talk-bprocess.$cmd" : [
		55.879,
		83.506,
		84.399,
		71.866,
		94.442,
		72.401,
		96.183,
		85.898,
		85.221,
		93.1,
		73.524,
		53.025,
		64.207,
		67.257,
		74.376,
		75.892,
		74.373,
		76.926,
		58.193,
		97.267,
		98.968,
		58.552,
		64.406,
		50.717,
		78.873,
		86.672,
		76.467,
		55.732,
		54.724,
		88.214,
		63.564,
		83.126,
		96.293,
		82.08,
		94.8,
		77.737,
		97.811,
		97.347,
		66.507,
		66.666,
		98.903,
		77.283,
		76.561,
		84.656,
		52.504,
		78.816,
		65.262,
		82.005,
		64.24,
		75.829,
		88.85
	],
	"talk-vsession.$cmd" : [
		68.412
	]
}



 Comments   
Comment by Sam Kleinman (Inactive) [ 03/Sep/15 ]

Sorry for the delay in getting back to you. I've been looking into this issue, and I think there are a few potential issues at play here:

  1. I think the primary effect we're seeing here is the same as SERVER-18908. At its core, that issue is that the workload on secondaries is less parallel than on primaries, so secondaries lag behind the primary under certain kinds of workloads, particularly very heavy insert workloads. In these situations, if replication on all members stops making progress, or simply falls behind the primary by more than your wtimeout setting, the operations will time out and report as "failed," even though they succeed on the primary and a "minority" of the set (a sketch of how such a timeout surfaces in the shell follows this list). Assuming your oplog has sufficient capacity, if you stop the operations the secondaries should be able to catch up without restarting, although restarting may have the same ultimate effect.
  2. Reviewing your replica set configuration, it looks like your set has 3 data-bearing members, 1 arbiter, and 1 configured member that is offline. As a result, "majority" is 3 nodes, which means that all writes have to propagate to all of the reachable data-bearing nodes; any delay in replication or any networking error will also cause write concerns to time out and fail.
  3. Any activity on the secondaries may exacerbate the underlying problem (i.e. 1 above), making it easier for replication to fall behind and leading to failed write concerns.
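
For illustration, here is a minimal mongo shell sketch of how such a timeout surfaces; the collection name "talk_probe" and the document fields are hypothetical, chosen only for this example:

var res = db.talk_probe.insert(
    { msg : "probe", ts : new Date() },
    { writeConcern : { w : "majority", wtimeout : 15000 } }
);
printjson(res);
// On a healthy set this prints { "nInserted" : 1 }. When the majority wait
// times out, the WriteResult additionally carries a writeConcernError whose
// errInfo.wtimeout field is true. The insert itself is not rolled back on the
// primary, which is why blind application-level retries can create duplicates.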

I believe that the combination of these issues explains the effect you're seeing. I'm going to go ahead and close this issue, and encourage you to track the progress of SERVER-18908. In the meantime you might also consider:

  1. limiting the rate and size of insert operations if/when your workload is very heavy and you're doing data-loading operations.
  2. reconfiguring your replica set to either remove both the arbiter and the missing member, or replace the unreachable member of the set with an up-to-date member (a hedged reconfiguration sketch follows this list).
  3. upgrading, if you have not already, to 3.0.6, which is the latest stable release in the 3.0 series. It contains a large number of WiredTiger and replication fixes that may improve your experience with this issue.
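
As one possible shape for option 2, the sketch below removes the arbiter and the unreachable member by hostname. The arbiter hostname matches the rs.status() output later in this ticket; "unreachable-host:27017" is a placeholder for the actual offline member. Run it against the primary after double-checking rs.conf():

cfg = rs.conf();
// Keep only the members we want; dropping the arbiter and the offline member
// shrinks the voting set so "majority" no longer requires every remaining data node.
cfg.members = cfg.members.filter(function(m) {
    return m.host !== "mongodb-alpha-arbiter:27017" &&
           m.host !== "unreachable-host:27017";      // placeholder hostname
});
rs.reconfig(cfg);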

I hope this helps, and thanks again for your patience.

Regards,
sam

Comment by Shankar Karuppiah [ 06/Aug/15 ]

Hello Sam,

Sorry for the late reply.

  1. Could you also include another copy of the rs.status() operation while you're experiencing this issue?
    I have uploaded rs.status() and db.currentOp() output to MongoDB Support File Upload. Each log folder within a server's directory corresponds to a single incident.
    For example, mongodb-alpha-1/log1, mongodb-alpha-2/log1, and mongodb-alpha-3/log1 together make up one incident:

    replica-lag
    ├── mongodb-alpha-1 (Secondary)
    │   ├── log1
    │   │   └── op.log
    │   ├── log2
    │   │   └── op.log
    │   ├── log3
    │   │   ├── op.log
    │   │   └── rs.status.log
    │   └── log4
    │       ├── op2.log
    │       ├── op.log
    │       ├── rs2.status.log
    │       └── rs.status.log
    ├── mongodb-alpha-2 (Secondary)
    │   ├── log1
    │   │   └── op.log
    │   ├── log2
    │   │   └── op.log
    │   ├── log3
    │   │   ├── op.log
    │   │   └── rs.status.log
    │   └── log4
    │       ├── op2.log
    │       ├── op.log
    │       ├── rs2.status.log
    │       └── rs.status.log
    └── mongodb-alpha-3 (Primary)
        ├── log1
        │   ├── op.log
        │   └── rs.status.log
        ├── log2
        │   ├── op.log
        │   └── rs.status.log
        ├── log3
        │   ├── op.log
        │   └── rs.status.log
        └── log4
            ├── op.log
            └── rs.status.log
    

  2. I want to confirm that all members of this replica set are running MongoDB 3.0.5. Which storage engine are you using?
    mongodb-alpha-1, mongodb-alpha-2, mongodb-alpha-3 and mongodb-alpha-arbiter are running MongoDB 3.0.5, and mongodb-alpha-backup is running MongoDB 3.0.4
  3. Which MongoDB Driver are you using, and at what version?
    Node.js, driver version 1.4.38
  4. Are you performing any operations directly on the secondary members of the set?
    Yes, we do, but as you can see from the logs there was nothing running on the secondary servers
  5. I'm interested in any other potentially resource-intensive operations that may be running concurrently. For example, index builds, aggregation and map-reduce jobs, and open cursors with the noTimeout option may all consume resources and interfere with write concern responses.
    Yes, there were a few aggregation queries that ran on the secondary instances
  6. Are other operations slow while the write concern wait increases, or is it only the write concern waits?
    As far as I know, only the write concern wait increases
  7. Do write concerns eventually return, or do some operations get stuck and remain even after they reach the timeout?
    The write concerns do return after the timeout, but that creates a new problem where the application is prone to creating duplicate data (see the sketch after this list)
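
One common way to guard against those duplicates is to make retried writes idempotent. The following is only an illustrative sketch under assumed names (the talk_events collection, the eventId field, and the key value are all hypothetical): the application derives a deterministic key per logical event, enforces it with a unique index, and upserts on it, so a retry after a wtimeout cannot insert a second copy.

db.talk_events.createIndex({ eventId : 1 }, { unique : true });
db.talk_events.update(
    { eventId : "evt-0001" },                     // client-generated, deterministic key
    { $setOnInsert : { payload : { msg : "hello" }, createdAt : new Date() } },
    { upsert : true, writeConcern : { w : "majority", wtimeout : 15000 } }
);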

FYI, we were experiencing the same issue when we were running 3.0.4, but I didn't report it because I thought it was related to existing issues that had already been filed

Comment by Sam Kleinman (Inactive) [ 31/Jul/15 ]

Thanks for your quick response! I look forward to getting the additional information from the currentOp output while you're experiencing this symptom:

  1. Could you also include another copy of the rs.status() operation while you're experiencing this issue?
  2. I want to confirm that all members of this replica set are running MongoDB 3.0.5. Which storage engine are you using?
  3. Which MongoDB Driver are you using, and at what version?
  4. Are you performing any operations directly on the secondary members of the set?
  5. I'm interested in any other potentially resource-intensive operations that may be running concurrently. For example, index builds, aggregation and map-reduce jobs, and open cursors with the noTimeout option may all consume resources and interfere with write concern responses.
  6. Are other operations slow while the write concern wait increases, or is it only the write concern waits?
  7. Do write concerns eventually return, or do some operations get stuck and remain even after they reach the timeout?

If your deployment uses monitoring in Cloud Manager, we may be able to look for systemic patterns longitudinally. You can also upload larger files outside of this ticket if you need or want to.

Regards,
sam

Comment by Shankar Karuppiah [ 31/Jul/15 ]

Hello Sam,

Thank you for looking into this issue.

1. What write concern are you specifying on these operations? Could you provide an example operation with a write concern?
We use

{ w: "majority", wt : 15000}

write concerns for all of our operations, so it's not really feasible to list every command (a hedged driver-level sketch of such an operation follows)
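
As an illustration only, such an operation might look roughly like the following with the 1.4.x Node.js driver; the collection and fields are hypothetical, and note that the timeout option is normally spelled "wtimeout" rather than "wt":

collection.insert(
    { visitorId : "abc123", seenAt : new Date() },
    { w : "majority", wtimeout : 15000 },
    function (err, result) {
        if (err) {
            // With wtimeout set, this fires when the majority wait times out,
            // even though the document may already be written on the primary.
            console.error("write concern wait failed:", err);
        }
    }
);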

2. Could you describe the configuration of your replica set? The output of rs.status() and rs.config() would be useful to this end. If your deployment is sharded, including the status of sh.status() would also be helpful here.

tawk-alpha:SECONDARY> rs.status()
{
	"set" : "tawk-alpha",
	"date" : ISODate("2015-07-31T10:03:13.795Z"),
	"myState" : 2,
	"syncingTo" : "mongodb-alpha-3:27017",
	"members" : [
		{
			"_id" : 3,
			"name" : "mongodb-alpha-backup:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 171,
			"optime" : Timestamp(1438336992, 220),
			"optimeDate" : ISODate("2015-07-31T10:03:12Z"),
			"lastHeartbeat" : ISODate("2015-07-31T10:03:12.331Z"),
			"lastHeartbeatRecv" : ISODate("2015-07-31T10:03:11.897Z"),
			"pingMs" : 0,
			"syncingTo" : "mongodb-alpha-3:27017",
			"configVersion" : 15
		},
		{
			"_id" : 4,
			"name" : "mongodb-alpha-arbiter:27017",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 9071,
			"lastHeartbeat" : ISODate("2015-07-31T10:03:12.271Z"),
			"lastHeartbeatRecv" : ISODate("2015-07-31T10:03:11.813Z"),
			"pingMs" : 0,
			"configVersion" : 15
		},
		{
			"_id" : 7,
			"name" : "mongodb-alpha-2:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 8963,
			"optime" : Timestamp(1438336993, 634),
			"optimeDate" : ISODate("2015-07-31T10:03:13Z"),
			"lastHeartbeat" : ISODate("2015-07-31T10:03:13.654Z"),
			"lastHeartbeatRecv" : ISODate("2015-07-31T10:03:13.286Z"),
			"pingMs" : 0,
			"syncingTo" : "mongodb-alpha-3:27017",
			"configVersion" : 15
		},
		{
			"_id" : 8,
			"name" : "mongodb-alpha-1:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 9071,
			"optime" : Timestamp(1438336993, 767),
			"optimeDate" : ISODate("2015-07-31T10:03:13Z"),
			"syncingTo" : "mongodb-alpha-3:27017",
			"configVersion" : 15,
			"self" : true
		},
		{
			"_id" : 9,
			"name" : "mongodb-alpha-3:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 9071,
			"optime" : Timestamp(1438336993, 631),
			"optimeDate" : ISODate("2015-07-31T10:03:13Z"),
			"lastHeartbeat" : ISODate("2015-07-31T10:03:13.651Z"),
			"lastHeartbeatRecv" : ISODate("2015-07-31T10:03:11.920Z"),
			"pingMs" : 0,
			"electionTime" : Timestamp(1438293267, 1),
			"electionDate" : ISODate("2015-07-30T21:54:27Z"),
			"configVersion" : 15
		}
	],
	"ok" : 1
}

3. Does this affect all operations or only some operations? Can you give an example of another operation that doesn't wait for write concern?
It happens to all of the operations, but some are more prominent because of their high frequency

4. Could you provide the full output of the currentOp operation? The additional context may be useful.
Sure, I will grab it next time it happens and update this thread

Shankar

Comment by Sam Kleinman (Inactive) [ 30/Jul/15 ]

Thanks for this report. Could you provide a bit more information about your configuration and the operations that are in progress when you're seeing this behavior:

  1. What write concern are you specifying on these operations? Could you provide an example operation with a write concern?
  2. Could you describe the configuration of your replica set? The output of rs.status() and rs.config() would be useful to this end. If your deployment is sharded, including the status of sh.status() would also be helpful here.
  3. Does this affect all operations or only some operations? Can you give an example of another operation that doesn't wait for write concern?
  4. Could you provide the full output of the currentOp operation? The additional context may be useful.

Thanks for the information, and sorry that you've run into this issue.

Regards,
sam
