[SERVER-9166] seeing message "did not propagate properly. :: caused by :: 8017 update not consistent" in logs Created: 28/Mar/13  Updated: 10/Dec/14  Resolved: 23/Apr/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.3
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Anup Katariya Assignee: Stennie Steneker (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Hi,

We are seeing the message below and wanted to understand the impact and criticality of these messages. They appear every few minutes, without any load on the systems.

Thu Mar 28 21:06:42 [Balancer] warning: distributed lock 'balancer/mvperf74:50000:1363913554:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('5154b0db19559a87f7c1d8aa') } update: { $set: { state: 1, who: "mvperf74:50000:1363913554:1804289383:Balancer:846930886", process: "mvperf74:50000:1363913554:1804289383", when: new Date(1364504801970), why: "doing balance round", ts: ObjectId('5154b0e1d4b88e98a95d72d9') } } gle1: { updatedExisting: true, n: 1, connectionId: 4564, waited: 6, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 4730, waited: 10, err: null, ok: 1.0 }
Thu Mar 28 21:06:42 [Balancer] lock update won, completing lock propagation for 'balancer/mvperf74:50000:1363913554:1804289383'
Thu Mar 28 21:06:42 [Balancer] distributed lock 'balancer/mvperf74:50000:1363913554:1804289383' acquired, ts : 5154b0e1d4b88e98a95d72d9

I went through the ticket below but couldn't tell whether there is really an issue or not.

https://jira.mongodb.org/browse/SERVER-8348

Attached are the dbHash results for the configdb collections.

configsvr> db.runCommand({dbHash:1})
{
	"numCollections" : 13,
	"host" : "mvperf75:50016",
	"collections" : {
		"changelog" : "f8572b9ff2414f50cbb9233b27266397",
		"chunks" : "47b3db57605c02cdb82ff5fcbbf84753",
		"collections" : "c19b063e79c527da84c485962d5fa610",
		"databases" : "454758fd7b29edeac83c457c71f404c8",
		"lockpings" : "0bcccafbb0893622c79cbe7c1977555d",
		"locks" : "6b10896b971cf30dc37459c73f67739a",
		"mongos" : "53904657868ecb93eb8e0a912bb1a2ce",
		"settings" : "a7a2af43d07b341b832b9ba2473528af",
		"shards" : "0dc2364b0f909093248693bb59de8c99",
		"tags" : "d41d8cd98f00b204e9800998ecf8427e",
		"version" : "1b2cfce930005dbc34c253633409dcc1"
	},
	"md5" : "d3376ab144b95859475600c043a777d4",
	"ok" : 1
}

configsvr> db.runCommand({dbHash:1})
{
	"numCollections" : 13,
	"host" : "mvperf76:50017",
	"collections" : {
		"changelog" : "f8572b9ff2414f50cbb9233b27266397",
		"chunks" : "47b3db57605c02cdb82ff5fcbbf84753",
		"collections" : "c19b063e79c527da84c485962d5fa610",
		"databases" : "454758fd7b29edeac83c457c71f404c8",
		"lockpings" : "6c6844bd4a11b2333820536a25894d6f",
		"locks" : "0572d6ba6450aed19964f737cf7aef43",
		"mongos" : "3136648501624a0ef514e84b94d29df5",
		"settings" : "a7a2af43d07b341b832b9ba2473528af",
		"shards" : "0dc2364b0f909093248693bb59de8c99",
		"tags" : "d41d8cd98f00b204e9800998ecf8427e",
		"version" : "1b2cfce930005dbc34c253633409dcc1"
	},
	"md5" : "e4fb840ab991cabe2a671896b92dc1b2",
	"ok" : 1
}

configsvr> db.runCommand({dbHash:1})
{
	"numCollections" : 13,
	"host" : "mvperf74:50015",
	"collections" : {
		"changelog" : "f8572b9ff2414f50cbb9233b27266397",
		"chunks" : "47b3db57605c02cdb82ff5fcbbf84753",
		"collections" : "c19b063e79c527da84c485962d5fa610",
		"databases" : "454758fd7b29edeac83c457c71f404c8",
		"lockpings" : "9cc9ea9cb9a188072022994178f9b388",
		"locks" : "42bc51f489c5a061fd92b611666167c5",
		"mongos" : "6eef3dac1a00aea8ed528226d023e3da",
		"settings" : "a7a2af43d07b341b832b9ba2473528af",
		"shards" : "0dc2364b0f909093248693bb59de8c99",
		"tags" : "d41d8cd98f00b204e9800998ecf8427e",
		"version" : "1b2cfce930005dbc34c253633409dcc1"
	},
	"md5" : "9877df252717bb2f7646897ff9a6b3e6",
	"ok" : 1
}
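
For reference, the per-collection hashes above can also be diffed directly from the shell. The following is just an illustrative sketch; it assumes direct shell access to the config servers and uses hostnames from the output above:

	// Illustrative sketch: compare per-collection dbHash values between two
	// of the config servers listed above.
	var a = new Mongo("mvperf75:50016").getDB("config").runCommand({dbHash: 1});
	var b = new Mongo("mvperf76:50017").getDB("config").runCommand({dbHash: 1});
	for (var coll in a.collections) {
		if (a.collections[coll] != b.collections[coll])
			print("mismatch in " + coll + ": " + a.collections[coll] + " vs " + b.collections[coll]);
	}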



 Comments   
Comment by Stennie Steneker (Inactive) [ 16/Apr/13 ]

Hi Anup,

As Dan Pasette noted, this warning indicates contention when acquiring the lock and is part of normal operation.

Since you are using `ntpd` (rather than `ntpdate`, which can make larger one-off adjustments to the clock), time sync should not be a problem.

Unless you have any further concerns related to this warning, we will close the issue.

Thanks,
Stephen

Comment by Daniel Pasette (Inactive) [ 12/Apr/13 ]

This means that there was contention when acquiring the lock. There is some complexity involved when many processes attempt this at once, since there are three config servers and each process can succeed partially, but the distributed locking logic handles this transparently. In some cases you will see the "8017 update not consistent" warning. As you can see in the logs, the next line states that the lock propagation completed.
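
To make the log sequence easier to follow, here is a rough pseudocode sketch of the behaviour described above. It is not the actual server code; the config server handles and their update() return values are stand-ins for the conditional update plus getLastError (the gle1/gle2 in the log line) that the real code performs:

	// Rough pseudocode sketch, not the real implementation. Each entry in
	// configServers is assumed to expose update(query, update) returning
	// {n: <docs updated>} for the config.locks collection on that server.
	function tryAcquireLock(configServers, lockDoc) {
		var wins = 0;
		configServers.forEach(function(cs) {
			// Conditional update: only succeeds if the lock is unlocked (state: 0)
			if (cs.update({_id: "balancer", state: 0}, {$set: lockDoc}).n == 1)
				wins++;
		});
		if (wins == configServers.length)
			return true;	// clean acquisition on every config server
		if (wins > 0) {
			// Partial success: this is the point where "did not propagate
			// properly :: 8017 update not consistent" is logged.
			print("warning: did not propagate properly :: 8017 update not consistent");
			// "lock update won": re-apply the winning document everywhere
			// so all config servers agree, then proceed with the lock held.
			configServers.forEach(function(cs) {
				cs.update({_id: "balancer"}, {$set: lockDoc});
			});
			return true;
		}
		return false;	// lost the race; another process holds the lock
	}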

Comment by James Blackburn [ 08/Apr/13 ]

We see this as well on 2.2.3:

Mon Apr  8 19:45:58 [Balancer] distributed lock 'balancer/dlonapahls245.maninvestments.com:27119:1365027171:1804289383' unlocked. 
Mon Apr  8 19:47:04 [Balancer] warning: distributed lock 'balancer/dlonapahls245.maninvestments.com:27119:1365027171:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('516310a88c967f9223f8c0e9') } update: { $set: { state: 1, who: "dlonapahls245.maninvestments.com:27119:1365027171:1804289383:Balancer:846930886", process: "dlonapahls245.maninvestments.com:27119:1365027171:1804289383", when: new Date(1365446824502), why: "doing balance round", ts: ObjectId('516310a8bee6fa6f68e9e6e8') } } gle1: { updatedExisting: true, n: 1, connectionId: 724, waited: 18, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 617, waited: 0, err: null, ok: 1.0 }
Mon Apr  8 19:47:04 [Balancer] lock update won, completing lock propagation for 'balancer/dlonapahls245.maninvestments.com:27119:1365027171:1804289383'

It would be interesting to know what impact this has.

Comment by Anup Katariya [ 08/Apr/13 ]

Hi Stephen,

We use ntpd. I just checked all the machines and they are all in sync.

Thanks,
Anup

Comment by Stennie Steneker (Inactive) [ 08/Apr/13 ]

Hi Anup,

Can you describe how you are keeping the system clocks in sync (e.g. ntpd, ntpdate, ...)?

Thanks,
Stephen

Comment by Anup Katariya [ 02/Apr/13 ]

Hi Elliot,

The system clock of all machines in the cluster is in sync.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/13 ]

They aren't critical, as the system is correctly detecting the issue. It would be important if the system could not do the things it needs to, though.

First, can you verify that the system clocks on all the machines in the cluster are in sync?
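
For reference, one way to eyeball this from the mongo shell rather than logging in to each host; a minimal sketch, with the hostnames below taken from the earlier output:

	// Minimal sketch: print the clock each cluster member reports.
	// serverStatus().localTime is the server's current time; large deltas
	// between hosts suggest clock skew.
	var hosts = ["mvperf74:50000", "mvperf75:50016", "mvperf76:50017"];
	hosts.forEach(function(h) {
		print(h + " -> " + new Mongo(h).getDB("admin").serverStatus().localTime);
	});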
