[SERVER-11364] 1gb sharded collection has only 3 chunks and is not migrating data Created: 24/Oct/13  Updated: 11/Jul/16  Resolved: 04/Nov/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jeffrey Berger Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS


Attachments: Text File mongod_11364.log     Text File mongos_11364.log    
Operating System: Linux
Steps To Reproduce:

Create a new database and enable sharding on it. Then create a collection sharded on a hash of the _id field. Insert a large number of documents; the chunk count stays at 3 even after the collection grows well beyond the maximum combined size of three chunks.
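A minimal shell sketch of those steps (the database and collection names below are placeholders, and the padding string is only there to grow the documents) would look roughly like this:

// Run through mongos; "demo" and "demo.coll" are placeholder names.
sh.enableSharding("demo")
sh.shardCollection("demo.coll", { "_id" : "hashed" })
for (var i = 0; i < 5000000; i++) {
    db.getSiblingDB("demo").coll.insert({ "pad" : Array(200).join("x") })
}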

Participants:

 Description   

I created a 3-shard cluster and began to put data on it, and found that data was not being distributed across all three shards. Each sharded collection is distributed either across the first two shards (profilea and profileb) or only on the third (profilec).

Here is the sh.status() output

--- Sharding Status ---

  sharding version: {
	"_id" : 1,
	"version" : 3,
	"minCompatibleVersion" : 3,
	"currentVersion" : 4,
	"clusterId" : ObjectId("521fe31cf505ad049bb9d3c7")
}
  shards:
	{  "_id" : "profilea",  "host" : "profilea/ec2ev-qaprof1a1.sailthru.pvt:27017" }
	{  "_id" : "profileb",  "host" : "profileb/ec2ev-qaprof1b1.sailthru.pvt:27017" }
	{  "_id" : "profilec",  "host" : "profilec/ec2ev-qaprof1c1.sailthru.pvt:27017,ec2ev-qaprof1c2.sailthru.pvt:27017" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "profile",  "partitioned" : true,  "primary" : "profilea" }
		profile.profile
			shard key: { "_id" : "hashed" }
			chunks:
				profilea	7
				profileb	7
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7707918766331510156") } on : profilea { "t" : 5, "i" : 0 }
			{ "_id" : NumberLong("-7707918766331510156") } -->> { "_id" : NumberLong("-6148914691236517204") } on : profileb { "t" : 6, "i" : 0 }
			{ "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-4639504447783653088") } on : profilea { "t" : 7, "i" : 0 }
			{ "_id" : NumberLong("-4639504447783653088") } -->> { "_id" : NumberLong("-3074457345618258602") } on : profileb { "t" : 8, "i" : 0 }
			{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong("-1615570839856503986") } on : profileb { "t" : 4, "i" : 0 }
			{ "_id" : NumberLong("-1615570839856503986") } -->> { "_id" : NumberLong("-877675254450465414") } on : profilea { "t" : 4, "i" : 1 }
			{ "_id" : NumberLong("-877675254450465414") } -->> { "_id" : NumberLong(0) } on : profilea { "t" : 3, "i" : 23 }
			{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1492655684061789576") } on : profilea { "t" : 3, "i" : 8 }
			{ "_id" : NumberLong("1492655684061789576") } -->> { "_id" : NumberLong("2247431082742756242") } on : profilea { "t" : 3, "i" : 20 }
			{ "_id" : NumberLong("2247431082742756242") } -->> { "_id" : NumberLong("3074457345618258602") } on : profilea { "t" : 3, "i" : 21 }
			{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("4524717206742292819") } on : profileb { "t" : 3, "i" : 14 }
			{ "_id" : NumberLong("4524717206742292819") } -->> { "_id" : NumberLong("6148914691236517204") } on : profileb { "t" : 3, "i" : 15 }
			{ "_id" : NumberLong("6148914691236517204") } -->> { "_id" : NumberLong("7597038405924785714") } on : profileb { "t" : 3, "i" : 18 }
			{ "_id" : NumberLong("7597038405924785714") } -->> { "_id" : { "$maxKey" : 1 } } on : profileb { "t" : 3, "i" : 19 }
	{  "_id" : "stats",  "partitioned" : true,  "primary" : "profilea" }
		stats.stats
			shard key: { "_id" : "hashed" }
			chunks:
				profilea	3
				profileb	3
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : profilea { "t" : 4, "i" : 0 }
			{ "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-3074457345618258602") } on : profileb { "t" : 5, "i" : 0 }
			{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : profilea { "t" : 3, "i" : 4 }
			{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("3074457345618258602") } on : profilea { "t" : 3, "i" : 5 }
			{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : profileb { "t" : 3, "i" : 6 }
			{ "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : profileb { "t" : 3, "i" : 7 }
		stats.stats.pv.day
			shard key: { "_id" : "hashed" }
			chunks:
				profilea	3
				profileb	3
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : profilea { "t" : 4, "i" : 0 }
			{ "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-3074457345618258602") } on : profileb { "t" : 5, "i" : 0 }
			{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : profilea { "t" : 3, "i" : 4 }
			{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("3074457345618258602") } on : profilea { "t" : 3, "i" : 5 }
			{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : profileb { "t" : 3, "i" : 6 }
			{ "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : profileb { "t" : 3, "i" : 7 }
		stats.stats1.blast
			shard key: { "_id" : "hashed" }
			chunks:
				profilea	42
				profileb	41
			too many chunks to print, use verbose if you want to force print
	{  "_id" : "localhost",  "partitioned" : false,  "primary" : "profilea" }
	{  "_id" : "sailthru",  "partitioned" : false,  "primary" : "profileb" }
	{  "_id" : "test",  "partitioned" : false,  "primary" : "profileb" }
	{  "_id" : "testA",  "partitioned" : true,  "primary" : "profilec" }
		testA.dump
			shard key: { "_id" : "hashed" }
			chunks:
				profilec	3
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-3074457345618258602") } on : profilec { "t" : 1, "i" : 0 }
			{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong("3074457345618258602") } on : profilec { "t" : 1, "i" : 1 }
			{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : { "$maxKey" : 1 } } on : profilec { "t" : 1, "i" : 2 }
	{  "_id" : "testB",  "partitioned" : true,  "primary" : "profilec" }
		testB.shardTest
			shard key: { "_id" : "hashed" }
			chunks:
				profilec	3
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-3074457345618258602") } on : profilec { "t" : 1, "i" : 0 }
			{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong("3074457345618258602") } on : profilec { "t" : 1, "i" : 1 }
			{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : { "$maxKey" : 1 } } on : profilec { "t" : 1, "i" : 2 }

To test this I created two test collections and inserted documents in order to force mongos to create chunks and move them across the shards. The shard key was a hash of the _id. The collection testB.shardTest has the following stats:

{
	"sharded" : true,
	"ns" : "testB.shardTest",
	"count" : 5000000,
	"numExtents" : 18,
	"size" : 1040000000,
	"storageSize" : 1164914688,
	"totalIndexSize" : 402529008,
	"indexSizes" : {
		"_id_" : 162228192,
		"_id_hashed" : 240300816
	},
	"avgObjSize" : 208,
	"nindexes" : 2,
	"nchunks" : 3,
	"shards" : {
		"profilec" : {
			"ns" : "testB.shardTest",
			"count" : 5000000,
			"size" : 1040000000,
			"avgObjSize" : 208,
			"storageSize" : 1164914688,
			"numExtents" : 18,
			"nindexes" : 2,
			"lastExtentSize" : 307515392,
			"paddingFactor" : 1,
			"systemFlags" : 1,
			"userFlags" : 0,
			"totalIndexSize" : 402529008,
			"indexSizes" : {
				"_id_" : 162228192,
				"_id_hashed" : 240300816
			},
			"ok" : 1
		}
	},
	"ok" : 1
}
I believe the issue is somehow related to mongos not splitting this collection into more than three chunks, which I don't understand given that the maximum chunk size is 64 MB. That setting has not been changed; the config database contains the following entry:

mongos> db.settings.findOne()

{ "_id" : "chunksize", "value" : 64 }

 



 Comments   
Comment by Amalia Hawkins [ 04/Nov/13 ]

Great! I'm glad to hear things are working for you.

Comment by Jeffrey Berger [ 04/Nov/13 ]

Yes, as soon as I installed ntp and began running ntpd on all the machines in the cluster it began rebalancing, and now everything is evenly distributed.

Comment by Amalia Hawkins [ 04/Nov/13 ]

Did syncing up the time change or fix the behavior?

Comment by Jeffrey Berger [ 25/Oct/13 ]

All the nodes have slightly different times; the biggest difference seems to be about 90 seconds. We've informed the admins of the machines and will sync the times next week. Thanks a bunch for all your help with this.

Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ]

No downside; I would do that.
Have you checked all the nodes? Is that the highest offset you see?

Comment by Jeffrey Berger [ 25/Oct/13 ]

There is a shift of 50 seconds between two of the shards; ntpd is not running. Would there be any downside to immediately syncing their times?

Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ]

Ah, great, the clock skew log message is probably the root cause.

Can you check the clocks on all of the machines (mongos and mongod) and see whether one (or more) of them is out of sync?

Are you running ntpd?
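(As a rough illustration, one way to spot-check the clocks from the mongo shell, using placeholder hostnames, is to print each member's serverStatus().localTime:)

// Placeholder host list; include every mongod, mongos, and config server.
var hosts = [ "host-a.example.com:27017", "host-b.example.com:27017", "host-c.example.com:27019" ];
hosts.forEach(function (h) {
    var t = new Mongo(h).getDB("admin").serverStatus().localTime;
    print(h + "  " + t);
});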

Comment by Jeffrey Berger [ 25/Oct/13 ]

As of submitting the bug we had one mongos instance; there are now three different ones.

The data is being inserted in the shell with the following code:

for (var i = 0; i < 100000; i++) {
    db.testSharded.insert({
        "num"  : "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
        "numm" : "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB",
        "dumm" : "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"
    })
}

(The junk in there is just to make the document bigger to fill up space faster)

I can in fact attach such logs. I've included the mongos log and the mongod log from the primary on profilec. If we need any other logs from any of the other instances, let me know and I'll pull them. I have noticed something in the mongos log:

[Balancer] caught exception while doing balance: error checking clock skew of cluster ec2ev-qaprofconf1.sailthru.pvt:27019,ec2ev-qaprofconf2.sailthru.pvt:27019,ec2ev-qaprofconf3.sailthru.pvt:27019 :: caused by :: 13650 clock skew of the cluster ec2ev-qaprofconf1.sailthru.pvt:27019,ec2ev-qaprofconf2.sailthru.pvt:27019,ec2ev-qaprofconf3.sailthru.pvt:27019 is too far out of bounds to allow distributed locking.

This message persisted even after we dropped the shard from the cluster and re-added it; that had no effect. If this is the root cause of our inability to split and balance across the cluster, we have not been able to recover from it. What causes this, and how would we bring the cluster back to an operational state?

Thanks for all the help.

Comment by Eliot Horowitz (Inactive) [ 25/Oct/13 ]

You are definitely right that this is related to splitting.
Some questions:

  • How many mongos do you have?
  • How are you inserting the data?
  • Can you attach the mongos and mongod log?