[SERVER-16715] Distribution of data with hashed shard key suddenly biased toward few shards Created: 04/Jan/15  Updated: 24/Jan/15  Resolved: 21/Jan/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.8.0-rc4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: John Morales Assignee: Siyuan Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments:
  • Image 2015-01-04 at 3.57.31 PM.png (PNG)
  • Screen Shot 2015-01-04 at 4.02.11 PM.png (PNG)
  • mongodb.first-configsvr.log.2015-01-03T21-22-34.gz
  • mongodb.primary-recipient.log.2015-01-03T21-22-31.gz
  • mongos.log.2015-01-03T21-22-44.gz
  • mongos.sh.status.out
Issue Links:
  • depends on SERVER-9287 Decision to split chunk should happen... (Closed)
  • duplicates SERVER-10430 Improve distribution of new chunks wh... (Closed)
  • is related to SERVER-16969 First split in the chunk has a higher... (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

I've reproduced the behavior consistently (once each on mmapv1 and wiredTiger), but only on a rather complicated setup: a 10-shard cluster deployed on EC2 using MMS Automation. Unfortunately, attempts with a simpler workload generator on a locally deployed cluster on OS X have not reproduced it.

Participants:

 Description   

On 2.8.0-rc4, on both mmapv1 and wiredTiger, I've observed a peculiar biasing of chunks toward a seemingly random 1 or 2 shards out of the 10 total for new sharded collections.

The workload of the application is single-threaded and roughly as follows:
1.) programmatically creates a sharded database and a sharded collection
2.) creates the {_id:"hashed"} index and a {_id:1} index.
3.) inserts ~220k documents, each about ~2kB in size, with a string _id.
4.) repeats from step #1, flipping back and forth between two databases but always creating a new sharded collection; when the workload completes, there are two sharded databases, each with 24 sharded collections on {_id:"hashed"}, and each collection contains ~220k documents (see the sketch below).
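
A minimal mongo-shell sketch of this loop, for context. The database names, collection names, and document payload here are illustrative stand-ins, not the actual generator:

// Sketch of the workload shape described above; "bench_a"/"bench_b" and the
// payload are hypothetical (the real collections are named like "benchmark2015010500").
var dbNames = ["bench_a", "bench_b"];
for (var i = 0; i < 48; i++) {                      // 24 collections per database
    var dbName = dbNames[i % 2];                    // flip between the two databases
    var coll = db.getSiblingDB(dbName).getCollection("benchmark" + i);

    sh.enableSharding(dbName);                      // no-op if sharding is already enabled
    coll.ensureIndex({_id: "hashed"});              // hashed shard-key index
    // the default {_id: 1} index already exists on every collection
    sh.shardCollection(dbName + "." + coll.getName(), {_id: "hashed"});

    for (var j = 0; j < 220000; j++) {              // ~220k docs, ~2kB each, string _id
        coll.insert({_id: "doc-" + i + "-" + j, payload: new Array(2048).join("x")});
    }
}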

Initially the workload distributes chunks across all shards evenly, as expected, for each new sharded collection. At some indeterminate point, however, when a new collection is created and sharded, 1 or 2 shards suddenly become "sinks" for a skewed majority (~80%) of all inserts. The other shards do receive some of the writes/chunks for the collection, but most are biased toward these 1 or 2 "select" shards.

After the workload completes, the balancer does eventually redistribute all chunks evenly.
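
For reference, the skew in a given collection can be quantified from the config metadata through the same mongos (a sketch; the namespace below is a placeholder for one of the affected collections):

// Count chunks per shard for one affected collection; during the problem window
// the counts are heavily skewed toward the "hot" shard (old_8 in the attached run).
db.getSiblingDB("config").chunks.aggregate([
    {$match: {ns: "benchmarks.benchmark2015010500"}},   // placeholder namespace
    {$group: {_id: "$shard", chunks: {$sum: 1}}},
    {$sort: {chunks: -1}}
]);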

I've had difficulty reproducing with a simpler setup, so I'm attaching some logs for the wiredTiger run where exactly 1 shard ("old_8") was the biased shard:

  • mongos.log.2015-01-03T21-22-44 - single mongos the workload was talking to
  • mongodb.primary-recipient.log.2015-01-03T21-22-31 - the primary of the suddenly "hot" biased shard "old_8"
  • mongodb.first-configsvr.log.2015-01-03T21-22-34 - first config server in the list of 3.

Timing of logs:

  • on Jan 3rd at about 18:06 UTC the run begins
  • on Jan 3rd about 19:55 UTC the shard old_8 primary "JM-c3x-wt-12.rrd-be.54a0a5b5e4b068b9df8d7b9c.mongodbdns.com:27001" begins receiving the lion's share of writes. (opcounter screenshot from MMS attached, as well as MMS chunks chart for an example biased collection.)
  • in general, the issue seems to start around collection "benchmark2015010500" in both databases, e.g. "benchmark2015010500", "benchmark2015010501", "benchmark2015010502", etc.

Environment:

  • EC2 east, Ubuntu 14.04 c3.xlarge
  • All nodes on same VPC subnet
  • 2.8.0-rc4, 1GB oplog, journaling enabled
  • The attached logs are from the wiredTiger run, but the behavior was observed on both storage engines.


 Comments   
Comment by John Morales [ 04/Jan/15 ]

Also attaching a copy/paste from my console of the sh.status() output from around the time the workload was running. It may be useful: there are many messages under "Migration Results for the last 24 hours:", and it also shows a snapshot of how the chunks were skewed toward "old_8" (before the balancer subsequently rebalanced the cluster).
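
If it helps cross-check that output, a similar migration summary can be pulled from the config changelog (a sketch; the 24-hour window mirrors the sh.status() report):

// Tally moveChunk events recorded in config.changelog over the last 24 hours,
// similar in spirit to the "Migration Results for the last 24 hours:" section.
var since = new Date(Date.now() - 24 * 60 * 60 * 1000);
db.getSiblingDB("config").changelog.aggregate([
    {$match: {what: /^moveChunk/, time: {$gte: since}}},
    {$group: {_id: {event: "$what", note: "$details.note"}, count: {$sum: 1}}},
    {$sort: {count: -1}}
]);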
