[SERVER-16715] Distribution of data with hashed shard key suddenly biased toward few shards
Created: 04/Jan/15 | Updated: 24/Jan/15 | Resolved: 21/Jan/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.8.0-rc4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | John Morales | Assignee: | Siyuan Zhou |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | I've repeated the behavior consistently (once each on mmapv1 and wiredTiger), but only on a rather complicated setup: a 10-shard cluster deployed on EC2 using MMS Automation. Unfortunately, attempts to reproduce with a simpler workload generator against a locally deployed cluster on OS X have been unsuccessful. |
| Participants: | |
| Description |
On 2.8.0-rc4, with both mmapv1 and wiredTiger, I've observed a peculiar biasing of chunks toward a seemingly random 1 or 2 shards (out of 10 total) for new sharded collections. The application workload is single-threaded and roughly as follows: it repeatedly creates a new collection, shards it on a hashed key, and inserts into it (see the shell sketch below, after the environment details).

Initially the workload distributes chunks evenly across all shards for each new sharded collection, as expected. However, at some indeterminate point, when a new collection is created and sharded, it's as though 1 or 2 shards suddenly become "sinks" for a skewed majority (~80%) of all inserts. The other shards do receive some of the writes/chunks for the collection, but most are biased toward these 1 or 2 "select" shards. After the workload completes, the balancer does eventually redistribute all chunks evenly.

I've had difficulty reproducing this with a simpler setup, so I'm attaching logs from the wiredTiger run, in which exactly 1 shard ("old_8") was the biased shard:
Timing of logs:
Environment:
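To make the workload shape concrete, here is a minimal mongo shell sketch of the pattern described above, assuming a hashed shard key. The database name, collection names, key field, and document counts are placeholders for illustration; the actual workload generator is not included in this ticket.

```
// Hypothetical reproduction sketch: names, key field, and counts are assumed,
// not taken from the actual workload generator.
sh.enableSharding("testdb");

for (var i = 0; i < 20; i++) {
    var ns = "testdb.coll_" + i;

    // Sharding an empty collection on a hashed key pre-splits it, so mongos is
    // expected to spread the initial chunks (and the inserts) across all 10 shards.
    sh.shardCollection(ns, { userId: "hashed" });

    var coll = db.getSiblingDB("testdb").getCollection("coll_" + i);
    for (var j = 0; j < 100000; j++) {
        coll.insert({ userId: j, payload: "x" });
    }
}
```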
| Comments |
| Comment by John Morales [ 04/Jan/15 ] |
Also attaching a copy/paste from my console of the sh.status() output from around the time the workload in question was running, figuring it might be useful - there are many messages under "Migration Results for the last 24 hours:". It also shows a snapshot of how the chunks were skewed toward "old_8" (before the balancer subsequently rebalanced the cluster). |
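For reference, one way (an assumption on my part, not taken from the attached output) to get the same per-shard chunk counts that sh.status() summarizes is to aggregate the config database directly from a mongos; the namespace below is a placeholder.

```
// Count chunks per shard for one collection (namespace is a placeholder).
var configDB = db.getSiblingDB("config");
configDB.chunks.aggregate([
    { $match: { ns: "testdb.coll_17" } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { chunks: -1 } }
]);
```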