[SERVER-5066] mongos splits should back off when metadata lock is taken and splitting $max or $min Created: 24/Feb/12  Updated: 10/Dec/14  Resolved: 25/Sep/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Greg Studer Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13402 bulk insert can result in too large c... Closed
Related
is related to SERVER-15002 Add retry when acquiring distributed ... Closed

Operating System: ALL

 Description   

If a large migration is occurring in a collection with a high insert rate, repeated splitVector commands will likely be triggered by those inserts. Because the collection metadata distributed lock is taken, the split will (correctly) fail, but not before the splitVector command has been run on the mongod and the config server has been checked. If the chunk to be split is the $max or $min chunk, the tracked amount of data written to the chunk is not reset, so every subsequent update and insert to these chunks triggers another splitVector and splitChunk command.

One optimization for all splits would be to have the mongos itself check the distributed lock, to avoid the splitVector if the split will just fail. For $max/$min inserts, we need to either reset _dataWritten or back off in some other way.
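A rough sketch of that idea (hypothetical helper names and thresholds, not the actual mongos code) could look something like this; the essential points are skipping the splitVector round trip when the lock is already held and resetting the per-chunk write counter so the $min/$max chunks stop re-triggering split attempts on every write:

 #include <atomic>
 #include <iostream>
 #include <string>

 // Hypothetical stand-ins for mongos-side state; names are illustrative only.
 struct ChunkWriteTracker {
     std::atomic<long long> dataWritten{0};  // bytes written since the last split attempt
 };

 // Assumed cheap check of the collection's distributed lock state.
 bool distLockIsHeld(const std::string& /*ns*/) { return true; }   // stub: migration in flight
 // Stub for the real splitVector + splitChunk round trip; false == split not performed.
 bool runSplitVectorAndSplit(const std::string& /*ns*/) { return false; }

 // Called from the insert/update path once a chunk looks big enough to split.
 void maybeSplit(const std::string& ns, ChunkWriteTracker& tracker, long long splitThresholdBytes) {
     if (tracker.dataWritten.load() < splitThresholdBytes)
         return;  // not enough new data since the last attempt

     // Proposed optimization: if the metadata lock is already taken (e.g. by an
     // in-flight migration), skip the splitVector round trip entirely.
     if (distLockIsHeld(ns)) {
         // Back off: reset the counter so the $min/$max chunks do not re-trigger
         // a splitVector on every subsequent write while the migration runs.
         tracker.dataWritten.store(0);
         return;
     }

     if (!runSplitVectorAndSplit(ns)) {
         tracker.dataWritten.store(0);  // split lost the race anyway; still back off
     }
 }

 int main() {
     ChunkWriteTracker tracker;
     tracker.dataWritten.store(64LL * 1024 * 1024);
     maybeSplit("test.bigMapReduce", tracker, 16LL * 1024 * 1024);
     std::cout << "dataWritten after attempt: " << tracker.dataWritten.load() << std::endl;
     return 0;
 }

Whether the check happens on the mongos or the counter is simply reset after a failed attempt is a design choice; either way the goal is to stop hammering splitVector and the config server while the migration holds the lock.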

This can be seen occurring in http://buildbot.mongodb.org/builders/Linux%2064-bit%20v8/builds/3032/steps/test_3/logs/stdio/text,
where the migration in bigMapReduce.js starting here:

 m30000| Wed Feb 22 07:15:59 [conn5] moveChunk request accepted at version 10|137
 m30000| Wed Feb 22 07:15:59 [conn5] moveChunk number of documents: 0

results in ~150 splitVector requests from a single mongos.



 Comments   
Comment by Randolph Tan [ 25/Sep/14 ]

This is no longer true in master due to the changes in SERVER-13402. The _dataWritten variable is now always cleared when the thread fails to get the distributed locks.
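A minimal illustration of that behavior (hypothetical names, a paraphrase of the description above rather than the actual server source):

 #include <atomic>
 #include <string>

 struct Chunk {
     std::atomic<long long> dataWritten{0};  // per-chunk write counter
 };

 bool tryAcquireDistLock(const std::string& /*ns*/) { return false; }  // stub: lock is busy

 void splitIfShould(const std::string& ns, Chunk& chunk) {
     if (!tryAcquireDistLock(ns)) {
         chunk.dataWritten.store(0);  // cleared even when the lock cannot be taken
         return;
     }
     // ... run splitVector / splitChunk ...
     chunk.dataWritten.store(0);      // cleared after a successful attempt too
 }

 int main() {
     Chunk c;
     c.dataWritten.store(1 << 20);
     splitIfShould("test.foo", c);
     return 0;
 }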

Comment by Badgeville Inc. [ 27/Nov/12 ]

We ran into a similar issue. There was a collection whose shard key wasn't unique as it was supposed to be, because we were hashing the key on 'nil', so the chunk grew to 4 GB and all of the mongos agents tried to split it at the same time, killing performance. The bug was ours, in that the documents didn't have the correct shard key, but we're mentioning it in case you have a similar issue; it may be something to do with 4 GB and 32 bits.
