[SERVER-4745] Figuring out which shard to send a query to takes a long time when doing large $in queries on the shard key Created: 23/Jan/12 Updated: 11/Jul/16 Resolved: 10/Feb/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 1.8.4, 2.0.2 |
| Fix Version/s: | 2.0.3, 2.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
The problem seems to be in ChunkManager::getShardsForQuery in chunk.cpp. Seems to do n^2 comparisons to figure out where to send the queries. |
| Comments |
| Comment by auto [ 10/Feb/12 ] | |
|
Author: {u'login': u'stbrody', u'email': u'spencer@10gen.com', u'name': u'Spencer T Brody'}Message: Add unit test for | |
| Comment by auto [ 10/Feb/12 ] | |
|
Author: {u'login': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@10gen.com'}Message: Calculating which shard(s) to send $in queries to was taking a long time. This fix changes mongos to stop limiting the shards to send to after the first $in clause - possibly sending the query to more shards than necessary, but saving time. | |
| Comment by auto [ 09/Feb/12 ] | |
|
Author: {u'login': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@10gen.com'}Message: Calculating which shard(s) to send $in queries to was taking a long time. This fix changes mongos to stop limiting the shards to send to after the first $in clause - possibly sending the query to more shards than necessary, but saving time. | |
| Comment by Remon van Vliet [ 30/Jan/12 ] | |
|
Eliot pointed me to those issue; we're getting "combinatorial limit of $in partitioning of result set exceeded" errors with queries on large sharded collections with a simple (one field) shard key but multiple $in clauses. I do not see a reason for the error I'm getting given my configuration but I'll keep an eye on this regardless. Full reference here : http://groups.google.com/group/mongodb-user/browse_thread/thread/d86f1246c5ed4816/73c522eee341ef98#73c522eee341ef98 | |
| Comment by Ben [ 24/Jan/12 ] | |
|
Makes sense, the level of effort should probably be reasonable given the number of shards as well? In our case, with only four shards, we expect these large $in queries to hit all of them. | |
| Comment by Spencer Brody (Inactive) [ 24/Jan/12 ] | |
|
Proposed solution: if there are multiple $in clauses in components of the shard key, only check all shard key combinations up to (and including) the first field with a $in. So in the above example, you would check all values of "a" in "aList", but you wouldn't take the cartesian product of values in "aList" with the values in "bList". This may result in sending the query to more shards than is necessary, but that will be worth it to avoid calculating the cartesian product of all values in the two (or more) $in lists. | |
| Comment by Spencer Brody (Inactive) [ 24/Jan/12 ] | |
|
Seems to be a problem specifically when the shard key is compound and you're doing a query with a $in on multiple parts of the shard key. For example, if the shard key is {a:1,b:1}, and you do a query like
|