[SERVER-20857] dataSize command requires shard key, which fails on collections where shard key is prefix of a compound multikey index Created: 09/Oct/15 Updated: 15/Jul/19 Resolved: 22/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dai Shi | Assignee: | Andy Schwerin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Sprint: | Sharding 17 (07/15/16) |
| Participants: |
| Description |
|
We have a collection that is sharded on the key "s". There is a compound index on { s : 1, s2 :1 }, but no index on just { s : 1 }. Everything works fine related to sharding. However, when running the dataSize command against a mongoS, there is no way to get it to work. Passing it the shard key results in:
Passing in the compound key results in:
It seems that either mongo shouldn't allow you to shard a collection based on only the prefix of a compound index, or the dataSize command should be smart enough to use the compound index in this situation. I think the latter is preferable. |
| Comments |
| Comment by Dai Shi [ 22/Jun/16 ] |
|
Thomas,

Your response makes no sense to me. The document you linked, https://docs.mongodb.com/manual/core/sharding-shard-key-indexes/, describes exactly the situation that LEADS to this error. We have a collection where the shard key is on a singlekey index, "s". Then we create a compound index { "s" : 1, "s2" : 1 }. Then we drop the original index { "s" : 1 }. Sharding still works since the shard key is the prefix of the new compound index. It should be irrelevant whether the field "s2" is an array, since all it needs is the prefix of the index, which is a single key. However, in this case dataSize stops working.

Since 2.6, compound multikey indexes have been allowed to serve as the shard key index as long as the shard key is the prefix of the index and that field itself is not an array: https://docs.mongodb.com/manual/core/index-multikey/#index-type-multikey

"Changed in version 2.6: However, if the shard key index is a prefix of a compound index, the compound index is allowed to become a compound multikey index if one of the other keys (i.e. keys that are not part of the shard key) indexes an array. Compound multikey indexes can have an impact on performance."

It makes no sense that the sharding calculation can use this compound index, yet dataSize cannot. All I am asking for is for dataSize to be smart enough to use the prefix of the compound index just like the sharding calculation. Otherwise you should actually make the db enforce this restriction, as it is a very nasty sharp edge that is not at all intuitive. |
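The prefix argument in this comment can be illustrated with a small sketch in plain JavaScript. It is not the server's implementation: `isPrefixOf`, `arrayFields`, and the sample documents are hypothetical, and the sketch only assumes that "s" holds scalars while "s2" holds arrays, as the comment describes.

```javascript
// Hypothetical helper, not server code: returns true when every field of
// shardKey appears, in the same order and with the same direction, at the
// start of indexKey.
function isPrefixOf(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) {
    return false;
  }
  return shardFields.every(
    (field, i) => field === indexFields[i] && shardKey[field] === indexKey[field]
  );
}

// Hypothetical helper: returns the indexed fields that hold an array in at
// least one of the given documents, i.e. the fields that make the index multikey.
function arrayFields(indexKey, docs) {
  return Object.keys(indexKey).filter((field) =>
    docs.some((doc) => Array.isArray(doc[field]))
  );
}

// Made-up sample documents: "s" is a scalar, "s2" is an array.
const docs = [
  { s: 1, s2: [10, 20] },
  { s: 2, s2: [30] },
];
const shardKey = { s: 1 };
const compoundIndex = { s: 1, s2: 1 };

const prefixOk = isPrefixOf(shardKey, compoundIndex);
const multikeyFields = arrayFields(compoundIndex, docs);
// The shard key prefix stays single-valued when none of its fields hold arrays.
const shardKeyIsScalar = Object.keys(shardKey).every(
  (field) => !multikeyFields.includes(field)
);
console.log(prefixOk, multikeyFields, shardKeyIsScalar);
```

Under these assumptions the compound index is multikey only because of "s2", while the shard key prefix "s" remains single-valued, which is the condition the quoted documentation describes.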
| Comment by Kelsey Schubert [ 22/Jun/16 ] |
|
After additional investigation and consulting with our Sharding and Query teams, we have concluded that this is expected behavior. The dataSize command requires a singlekey index.

Before reaching this conclusion, we considered two alternatives. The first option would be to put stronger checks in place to enforce that sharded clusters always have a singlekey shard key index. Please note this approach would not change the behavior of dataSize and would not resolve your issue. The second option would be to allow dataSize to use multikey indexes. If dataSize used a multikey index, its performance would be significantly impacted.

The work required to ease this constraint does not appear to be worth the benefit, given the performance implications of using multikey indexes and the number of users impacted by this behavior. We expect that sharded clusters are sharded using a singlekey index. If this restriction is observed, dataSize behaves as expected. Therefore, we will be closing this ticket 'works as designed.'

As you identified, there is a simple workaround: create a singlekey index. Thank you for your help investigating this issue. Kind regards, |
| Comment by Kelsey Schubert [ 29/Jan/16 ] |
|
Thank you for the additional information. We have a working reproduction of this behavior: the dataSize command does not recognize a compound multikey index as a valid index. Please continue to watch this ticket for updates. Kind regards, |
| Comment by Dai Shi [ 28/Jan/16 ] |
|
Here is the output you requested. I have edited out the original shard, host, database, and collection names along with the ports. This is on production, where the
This is in staging, where the
|
| Comment by Kelsey Schubert [ 28/Jan/16 ] |
|
Can you please post the output of the following command with the appropriate substitution for the collection name?
Thank you, |
| Comment by Dai Shi [ 28/Jan/16 ] |
|
Hi Ramon, I just checked both our production and staging clusters; all shards in both clusters have the {s:1, s2:1} index. I further checked that all replicas do indeed have the index. We would definitely have noticed in production if this index were missing, as queries hitting a shard without the index would likely time out (or at least latency would drastically increase). I forgot to update here that in production I did end up creating an index on {s:1} as a workaround, though I still think that should be unnecessary. In staging I have not yet added the {s:1} index, and am still getting the original errors. Thanks for following up on this! |
| Comment by Ramon Fernandez Marina [ 27/Jan/16 ] |
|
dai@foursquare.com, apologies for the radio silence. I just managed to reproduce this problem with the help of a colleague, but for that I had to go into one of my shards and drop the {s:1, s2:1} index from it. This leads me to believe the index catalog in at least one of your shards is incorrect or corrupt. While I think it would be very hard to find out how that happened, it should be easy to check the health/consistency of the index catalogs in all your shards. Can you please run db.test_coll.getIndexes() on all your shards and see if any is missing this index? Depending on the size of your dataset, one workaround may be to create a new index on {s:1}. If you find problems in the index catalog with {s:1, s2:1}, you may need to do this anyway while you repair the catalog. |
| Comment by Dai Shi [ 10/Oct/15 ] |
|
Ramon, Your test does seem to be representative of the scenario I described. I'm not sure why I got those errors and you didn't. I can try dropping the index I added and running the dataSize command again on Monday. For now I tried doing the same on our staging cluster which is an exact copy of production (now has data that has been stale for a couple weeks), and am still getting the same errors. See below for config metadata, getIndexes() output, and dataSize command output (I've searched/replaced our db name to test_db and collection name to test_coll, everything else is the same):
This environment is on version 3.0.6. |
| Comment by Ramon Fernandez Marina [ 09/Oct/15 ] |
|
dai@foursquare.com, further investigation shows that However I wasn't able to reproduce the error message you describe using 3.0.5. I created a collection with documents with the following shape and created a compound index:
I then sharded the collection on {a:1}:
I then ran the dataSize command:
Is this a representative scenario? If not, did I miss any details where your use case differs from the reproducer above? |
| Comment by Dai Shi [ 09/Oct/15 ] |
|
Hi Ramon, I do not believe this is a duplicate of SERVER-19640. That issue seems to have been introduced in 3.1.2. The issue I'm describing is present in 3.0.5, and is not related to collection namespaces, but to the keyPattern parameter of the dataSize command. The current solution to my issue is to add an index identical to the shard key, in this case { s : 1 }, which duplicates the prefix of the existing compound index. This is unnecessary wasted space taken up by the index. |
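For reference, the workaround described in this comment could be sketched as the two command documents below, written as plain JavaScript objects. The names "test_db" and "test_coll" are the placeholders used elsewhere in this ticket, and the index name and the min/max bounds are made-up values for illustration. In the mongo shell, each document would be passed to db.runCommand() against the relevant database.

```javascript
// Command documents only; nothing here contacts a server.

// Workaround: add an index whose key pattern is exactly the shard key.
// The index name "s_1" is an illustrative choice.
const createShardKeyIndex = {
  createIndexes: "test_coll",
  indexes: [{ key: { s: 1 }, name: "s_1" }],
};

// dataSize over a shard key range, pointing keyPattern at the new index.
// The min and max bounds are hypothetical values.
const dataSizeOverRange = {
  dataSize: "test_db.test_coll",
  keyPattern: { s: 1 },
  min: { s: 0 },
  max: { s: 1000 },
};

console.log(JSON.stringify(createShardKeyIndex));
console.log(JSON.stringify(dataSizeOverRange));
```

The cost the comment objects to is visible here: the { s : 1 } index exists only to satisfy dataSize, since the compound index already covers the same leading field.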
| Comment by Ramon Fernandez Marina [ 09/Oct/15 ] |
|
dai@foursquare.com, I believe the issue you describe is a duplicate of |