[SERVER-15439] mergeChunk command fails if there isn't an index that exactly matches the shard key Created: 29/Sep/14  Updated: 06/Mar/15  Resolved: 04/Dec/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Galinkin Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File force-empty-chunks-merge-error.js    
Issue Links:
Duplicate
duplicates SERVER-13385 relax merge chunk constraints Closed
Operating System: ALL
Steps To Reproduce:

The steps to reproduce this error are:

  1. Create the index {a: 1, b: 1} on some collection
  2. Shard the collection on {a: 1}
  3. Execute sh.splitAt() a few times to force some empty chunks
  4. Try to merge any two contiguous chunks; the error will occur
  5. If the index {a: 1} is also created, mergeChunk works correctly

I've attached a script that reproduces these steps and forces this error to occur; a minimal inline sketch also follows the execution instructions below. To execute it:

  1. Create a sharded environment using mtools (mlaunch init --sharded 1 --single --port 60000)
  2. Execute it with mongo localhost:60000 force-empty-chunks-merge-error.js
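
For quick reference, here is a minimal mongo shell sketch of the same steps (the database, collection, and split points are hypothetical; the attached script remains the authoritative reproducer):

    // Shard an empty collection that only has a compound index prefixed by the shard key.
    sh.enableSharding("test");
    db.getSiblingDB("test").coll.ensureIndex({ a: 1, b: 1 });  // no exact { a: 1 } index
    sh.shardCollection("test.coll", { a: 1 });                 // shard key is a prefix of the index
    sh.splitAt("test.coll", { a: 10 });                        // force some empty chunks
    sh.splitAt("test.coll", { a: 20 });
    // Merging two contiguous chunks fails with "could not count docs ...":
    db.adminCommand({ mergeChunks: "test.coll", bounds: [ { a: MinKey }, { a: 20 } ] });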
Participants:

 Description   

The mergeChunk command fails if the sharded collection does not have an index that exactly matches the shard key, but only has one or more indexes with the shard key as a prefix.

The error message is something like this:

{
	"ok" : 0,
	"errmsg" : "could not merge chunks, could not count docs in database.collection :: caused by :: 13106 nextSafe(): { $err: \"Unable to execute query: error processing query: ns=database.collection limit=1 skip=0\nTree: $and\nSort: {}\nProj: {}\n planner returned error: unable to fin...\", code: 17007 }"
}

Apparently, the query planner is unable to find a relevant index in this situation.
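
For what it's worth, the precondition is easy to confirm from the shell (names are hypothetical, matching the sketch in the steps above): the collection carries only the compound index, while the shard key registered in the config database is its prefix.

    db.getSiblingDB("test").coll.getIndexes();                            // only _id and { a: 1, b: 1 }
    db.getSiblingDB("config").collections.findOne({ _id: "test.coll" });  // shard key is { a: 1 }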



 Comments   
Comment by Ramon Fernandez Marina [ 06/Mar/15 ]

miketempleman, I'm requesting a v2.6 backport of SERVER-13385; we'll evaluate the risk/impact and make a decision. Feel free to watch that ticket for updates.

Note that if you create an index that exactly matches the shard key you may be able to work around this issue, but be aware that creating such an index may have a significant performance impact on your deployment. If you are going to consider this option, I'd strongly advise you to try it out on a testing setup first.
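
For illustration, a minimal sketch of that workaround in the mongo shell, assuming the same hypothetical test.coll collection sharded on { a: 1 }:

    // Adding an index that exactly matches the shard key lets the merge succeed,
    // at the cost of maintaining an extra, largely redundant index.
    db.getSiblingDB("test").coll.ensureIndex({ a: 1 });
    db.adminCommand({ mergeChunks: "test.coll", bounds: [ { a: MinKey }, { a: 20 } ] });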

Comment by Mike Templeman [ 03/Mar/15 ]

Hi Ramon

Is there any chance that this fix can be backported? I have a collection with thousands of empty chunks that I would like to merge, but I am encountering the same error. While I really want to upgrade to 3.0, I also don't want to wait until it is (1) finally released and (2) thoroughly tested in our staging environment.

Comment by Randolph Tan [ 04/Dec/14 ]

The mergeChunk command was using $min and $max to perform the query that checks whether the chunks contain any documents. The $min/$max operators require index keys that exactly match the given bounds, which is what caused the error. SERVER-13385 now allows merging non-empty chunks, so the counting was removed entirely.
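
To illustrate (with hypothetical names, in the 2.6-era shell), the kind of query that old code path issued can be reproduced directly: $min/$max bounds must correspond exactly to an index's key pattern, so this fails when only { a: 1, b: 1 } exists but works once { a: 1 } is present.

    // Probe a chunk's range using index bounds, as the old mergeChunk code path did;
    // this errors without an exact { a: 1 } index.
    db.getSiblingDB("test").coll.find().min({ a: MinKey }).max({ a: 10 }).limit(1).itcount();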

Comment by Daniel Galinkin [ 04/Dec/14 ]

Ok, I will wait for the 2.8 final release to fix this for good.
For now, the workaround of just having an index that matches the shard key is enough for me.

Thanks!
Daniel

Comment by Ramon Fernandez Marina [ 04/Dec/14 ]

Thanks for the reproducer danielgalinkin@gmail.com, and apologies for the late reply. I can observe the behavior you describe in version 2.6.5, but it seems to have been fixed already in the 2.7 series. The 2.7.x versions are for development only and are not recommended for production environments, but we've published a 2.8.0-rc1 release candidate for the next stable release (2.8.0) in case you want to give that one a try.

I need to investigate a bit more to see if this is something that may get backported to the 2.6 series.

Regards,
Ramón.
