[SERVER-22462] Autosplitting failure caused by stale config in runCommand Created: 04/Feb/16 Updated: 06/Dec/22 Resolved: 28/Jul/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Goffert van Gool | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Duplicate | Votes: | 6 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Assigned Teams: | Sharding |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
We are running multiple sharded mongo clusters, and recently one of our clusters started having an autosplitting issue. Our mongos processes have been logging the following messages:
These messages always appear together and seem related. Only one of our clusters is affected. The warning appears for several databases and collections, while autosplitting seems to remain functional for others. I have tried restarting every mongod and mongos process in this cluster, but nothing changed. I cannot find any issues with the config servers for this cluster either. We have a replicated config server setup (the 3.2 default).
Any advice on how to proceed? I assume this issue indicates that something is wrong with my config cluster. Are there any diagnostic commands available to check the config cluster's health? I would prefer not to resync my config cluster, as that would cause downtime for my service. Could simply restarting the config servers be sufficient? I welcome any advice. |
| Comments |
| Comment by jiang chao [ 04/Nov/17 ] |
|
Hi, I got the same issue. |
| Comment by Esha Maharishi (Inactive) [ 23/Jun/17 ] |
|
Note that this issue was recently fixed on master and backported to 3.4 for the upcoming 3.4.6 (see linked issue |
| Comment by Randolph Tan [ 15/Dec/16 ] |
|
Attached repro ticket that demonstrates a similar problem. Note: the script is not written to throw an error when the bug manifests, but inspecting the shard logs will reveal multiple instances of "splitChunk cannot find chunk [{ x: MinKey },{ x: MaxKey }) to split, the chunk boundaries may be stale". |
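The shard-log check described in the comment above could be automated roughly as follows. This is only a sketch: the function name `count_stale_split_warnings` is hypothetical, and the pattern assumes the message appears on a single log line with the wording quoted above, with only the chunk bounds varying.

```python
import re

# Pattern built from the warning quoted in the comment; the chunk bounds
# between "chunk" and "to split" differ per collection, so they are wildcarded.
STALE_SPLIT = re.compile(
    r"splitChunk cannot find chunk .* to split, "
    r"the chunk boundaries may be stale"
)


def count_stale_split_warnings(lines):
    """Count the lines that contain the stale-boundary splitChunk warning."""
    return sum(1 for line in lines if STALE_SPLIT.search(line))
```

An open file object can be passed directly, for example `count_stale_split_warnings(open(path))`, where `path` points at a shard's mongod log.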
| Comment by Ramon Fernandez Marina [ 19/Apr/16 ] |
|
Hi anthony.pastor, sorry you're running into this and thanks for your offer to help. The issue is understood (see Randolph's response above) and does not affect correctness. We'd like to fix it in this development cycle, so feel free to watch this ticket for updates. Cheers, |
| Comment by Anthony Pastor [ 19/Apr/16 ] |
|
Hi, We have the same issue. Regards. |
| Comment by Randolph Tan [ 08/Apr/16 ] |
|
Note: this warning message appears more often in v3.2 because mongos now explicitly attaches the chunk versions to the splitChunk command. |
| Comment by Randolph Tan [ 05/Feb/16 ] |
|
Hi, There is nothing wrong with the config servers. The mongos that is logging the warning is just a little stale compared to the other mongos. I also found a bug in the auto split where mongos does not try to update its metadata when getting this stale error. In the meantime, flushRouterConfig should flush the metadata in mongos and force it to refresh, so you don't need to restart the mongos. This is only a temporary band-aid until the mongos becomes stale again (note that this does not affect correctness, as it's only stale with respect to the chunk boundaries but not where the data should reside). Thanks! |
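The flushRouterConfig workaround mentioned above could be scripted along these lines. This is a minimal sketch, not an official procedure: the helper name `flush_router_config` is hypothetical, and `admin_db` is assumed to be a handle to the `admin` database of the stale mongos (for instance a PyMongo `Database`), of which only the `command` method is used.

```python
def flush_router_config(admin_db):
    """Ask a mongos to discard its cached routing metadata and reload it.

    `admin_db` is assumed to expose a PyMongo-style `command(name)` method
    and to be connected to a mongos, not to a shard or config server.
    """
    reply = admin_db.command("flushRouterConfig")
    # A successful command reply carries ok == 1.
    return reply.get("ok") == 1
```

With PyMongo this might be called as `flush_router_config(MongoClient("mongodb://mongos-host:27017").admin)`, where `mongos-host` is a placeholder. Each mongos keeps its own cache, so the call would presumably need repeating against every mongos that logs the warning.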