[SERVER-30600] mongos does not detect stale config when clients use non-primary read preferences Created: 10/Aug/17 Updated: 07/Sep/17 Resolved: 17/Aug/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Sharding |
| Affects Version/s: | 3.4.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Seth Kelly | Assignee: | Mark Agarunov |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | RF |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Operating System: | ALL |
| Steps To Reproduce: | Stand up and configure a sharded MongoDB cluster with multiple mongos instances (the deployment described below uses three shards, a mongos co-located with each application instance, and clients querying with a non-primary read preference).
Create an unsharded database and populate a collection with enough test data that it would be split into multiple chunks upon sharding. |
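A minimal PyMongo sketch of the populate-and-shard step above; the host, database, collection, and shard key names are placeholders, and the data volume is sized only to exceed the default 64 MB chunk size:

```python
# Sketch: populate an unsharded collection with enough data to split into
# several chunks once sharded. Names below are placeholders; connect to a mongos.
import os

from pymongo import MongoClient

client = MongoClient("mongos1.example.net:27017")  # placeholder mongos host
coll = client.testdb.testcoll

# ~200,000 documents of roughly 1 KB each (~200 MB) comfortably exceeds the
# default 64 MB chunk size, so sharding will produce multiple chunks.
batch = []
for i in range(200_000):
    batch.append({"userId": i, "padding": os.urandom(512).hex()})
    if len(batch) == 1000:
        coll.insert_many(batch)
        batch = []

# Shard the collection on userId (a non-empty collection needs an index
# on the shard key before shardCollection).
coll.create_index("userId")
client.admin.command("enableSharding", "testdb")
client.admin.command("shardCollection", "testdb.testcoll", key={"userId": 1})
```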
| Participants: | |
| Description |
|
Mongos instances that do not receive any requests with the primary read preference do not get their chunk location configuration updated after a chunk migration. This results in missing data in query results when the query includes the shard key and the mongos routes it to the wrong shard. The only workaround I have come up with so far is to hit every mongos instance with a dummy primary read preference query for each sharded collection (or perhaps call the refresh command against the mongos) at some regular interval.

Background info: We've hit VM RAM capacity issues and are now attempting to shard in place into 3 shards, with a mongos instance co-located with each app instance. Everything went smoothly at first; I allowed the balancer to migrate some chunks to the new shards. After a few chunks I disabled the balancer to verify there were no production errors, and found that objects which had moved are no longer coming back in queries by shard key. If I make an identical query against the mongos from the shell (which defaults to the primary read preference), I see the following in the logs and get correct results:

Afterwards, my app's queries (using readPref=nearest) correctly return the same results. |
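For reference, a minimal sketch of the refresh-command workaround mentioned above, assuming the command in question is flushRouterConfig run against each mongos (hostnames are placeholders; in 3.4 the command takes no namespace argument):

```python
# Sketch: ask each mongos to reload its routing table with flushRouterConfig.
# The mongos hostnames below are placeholders for the real deployment.
from pymongo import MongoClient

MONGOS_HOSTS = [
    "mongos1.example.net:27017",
    "mongos2.example.net:27017",
    "mongos3.example.net:27017",
]

for host in MONGOS_HOSTS:
    client = MongoClient(host)
    # flushRouterConfig is run on the admin database of the mongos itself.
    client.admin.command("flushRouterConfig")
    client.close()
```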
| Comments |
| Comment by Mark Agarunov [ 17/Aug/17 ] |
|
Hello skelly,

As this behavior seems to be due to the same underlying issue as the linked ticket, I'm resolving this one as a duplicate.

Thanks,
Mark |
| Comment by Andy Schwerin [ 12/Aug/17 ] |
|
I think that's your best choice today. You might also disable the balancer if your data naturally has an even distribution. |
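A brief sketch of disabling the balancer from a client, assuming the balancerStop/balancerStatus commands available in 3.4 run against a mongos (the hostname is a placeholder):

```python
# Sketch: stop the balancer so no further chunk migrations occur.
from pymongo import MongoClient

mongos = MongoClient("mongos1.example.net:27017")  # placeholder mongos host
mongos.admin.command("balancerStop")
print(mongos.admin.command("balancerStatus"))  # confirm the balancer is off
```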
| Comment by Seth Kelly [ 11/Aug/17 ] |
|
Do any of you have any recommendation for a workaround in the meantime? I was thinking I could run a separate thread that makes a simple query against each mongos using the primary read preference at some reasonable interval. Thoughts on that approach? |
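To make that idea concrete, here is a hedged sketch of the proposed approach, assuming one background thread that periodically issues a trivial primary-read-preference query for each sharded namespace against each mongos; hosts, namespaces, and the interval are placeholders:

```python
# Sketch of the proposed workaround: periodically issue a cheap query with
# read preference "primary" against every mongos, for every sharded
# collection, so each mongos detects a stale shard version and refreshes.
import threading
import time

from pymongo import MongoClient, ReadPreference

MONGOS_HOSTS = ["mongos1.example.net:27017", "mongos2.example.net:27017"]
SHARDED_NAMESPACES = [("mydb", "mycoll")]  # placeholder (db, collection) pairs
INTERVAL_SECONDS = 60

def refresh_loop():
    clients = [MongoClient(host) for host in MONGOS_HOSTS]
    while True:
        for client in clients:
            for db_name, coll_name in SHARDED_NAMESPACES:
                coll = client.get_database(
                    db_name, read_preference=ReadPreference.PRIMARY)[coll_name]
                # Trivial query; routing it to the primary shard lets the shard
                # report a stale shardVersion back to this mongos.
                coll.find_one({}, {"_id": 1})
        time.sleep(INTERVAL_SECONDS)

threading.Thread(target=refresh_loop, daemon=True).start()
```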
| Comment by Seth Kelly [ 11/Aug/17 ] |
|
Great, thanks a lot guys for the quick response. I'm hoping I can develop a reasonable workaround until the feature comes along. I expect chunk migrations will be rare in my deployment anyway, at least after the initial balancing of the data. |
| Comment by Andy Schwerin [ 11/Aug/17 ] |
|
OK. skelly, this is a duplicate of a feature request that we've been developing for the 3.6 release. It's sufficiently complicated that it cannot be backported, I'm afraid, but it should be available later this year. dianna.hohensee, can you help mark.agarunov out by selecting an appropriate ticket that is duplicated by this one, so Seth can track this work if he wants? |
| Comment by Dianna Hohensee (Inactive) [ 11/Aug/17 ] |
|
Yes, it will be resolved by our safe secondary reads project in v3.6. Secondaries do not currently (v3.4 or earlier) use routing information to filter results; in v3.6 they will. To fully resolve his problem, he will likely also need to use afterClusterTime reads (also a v3.6 feature) to ensure secondaries are not lagging behind their primaries, in case a mongos that is used to do the secondary reads has a stale shardVersion. |
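For context, a sketch of what such a read could look like from a 3.6-era driver, assuming a causally consistent session (which attaches afterClusterTime to reads) combined with a non-primary read preference; the connection string and names are placeholders:

```python
# Sketch (MongoDB 3.6+ and PyMongo 3.6+ assumed): a causally consistent session
# attaches afterClusterTime to reads, so a secondary will not return data older
# than what the session has already observed.
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://mongos1.example.net:27017/")
coll = client.get_database(
    "mydb", read_preference=ReadPreference.NEAREST)["mycoll"]

with client.start_session(causal_consistency=True) as session:
    # A write (or any prior operation) advances the session's cluster time...
    coll.insert_one({"userId": 42, "status": "active"}, session=session)
    # ...and later reads in the same session wait for that time on secondaries.
    doc = coll.find_one({"userId": 42}, session=session)
```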
| Comment by Andy Schwerin [ 11/Aug/17 ] |
|
I believe the safe secondary reads project will resolve this. dianna.hohensee? |