[SERVER-41258] Router config remains stale for secondary readPreferences after moveChunk

Created: 21/May/19 | Updated: 27/Oct/23 | Resolved: 28/Jun/19

| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.8, 4.0.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jon Hyman | Assignee: | Randolph Tan |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | SERVER41258.py |
| Operating System: | ALL |
| Steps To Reproduce: |

Setup (cluster and data):

Setup (sharding and config): Note that setting a low orphanCleanupDelaySecs is very helpful in ensuring that the default of 15 minutes doesn't delay back-to-back migrations; a sketch of this setup follows.
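The original setup commands are not preserved above, so here is a minimal pymongo sketch of a plausible equivalent. The mongos and shard addresses, the shard key, and the 10-second delay value are all illustrative assumptions, not the reporter's exact configuration.

```python
# Hedged sketch of the sharding setup: shard test.foo and lower
# orphanCleanupDelaySecs so back-to-back migrations are not held up
# by the default 15-minute orphan-cleanup wait.
from pymongo import MongoClient

mongos = MongoClient("mongodb://localhost:27017")  # assumed mongos address
mongos.admin.command("enableSharding", "test")
mongos.admin.command("shardCollection", "test.foo",
                     key={"country": 1, "_id": 1})  # shard key is assumed

# orphanCleanupDelaySecs is a server parameter set on the shard primaries,
# not on mongos; the shard address and 10s value are assumptions.
shard0 = MongoClient("mongodb://localhost:27018")
shard0.admin.command("setParameter", 1, orphanCleanupDelaySecs=10)
```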
Testing: Note the attached python script (SERVER41258.py). Run it against one mongos while issuing moveChunk commands through a different mongos, repeating as necessary; a sketch of the moveChunk step follows.
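A minimal sketch of the moveChunk step, issued through a second router. The second mongos address, destination shard name, and chunk-locating document are illustrative assumptions:

```python
# Hedged sketch: migrate a chunk through a *different* mongos from the one
# the counting script is connected to, which leaves the first router's
# cached routing table stale.
from pymongo import MongoClient

other_mongos = MongoClient("mongodb://localhost:27027")  # assumed second router

other_mongos.admin.command(
    "moveChunk", "test.foo",
    find={"country": "US", "_id": 0},  # any document inside the chunk to move
    to="shard02",                      # assumed destination shard name
)
```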
Results: The python script shows the count drop alongside range deleter activity, and the count does not recover unless flushRouterConfig is run on the mongos that the python script is connected to (port 27017 in this test).
Amending the python script to use readPreference primary prevents the issue. The issue also does not occur if both the python script and the moveChunk are run against the same router.

Original reproduction:

1. Create a sharded cluster with two replica-set shards.
2. Load data such that db.foo.find({"country" : "US"}).count() returns 200,000 documents.
3. Run a Python script to print out the count for that query against a secondary (a sketch follows).
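The attached SERVER41258.py is not reproduced in this extract; below is a minimal sketch of what such a counting script plausibly looks like, with the port, namespace, and query taken from the reproduction above:

```python
# Hedged stand-in for the attached SERVER41258.py: repeatedly count
# matching documents through the first mongos, reading from a secondary.
import time

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017",  # router used by the test
                     readPreference="secondary")
coll = client.test.foo

while True:
    # With the default "available" read concern on secondary reads, this
    # count drops after a moveChunk issued through a different mongos and
    # does not recover until flushRouterConfig is run on this router.
    print(coll.count_documents({"country": "US"}))
    time.sleep(1)
```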
Here's the timeline:
| Sprint: | Sharding 2019-06-17, Sharding 2019-07-01 |
| Participants: | |
| Description |
| Comments |
| Comment by Randolph Tan [ 03/Jan/23 ] |

Note that with
| Comment by Scott Glajch [ 19/Dec/22 ] |

How can this be "works as designed"? The default read concern fails to find all of the data when reading? We are experiencing this issue as well on both MongoDB 4.2 and 4.4. It turns out we need to run flushRouterConfig at the database level (running it at the collection level does not fix the issue). What's worse is that the 4.2 documentation claims that "clearing jumbo chunks" and running "movePrimary" are the only use cases for needing to manually flush the configs, so moveChunk isn't mentioned at all.
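A minimal sketch of the workaround described above, assuming a mongos on localhost:27017 and a database named mydb (both names are illustrative). Targeting flushRouterConfig at a database or collection requires MongoDB 4.2 or later:

```python
# Hedged sketch: flush the mongos's cached routing table at the database
# level; per the comment above, the collection-level variant did not help.
from pymongo import MongoClient

mongos = MongoClient("mongodb://localhost:27017")

# Database-level flush: { flushRouterConfig: "mydb" }
mongos.admin.command("flushRouterConfig", "mydb")

# Collection-level variant, shown for comparison:
# mongos.admin.command("flushRouterConfig", "mydb.mycoll")
```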
| Comment by Randolph Tan [ 12/Jun/19 ] |

I believe you are hitting this because the count you are sending is probably using the default read concern of "available". I have also attached a javascript snippet demonstrating this issue. If you set the readConcern to something other than "available", you should start getting the expected results. P.S. Also note that the count command does not filter out "orphaned" documents unless it has a query predicate and the mongod version is >= 4.0.
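The attached javascript snippet is not reproduced here; the following is a minimal pymongo sketch of the suggested fix, reusing the namespace and query from the reproduction above (host and port are assumptions):

```python
# Hedged sketch: run the same secondary count with an explicit readConcern
# other than "available", so the shard applies ownership filtering instead
# of returning whatever documents are locally available.
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017",
                     readPreference="secondary")

coll = client.test.foo.with_options(read_concern=ReadConcern("local"))
print(coll.count_documents({"country": "US"}))
```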
| Comment by Eric Sedor [ 04/Jun/19 ] |

jonhyman, I'm going to pass this on to an appropriate team to investigate further. Thanks for your patience so far.
| Comment by Eric Sedor [ 04/Jun/19 ] |

I believe we can rule out
| Comment by Eric Sedor [ 29/May/19 ] |

jonhyman, we can confirm that we can reproduce the drop in the count result you are reporting, but we are still investigating the reason. Thanks for your patience.