[SERVER-54625] Maybe the root cause for reads from secondary returning empty results even with readConcern=local/majority Created: 19/Feb/21 Updated: 26/Apr/23 Resolved: 01/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Xin Wang | Assignee: | Edwin Zhou |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
I have seen several reports that reads from a secondary return empty results even with readConcern=local/majority. I think the root cause may be that the secondary does not load the routing table into its CatalogCache.
I posted the same comment on https://jira.mongodb.org/browse/SERVER-54373, but have not received a reply. I would like to know whether I'm wrong.
I have included the reproduction steps as JS code here:
In the first round, everything works as expected. When we run `.find({_id: one_doc_id}).readPref("secondary")` through st.s1, we see the refresh log:
`s21230| 2021-02-07T23:11:35.291+0800 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for collection testdb.testcoll to version 1|3||602003276ef388a5caf68456 took 1 ms`
The result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 1.
In the second round (with another collection), however, the behavior is unexpected: the result for both "find doc with secondary pref" and "find doc with secondary pref and local concern" is 0, although 1 is expected, and no refresh log appears.
I added more logging to find the root cause. For "find doc with secondary pref" in the second round, no shardVersion comparison happens in CollectionShardingState::_getMetadataWithVersionCheckAt(). That by itself is expected, because a read from a secondary uses readConcern=available by default, which does not touch routing info.
I think the root cause is that a secondary only loads routing info into its CatalogCache when the collection is marked as needing a refresh (needRefresh), and in this scenario it is never marked. The nodes form logical partitions: one of them is st.s0 together with the secondary of st.rs1, both with correct version info, so a request with shard key one_doc_id (from the JS code) is always routed to st.rs1 and gets the right response.
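The hypothesis above can be illustrated with a toy model. This is a hypothetical sketch, not MongoDB's implementation: it assumes that st.s1 still treats the second collection as unsharded and therefore targets the primary shard (called rs0 here), whose secondary never loaded the collection's metadata either, while the chunk holding one_doc_id lives on rs1. The version strings and the `makeCluster`/`secondaryFind` helpers are illustrative names only. In the model, a refresh happens only when the router's version and the shard's cached version disagree.

```javascript
// Toy model of the hypothesis above. Hypothetical: NOT MongoDB's code.
// A node reloads its cached version only after a version mismatch.
const UNSHARDED = "UNSHARDED";

// Builds a fresh cluster state. `rs0Version` is what rs0's secondary
// currently believes about the collection.
function makeCluster(rs0Version) {
  return {
    // Authoritative metadata: the chunk holding the document is on rs1.
    config: { version: "1|3", owner: "rs1" },
    shards: {
      rs0: { version: rs0Version, docs: [] }, // no longer owns the document
      rs1: { version: "1|3", docs: ["one_doc_id"] },
    },
    routers: {
      s0: { version: "1|3", target: "rs1" }, // up to date
      s1: { version: UNSHARDED, target: "rs0" }, // assumed stale: targets primary shard
    },
    refreshes: 0,
  };
}

// Secondary read routed by `routerName`. The shard compares the version the
// router attached with its own cached version; only a mismatch triggers a
// refresh (the "needs refresh" marking) followed by a retry.
function secondaryFind(cluster, routerName, id) {
  const router = cluster.routers[routerName];
  const shard = cluster.shards[router.target];
  if (router.version !== shard.version) {
    cluster.refreshes += 1;
    shard.version = cluster.config.version; // shard reloads its metadata
    router.version = cluster.config.version; // router reloads its routing info
    router.target = cluster.config.owner; // and re-targets the owning shard
    return secondaryFind(cluster, routerName, id);
  }
  // Versions agree (possibly both stale), so the shard serves its local data.
  return shard.docs.filter((d) => d === id).length;
}

// "Second round": rs0's secondary never learned the collection is sharded.
const stale = makeCluster(UNSHARDED);
console.log(secondaryFind(stale, "s0", "one_doc_id")); // 1
console.log(secondaryFind(stale, "s1", "one_doc_id")); // 0
console.log(stale.refreshes); // 0: nothing triggered a refresh

// If rs0's secondary knew the current version, s1's stale version would
// mismatch, forcing a refresh, and the retry would find the document.
const aware = makeCluster("1|3");
console.log(secondaryFind(aware, "s1", "one_doc_id")); // 1
console.log(aware.refreshes); // 1
```

Because the stale router and the stale secondary agree on the same (wrong) UNSHARDED version in this model, nothing marks the collection as needing a refresh, which would be consistent with the missing refresh log in the second round.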
Is my guess correct? Is there a good solution, or will this scenario be fixed in the future? |
| Comments |
| Comment by Xin Wang [ 02/Mar/21 ] |
|
OK, thanks for your reply!
I'll follow
|
| Comment by Edwin Zhou [ 01/Mar/21 ] |
|
I apologize for missing your comment in It's plausible that the behavior stems from the routing table cache, but we can't say for sure that this is the only cause. That being said, we currently have a successful reproduction on Nevertheless, thank you for investigating this behavior; we welcome this level of depth in any future issues you discover. Gratefully, |