[SERVER-23625] Some read-only operations (e.g. count, aggregate) hang indefinitely if the primary for the shard is unreachable from mongos Created: 08/Apr/16 Updated: 06/Feb/19 Resolved: 06/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Blake Oler |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Sprint: | Sharding 2019-02-25 |
| Participants: |
| Description |
|
Even if you specify a 'secondary' read preference, mongos still tries to call setShardVersion on the shard's primary when running count, aggregate, mapReduce, etc. If the replica set monitor has already detected that the primary is unreachable, mongos skips the setShardVersion call and the operation succeeds. If the monitor has not yet detected that the node it last knew as primary has become unreachable, mongos sends setShardVersion to that node, and the call hangs forever. |
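
The two cases described above can be illustrated with a toy model. This is only a sketch of the decision logic as the description states it, not mongos code: the names `PrimaryView`, `Outcome`, and `run_secondary_read` are hypothetical, and the secondary that serves the read is assumed to be healthy.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    HANGS = "hangs"


@dataclass
class PrimaryView:
    """Hypothetical stand-in for what the router knows about the primary."""
    reachable: bool    # actual network state of the primary
    believed_up: bool  # what the replica set monitor currently believes


def run_secondary_read(primary: PrimaryView) -> Outcome:
    """Toy model of count/aggregate with a 'secondary' read preference.

    Assumption: the secondary serving the read is healthy, so the only
    way to fail is the setShardVersion step against the primary.
    """
    if primary.believed_up and not primary.reachable:
        # Stale monitor view: setShardVersion is sent to a node that
        # never answers, so the whole operation blocks.
        return Outcome.HANGS
    # Either the primary answered setShardVersion, or the monitor already
    # marked it down and the call was skipped. The read then proceeds.
    return Outcome.OK


if __name__ == "__main__":
    print(run_secondary_read(PrimaryView(reachable=False, believed_up=False)))
    print(run_secondary_read(PrimaryView(reachable=False, believed_up=True)))
```

In this model the hang only occurs in the window between the primary becoming unreachable and the monitor noticing it, which matches the behavior reported here.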
| Comments |
| Comment by Blake Oler [ 04/Feb/19 ] |
|
This has been fixed on the current master branch. Jason Carey's interruptibility patch in Both of these changes are only in the current working branch, meaning that maxTimeMS support is incomplete on previous releases. Do we want to backport this behavior to previous releases as part of this ticket, kaloian.manassiev? |
| Comment by Spencer Brody (Inactive) [ 08/Apr/16 ] |
|
Attaching jstest that reproduces the problem(s) |