[SERVER-34204] Tailable cursor fails on getMore against a sharded cluster Created: 30/Mar/18 Updated: 29/Oct/23 Resolved: 30/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.8, 4.0.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Jeffrey Yemin | Assignee: | Charlie Swanson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Query 2018-04-23, Query 2018-05-07 |
| Participants: | |
| Description |
|
A getMore command on a tailable cursor fails against a sharded cluster with error code 50737 and the error message
The sequence of commands and responses was:
In some cases it fails on the second getMore, in some cases on the third, but otherwise the behavior appears consistent. The collection is capped, with otherwise default values. In particular, the collection itself is not sharded (I'm not even sure you can create a sharded capped collection...). You can see from the commands that the query executes in a session. Reproduced on a sharded cluster with two shards (each a single-member replica set) with MongoDB Enterprise version 3.7.3-198-g8f75de8 on OS X, but it started appearing in Evergreen some time after 2/19. |
| Comments |
| Comment by Githook User [ 22/Aug/18 ] | ||||||||||||||||||||||||||||||||
|
Author: {'name': 'Charlie Swanson', 'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310'}Message: (cherry picked from commit a43fe9ae73752fbd98107cef5421341fe291ab32) | ||||||||||||||||||||||||||||||||
| Comment by David Storch [ 26/Jul/18 ] | ||||||||||||||||||||||||||||||||
|
jack.mulrow, thanks for raising this. We're going to look into whether this can be backported to 3.6. In addition, we think it may be worthwhile to gate getMore logical session id checks behind featureCompatibilityVersion: | ||||||||||||||||||||||||||||||||
| Comment by Jack Mulrow [ 18/Jul/18 ] | ||||||||||||||||||||||||||||||||
|
I ran into this when backporting | ||||||||||||||||||||||||||||||||
| Comment by Githook User [ 01/May/18 ] | ||||||||||||||||||||||||||||||||
|
Author: {'email': 'charlie.swanson@mongodb.com', 'name': 'Charlie Swanson', 'username': 'cswanson310'}Message: | ||||||||||||||||||||||||||||||||
| Comment by Githook User [ 30/Apr/18 ] | ||||||||||||||||||||||||||||||||
|
Author: {'email': 'charlie.swanson@mongodb.com', 'username': 'cswanson310', 'name': 'Charlie Swanson'}Message: | ||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 30/Mar/18 ] | ||||||||||||||||||||||||||||||||
|
This is because we can schedule a getMore after receiving the response from a previous getMore. If the response comes in at a time between client getMores we will not have an OperationContext associated with the AsyncResultsMerger, so there will be no session id attached to any scheduled commands. I believe this is where the problematic getMore is being sent, from within the response handler from a previous getMore.
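To make the timing problem described above concrete, here is a minimal toy model in Python. It is an illustrative sketch only, not the server's C++ code: the `ResultsMerger` and `OperationContext` classes, their methods, and the command shape are hypothetical stand-ins for the AsyncResultsMerger and its OperationContext, and the only behavior modeled is "a command scheduled while no context is attached carries no lsid".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OperationContext:
    """Hypothetical stand-in: carries the client's logical session id."""
    lsid: Optional[str]


@dataclass
class ResultsMerger:
    """Toy model of a merger that forwards getMores to a shard."""
    op_ctx: Optional[OperationContext] = None
    sent: list = field(default_factory=list)

    def attach(self, op_ctx: OperationContext) -> None:
        # Called when a client getMore starts being processed.
        self.op_ctx = op_ctx

    def detach(self) -> None:
        # Called when the client getMore returns; no context until the next one.
        self.op_ctx = None

    def schedule_get_more(self, cursor_id: int) -> dict:
        cmd = {"getMore": cursor_id}
        # The session id can only be copied from an attached context.
        if self.op_ctx is not None and self.op_ctx.lsid is not None:
            cmd["lsid"] = self.op_ctx.lsid
        self.sent.append(cmd)
        return cmd

    def on_shard_response(self, cursor_id: int) -> dict:
        # The response handler schedules the follow-up getMore itself,
        # whether or not a client operation is currently attached.
        return self.schedule_get_more(cursor_id)


merger = ResultsMerger()
merger.attach(OperationContext(lsid="session-1"))
during_client_op = merger.schedule_get_more(42)   # includes "lsid"
merger.detach()
between_client_ops = merger.on_shard_response(42)  # no "lsid" attached
```

In this model the second command is the analogue of the problematic getMore: it is sent from the response handler while no client operation is attached, so it reaches the shard without a session id.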
| ||||||||||||||||||||||||||||||||
| Comment by Tess Avitabile (Inactive) [ 30/Mar/18 ] | ||||||||||||||||||||||||||||||||
|
My guess is that the mongos is not always attaching the lsid for the getMores it sends to the shard. | ||||||||||||||||||||||||||||||||