[SERVER-23219] DBCommandCursor doesn't route getMore operations to original server Created: 17/Mar/16 Updated: 08/Jan/24 Resolved: 29/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.12 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Jonathan Reams |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | The following patch modifies the stepdown_query.js test to use DBClientReplicaSet to demonstrate the issue for getMore and killCursor operations. It can be invoked with resmoke.py by doing
Output
|
| Sprint: | Platforms 2016-08-26, Platforms 2016-09-19 |
| Participants: |
| Description |
|
DBCommandCursor routes getMore and killCursor operations to the current primary of the replica set. Because a cursor created on the primary remains on that node after a stepdown, DBCommandCursor can route a getMore or killCursor operation to the wrong node. A similar situation can arise if slaveOk is set on the replica-set connection. |
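The failure mode described above can be reproduced in a small standalone simulation. This is only a sketch: `FakeNode`, `FakeReplicaSetConnection`, and `stepDown()` are hypothetical stand-ins written for illustration, not classes from the shell or the server, and the command replies are simplified.

```javascript
// Hypothetical stand-in for one replica set member. Cursors live on the
// node that created them and are not visible from any other node.
class FakeNode {
    constructor(name) {
        this.name = name;
        this.cursors = new Map();  // cursorId -> documents not yet returned
        this.nextCursorId = 1;
    }

    runCommand(cmd) {
        if ("find" in cmd) {
            const id = this.nextCursorId++;
            const docs = [{_id: 1}, {_id: 2}, {_id: 3}, {_id: 4}];
            const firstBatch = docs.splice(0, cmd.batchSize);
            this.cursors.set(id, docs);
            return {ok: 1, cursor: {id, firstBatch}};
        }
        if ("getMore" in cmd) {
            const docs = this.cursors.get(cmd.getMore);
            if (docs === undefined) {
                return {ok: 0, errmsg: `cursor ${cmd.getMore} not found on ${this.name}`};
            }
            return {ok: 1, cursor: {id: cmd.getMore, nextBatch: docs.splice(0, cmd.batchSize)}};
        }
        return {ok: 0, errmsg: "unsupported command"};
    }
}

// Hypothetical stand-in for a replica set connection: every command is
// sent to whichever node is primary at the time of the call.
class FakeReplicaSetConnection {
    constructor(nodes) {
        this.nodes = nodes;
        this.primaryIndex = 0;
    }
    get primary() {
        return this.nodes[this.primaryIndex];
    }
    stepDown() {
        // Simulate an election: the next node becomes primary.
        this.primaryIndex = (this.primaryIndex + 1) % this.nodes.length;
    }
    runCommand(cmd) {
        return this.primary.runCommand(cmd);
    }
}

const nodeA = new FakeNode("nodeA");
const nodeB = new FakeNode("nodeB");
const conn = new FakeReplicaSetConnection([nodeA, nodeB]);

// The find runs on nodeA, so the cursor is created there.
const findRes = conn.runCommand({find: "coll", batchSize: 2});

// After the stepdown the cursor still exists on nodeA, but the getMore is
// routed to the new primary, nodeB, which has never seen that cursor id.
conn.stepDown();
const getMoreRes = conn.runCommand({getMore: findRes.cursor.id, collection: "coll", batchSize: 2});
console.log(getMoreRes.errmsg);
```

In this model the getMore fails even though the cursor is still open on the node that served the find, which matches the behavior reported in this ticket.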
| Comments |
| Comment by Githook User [ 29/Aug/16 ] |
|
Author: Jonathan Reams (jbreams, jbreams@mongodb.com) Message: |
| Comment by J Rassi [ 21/Mar/16 ] |
|
Max, Dave, and I spoke briefly about this issue today. Our tentative assessment of the user impact is that the shell will throw an error for certain queries, and may crash, when started with a replica set connection to a 3.2+ cluster (see here and here, for more information about replica set connections). Specifically:
Replica set connections have only been documented in the shell for a couple of versions ( If we do decide to move forward with a backport for this issue, I would suggest forcing read mode "legacy" for replica set connections in the shell as an interim fix. This will likely require minor changes in the mozjs integration in order to expose this information, but the diff will still be relatively small. The real fix for this issue will be more difficult, as the core problem is a flaw of the original DBCommandCursor design. I see two possible paths forward, depending on whether or not we implement SERVER-20770:
We'll decide on an approach in our next triage meeting. |
| Comment by J Rassi [ 21/Mar/16 ] |
|
Max and I misdiagnosed this issue. We originally thought it affected DBClientReplicaSet, but the issue is actually in DBCommandCursor. DBCommandCursor does not track the host used for the original find, so it blindly issues getMore and killCursor requests using runCommand() against the underlying connection object (a DBClientReplicaSet, in this case), which can result in the request being routed to the wrong replica set member. I've updated the summary/description to reflect this new discovery. To clarify: this is a shell-only issue, and does not affect the server or the C++ client library. All versions of the shell since 3.2 are affected. Re-assigning back to the query team for triage. |
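One way to picture the missing host tracking is a cursor that remembers which member served the find and sends follow-up requests only there. The sketch below is illustrative and is not the fix that landed in 3.3.12: `StubNode`, `StubReplicaSetConnection`, `runCommandOn()`, `lastHost`, and `PinnedCursor` are all hypothetical names, and round-robin member selection merely stands in for whatever the connection does under slaveOk or after a stepdown.

```javascript
// Hypothetical replica set member holding its own cursors.
class StubNode {
    constructor(host) {
        this.host = host;
        this.cursors = new Map();
    }
    runCommand(cmd) {
        if ("find" in cmd) {
            // Pretend one document remains after the first batch.
            this.cursors.set(7, [{_id: 3}]);
            return {ok: 1, cursor: {id: 7, firstBatch: [{_id: 1}, {_id: 2}]}};
        }
        if ("getMore" in cmd) {
            const rest = this.cursors.get(cmd.getMore);
            if (rest === undefined) {
                return {ok: 0, errmsg: "cursor not found on " + this.host};
            }
            this.cursors.delete(cmd.getMore);
            return {ok: 1, cursor: {id: 0, nextBatch: rest}};
        }
        return {ok: 0, errmsg: "unsupported command"};
    }
}

// Hypothetical connection that may pick a different member per request.
class StubReplicaSetConnection {
    constructor(nodes) {
        this.nodes = nodes;
        this.turn = 0;
        this.lastHost = null;  // host that served the most recent runCommand()
    }
    // Unpinned path: the member is chosen anew for every command.
    runCommand(cmd) {
        const node = this.nodes[this.turn++ % this.nodes.length];
        this.lastHost = node.host;
        return node.runCommand(cmd);
    }
    // Pinned path (assumed helper): send the command to one specific host.
    runCommandOn(host, cmd) {
        const node = this.nodes.find((n) => n.host === host);
        if (node === undefined) {
            return {ok: 0, errmsg: "unknown host " + host};
        }
        return node.runCommand(cmd);
    }
}

// Cursor wrapper that records the host of the original find.
class PinnedCursor {
    constructor(conn, findCmd) {
        const res = conn.runCommand(findCmd);
        this.conn = conn;
        this.host = conn.lastHost;  // remember where the cursor lives
        this.id = res.cursor.id;
        this.batch = res.cursor.firstBatch;
    }
    getMore() {
        return this.conn.runCommandOn(this.host, {getMore: this.id, collection: "coll"});
    }
}

const rsConn = new StubReplicaSetConnection([
    new StubNode("h1:27017"),
    new StubNode("h2:27017"),
]);
const cursor = new PinnedCursor(rsConn, {find: "coll", batchSize: 2});

// Issuing getMore through the plain connection lands on h2 and fails.
const unpinned = rsConn.runCommand({getMore: cursor.id, collection: "coll"});

// The pinned getMore goes back to h1, where the cursor actually exists.
const pinned = cursor.getMore();
console.log(unpinned.ok, pinned.ok);
```

The same pinning would apply to a killCursor request, since it also has to reach the node that owns the cursor.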
| Comment by J Rassi [ 18/Mar/16 ] |
|
Per discussion with milkie, reassigning to the sharding team backlog for triage. Feel free to bounce this back to platforms once triaged, if appropriate. |
| Comment by Eric Milkie [ 18/Mar/16 ] |
|
Adding sharding component as this affects mongos. |