[SERVER-38222] OP_KILL_CURSORS may not actually kill all (or any) cursors if one is pinned Created: 21/Nov/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.2.21, 3.4.18, 3.6.9, 4.0.4, 4.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Operating System: | ALL |
| Participants: |
| Description |
|
If you run an OP_KILL_CURSORS with a list of cursor Ids [a, b, c, d...] and one of the cursors is pinned, the operation will fail early, and none of the cursors which come later in the list will be killed. For example, if cursor b is pinned, then cursors c and d would not be killed (or even examined). This is because CursorManager::pinCursor may uassert or return a Status. We call killCursorGlobalIfAuthorized in a loop here attempts to kill each cursor one by one. If one of the cursors is pinned, the uassert triggers, and the entire operation fails, meaning none of the subsequent cursors get examined. I do not believe this is possible to reproduce with the shell, since it seems like we only ever create OP_KILL_CURSORS messages with one cursor ID. However, according to the drivers spec, some drivers will kill cursors in batches:
See the driver specs here. It's worth mentioning that drivers probably only kill cursors once the user has stopped reading from them, meaning none of the cursors would be pinned. In that case, they wouldn't encounter this issue. Here's a repro.
Then run this test (I called it "killcursors_bug.js"):
CC david.storch since he first imagined this being a problem |
| Comments |
| Comment by Matt Broadstone [ 06/Dec/18 ] |
|
behackett it does not |
| Comment by Bernie Hackett [ 04/Dec/18 ] |
|
matt.broadstone, does Ruby do that against MongoDB 3.2+ (where the find and killCursors commands exist)? If so, why? |
| Comment by Bernie Hackett [ 04/Dec/18 ] |
|
PyMongo may in some cases call OP_KILL_CURSORS with a list of cursors on MongoDB <= 3.0 and the killCursors command on MongoDB 3.2+. |
| Comment by Matt Broadstone [ 03/Dec/18 ] |
|
This is not a problem for node (it uses the killCursors command only), but ruby does use OP_KILL_CURSORS and attempts to kill 10k batches at a time. |
| Comment by David Storch [ 03/Dec/18 ] |
|
jeff.yemin, confirmed, this only affects OP_KILL_CURSORS. It does not affect killCursors command. |
| Comment by Jeffrey Yemin [ 03/Dec/18 ] |
|
I did some digging and looks like java, .net, and go drivers kill a cursor at a time. |
| Comment by Jeffrey Yemin [ 02/Dec/18 ] |
|
ian.boros can you confirm that this affects only OP_KILL_CURSORS and not the killCursors command? |
| Comment by Bernie Hackett [ 01/Dec/18 ] |
|
I don't know if this is common or not jeff.yemin matt.broadstone jmikola, can you check with your teams? |
| Comment by Ian Boros [ 29/Nov/18 ] |
|
No, sorry for being vague, I was just saying that I can confirm this affects all of the recently released versions of the server after a cursory inspection. I took a closer look at 3.2 and 3.4, and it looks like this bug also affects them, since the creation of a ClientCursorPin here can throw. See this and this. |
| Comment by Charlie Swanson [ 29/Nov/18 ] |
|
ian.boros does that also mean that as far as you can tell it does not impact 3.4 or earlier? |
| Comment by Ian Boros [ 29/Nov/18 ] |
|
As far as I can tell, this goes back to 3.6. |
| Comment by Craig Homa [ 29/Nov/18 ] |
|
Hey behackett, do you think this could potentially be a big issue for any drivers? Is it common driver behavior to send an OP_KILL_CURSORS with multiple cursor id's? Also, ian.boros please check on which server versions are affected by this. |