[SERVER-46255] establishCursors/AsyncRequestsSender can leave dangling request if interrupted
Created: 19/Feb/20  Updated: 08/Jan/24  Resolved: 28/Apr/20

Status: Closed
Project: Core Server
Component/s: Networking, Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc4, 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Nicholas Zolnierz
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-46767 Provide a mapping from OperationKey t... Closed
Related
related to SERVER-65329 Ensure cursors are not leaked when he... Open
related to SERVER-46648 Cancel pending requests upon receivin... Closed
related to SERVER-46740 establishCursors() must always drain ... Closed
related to SERVER-62710 AsyncRequestsMerger won't attempt to ... Closed
related to SERVER-50308 Adjust debug log message when cleanin... Closed
related to SERVER-47261 AsyncRequestSender should populate cl... Open
is related to SERVER-48308 Avoid leaking exceptions in establish... Closed
is related to SERVER-45541 Test killing an aggregation operation... Closed
is related to SERVER-44167 Add OperationKey to OperationContext ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Query 2020-03-09, Query 2020-03-23, Query 2020-04-06, Query 2020-04-20, Query 2020-05-04
Participants:

 Description   

If the AsyncRequestsSender is interrupted after sending one or more requests, it may not be able to gracefully cancel those requests. establishCursors() has some cleanup logic to mitigate this, but I'm pretty sure that if we're interrupted then this line will also throw the interrupted error, and we won't be able to clean up our cursors.
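
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not the server's code: `OpContext`, `check_for_interrupt`, `establish_cursors`, and the shard names are hypothetical stand-ins, and the assumption is that the cleanup path waits on the same interrupted context as the original operation.

```python
class Interrupted(Exception):
    """Stands in for the server's interruption error (hypothetical)."""


class OpContext:
    """Hypothetical stand-in for an operation context that can be killed."""

    def __init__(self):
        self.killed = False

    def check_for_interrupt(self):
        if self.killed:
            raise Interrupted("operation was interrupted")


def establish_cursors(op_ctx, shards, open_cursors, interrupt_after):
    """Toy model: open one cursor per shard and clean up on failure."""
    established = []
    try:
        for i, shard in enumerate(shards):
            if i == interrupt_after:
                op_ctx.killed = True  # simulate an interrupt arriving mid-way
            op_ctx.check_for_interrupt()
            open_cursors.add(shard)  # the remote side now holds a cursor
            established.append(shard)
        return established
    except Interrupted:
        # Cleanup path: it checks the same interrupted context, so it
        # throws again before any remote cursor is actually killed.
        for shard in established:
            op_ctx.check_for_interrupt()
            open_cursors.discard(shard)
        raise


def run_demo():
    """Return the set of cursors still open after an interrupted attempt."""
    open_cursors = set()
    try:
        establish_cursors(OpContext(), ["shardA", "shardB", "shardC"],
                          open_cursors, interrupt_after=2)
    except Interrupted:
        pass
    return open_cursors
```

In this model, the two cursors opened before the interrupt remain in `open_cursors` because the cleanup loop is itself interrupted on its first step.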



 Comments   
Comment by Githook User [ 06/May/20 ]

Author: Nick Zolnierz <nicholas.zolnierz@mongodb.com> (nzolnierzmdb)

Message: SERVER-46255 Use killOperations to cleanup dangling remote requests

(cherry picked from commit 93476f545de27ee61fd69eeab23adbff7f57b932)
Branch: v4.4
https://github.com/mongodb/mongo/commit/cd42cb1a51f1e2a6c02759ad5fa1523b5b65faa9

Comment by Githook User [ 28/Apr/20 ]

Author: Nick Zolnierz <nicholas.zolnierz@mongodb.com> (nzolnierzmdb)

Message: SERVER-46255 Use killOperations to cleanup dangling remote requests
Branch: master
https://github.com/mongodb/mongo/commit/93476f545de27ee61fd69eeab23adbff7f57b932

Comment by Githook User [ 28/Apr/20 ]

Author: Nick Zolnierz <nicholas.zolnierz@mongodb.com> (nzolnierzmdb)

Message: Revert "SERVER-46255 Use killOperations to cleanup dangling remote requests"

This reverts commit aa2e0ee6d817951a29f2fec33d374d13d8f46802.
Branch: master
https://github.com/mongodb/mongo/commit/49159e1cf859d21c767f6b582dd6e6b2d675808d

Comment by Githook User [ 28/Apr/20 ]

Author: Nick Zolnierz <nicholas.zolnierz@mongodb.com> (nzolnierzmdb)

Message: SERVER-46255 Use killOperations to cleanup dangling remote requests
Branch: master
https://github.com/mongodb/mongo/commit/aa2e0ee6d817951a29f2fec33d374d13d8f46802

Comment by Nicholas Zolnierz [ 05/Mar/20 ]

I've spent some time on this and haven't reached a fully robust solution, but some thoughts so far:

  • I'm able to generate and propagate a unique opKey for sub-operations. If the cursor establishment is interrupted, it's definitely possible to use this opKey in a _killOperations command.
  • When the ARS (AsyncRequestsSender) is interrupted, there are several possible states for the remote requests:
    (1) The remote node is processing the command but has not yet established a cursor
    (2) The cursor has been established on the remote node and the response is in flight
    (3) The request hasn't reached the remote node yet

From my understanding, the opKey/killOperations fix will handle (1) but not the other two scenarios. In fact, from testing, the likelihood of sending a killOperations command while the remote node is still processing the original request is far lower than that of (2) and (3). I'm looking into a potential workaround by extending killOperations to also look up any cursors with the same opKey.
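
To make the three states concrete, here is a rough toy model of a remote node that tracks running operations and cursors by opKey. Everything in it is an assumption for illustration: `RemoteNode`, `kill_operations`, the `also_kill_cursors` flag, and the key `"op-123"` are hypothetical and do not reflect the server's actual data structures or the final fix. State (3) is modelled as a request that arrives after the kill has been processed.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteNode:
    """Hypothetical model of a shard; not the server's real structures."""
    running_ops: set = field(default_factory=set)  # opKeys still executing
    cursors: dict = field(default_factory=dict)    # cursor id -> opKey

    def kill_operations(self, op_key, also_kill_cursors=False):
        """Kill running ops with this key; optionally matching cursors too."""
        self.running_ops.discard(op_key)
        if also_kill_cursors:
            stale = [cid for cid, key in self.cursors.items() if key == op_key]
            for cid in stale:
                del self.cursors[cid]


def leaked_after_kill(state, also_kill_cursors):
    """Return True if anything tied to the opKey survives the kill."""
    op_key = "op-123"  # illustrative key only
    node = RemoteNode()
    if state == 1:
        node.running_ops.add(op_key)   # (1) command still being processed
    elif state == 2:
        node.cursors[1] = op_key       # (2) cursor already established
    node.kill_operations(op_key, also_kill_cursors)
    if state == 3:
        node.cursors[1] = op_key       # (3) request arrives after the kill
    return bool(node.running_ops or node.cursors)
```

Under these assumptions, a plain kill by opKey cleans up state (1) only, and the cursor lookup extension additionally covers state (2). The late-arriving request in state (3) survives both variants in this model; whether and how the real fix addresses it is not stated in this comment.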

Comment by Mira Carey [ 19/Feb/20 ]

See SERVER-44167 for more details on operation keys

Comment by Charlie Swanson [ 19/Feb/20 ]

mira.carey@mongodb.com suggests that there is a new (or coming soon?) thing called OperationKey and killOperations that can help us here.
