[SERVER-71764] Implement cancellation contract for async_rpc operations Created: 01/Dec/22  Updated: 29/Oct/23  Resolved: 31/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Improvement Priority: Major - P3
Reporter: George Wangensteen Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-71194 Make async_rpc::sendHedgeCommand kill... Closed
Related
is related to SERVER-73016 Cancellation semantics for async_rpc ... Closed
is related to SERVER-74466 Attach OperationKey in async_rpc for ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Service Arch 2022-12-26, Service Arch 2022-12-12, Service Arch 2023-01-09, Service Arch 2023-01-23, Service Arch 2023-02-06
Participants:
Linked BF Score: 20

 Description   

We now have a defined cancellation contract that the async_rpc API should implement: 

The RPC library should append an OperationKey (GUID) to every operation it sends. When such an operation is canceled, and networking has begun for it (i.e. any data may have been sent), the RPC library unconditionally sends a fire-and-forget _killOperations command for that OperationKey to the same remote node.
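As an illustration only, the contract above can be modeled in a few lines of Python. This is a sketch, not the async_rpc implementation: FakeRemote and RpcOperation are hypothetical names, and the command field names (clientOperationKey, operationKeys) are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeRemote:
    """Hypothetical stand-in for a remote node; records every command it receives."""
    received: list = field(default_factory=list)

    def send(self, command: dict) -> None:
        self.received.append(command)


class RpcOperation:
    """Models one outgoing operation under the cancellation contract."""

    def __init__(self, remote: FakeRemote, command: dict):
        self.remote = remote
        # Contract: every operation carries an OperationKey (a GUID).
        self.operation_key = str(uuid.uuid4())
        # Field name is an assumption for this sketch.
        self.command = {**command, "clientOperationKey": self.operation_key}
        self.networking_started = False

    def start(self) -> None:
        # From this point on, some data may have reached the remote.
        self.networking_started = True
        self.remote.send(self.command)

    def cancel(self) -> None:
        # Contract: send the kill only if networking has begun.
        if self.networking_started:
            # Fire-and-forget: no response is awaited or inspected.
            self.remote.send(
                {"_killOperations": 1, "operationKeys": [self.operation_key]}
            )
```

Cancelling before start() sends nothing; cancelling after start() sends exactly one kill command carrying the same key to the same remote.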

Since the library currently delegates networking to TaskExecutor/NetworkInterfaceTL, implementing this cancellation contract first involves cancelling any ongoing network interface operations that have begun. This can be done through the cancellation token passed to the network interface, and should happen automatically when the async_rpc API's user cancels the token they passed in. The async_rpc API should then inspect the NetworkInterface operation's response to see whether it succeeded or was successfully cancelled; if it succeeded, the async_rpc layer needs to send _killOperations itself.
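The decision described in this paragraph could be sketched as follows. The Outcome enum and the function name are hypothetical, and the sketch covers only the case the paragraph settles (the operation succeeded despite the cancellation); it deliberately returns False for a successfully cancelled operation, which is discussed separately.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical summary of what the network interface reported."""
    SUCCEEDED = "succeeded"
    CANCELLED = "cancelled"


def rpc_layer_must_send_kill(token_cancelled: bool, outcome: Outcome) -> bool:
    """Return True when the async_rpc layer itself has to send _killOperations.

    Only the case stated in the paragraph is encoded: the user cancelled the
    token, yet the network interface operation still reported success.
    """
    if not token_cancelled:
        # Nothing was cancelled, so there is nothing to kill.
        return False
    return outcome is Outcome.SUCCEEDED
```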

If the NetworkInterface operation was cancelled successfully, the network interface may or may not have sent the required _killOperations. Here, we should do a short, timeboxed investigation into the best solution. One option is to fix the network interface to always send _killOperations in this case, if that fix is small and simple. If it is not, the async_rpc API could always send it, even if that results in a duplicate _killOperations.
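A small model of the second option (async_rpc always sends the kill) shows why a duplicate would be harmless, assuming the remote treats a kill for an already-killed or unknown key as a no-op. That idempotency is an assumption of this sketch, and FakeRemote and cancel_operation are hypothetical names.

```python
class FakeRemote:
    """Hypothetical remote node that tracks active operations by key."""

    def __init__(self):
        self.active = set()
        self.kill_requests = 0

    def run(self, key: str) -> None:
        self.active.add(key)

    def kill_operations(self, keys: list) -> None:
        self.kill_requests += 1
        for key in keys:
            # discard() is a no-op for keys that are already gone, which
            # models the assumed idempotency of the kill command.
            self.active.discard(key)


def cancel_operation(remote: FakeRemote, key: str, ni_sent_kill: bool) -> None:
    """Cancel one operation when the network interface may or may not have killed it."""
    if ni_sent_kill:
        # The network interface happened to send the kill already.
        remote.kill_operations([key])
    # The RPC layer always sends its own kill, possibly as a duplicate.
    remote.kill_operations([key])
```

In both cases the operation ends up killed on the remote; the only cost of the duplicate is one extra request.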



 Comments   
Comment by Githook User [ 31/Jan/23 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-71764 Fix cancellation of hedged operations
Branch: master
https://github.com/mongodb/mongo/commit/911f78a88f7cc571b09ba0c67307bbf6d230a3b2

Comment by Githook User [ 24/Jan/23 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: Revert "SERVER-71764 Fix cancellation of hedged operations"

This reverts commit 797beaa1ab13144548b94dc4b90d75ec05626e33.
Branch: master
https://github.com/mongodb/mongo/commit/11b02687c7cc1afb0b7e91f8bd55323678db580e

Comment by Githook User [ 18/Jan/23 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-71764 Fix cancellation of hedged operations
Branch: master
https://github.com/mongodb/mongo/commit/797beaa1ab13144548b94dc4b90d75ec05626e33

Generated at Thu Feb 08 06:19:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.