[SERVER-78524] async_rpc util getAllResponsesOrFirstErrorWithCancellation can leave dangling continuations Created: 28/Jun/23  Updated: 29/Oct/23  Resolved: 27/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: George Wangensteen Assignee: Alex Li
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-71819 Broadcast collMod command to all shards Closed
is depended on by SERVER-79070 Executor shutdown can cause async_rp... Closed
is depended on by SERVER-73875 Include `movePrimary` state in `rando... Closed
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2023-07-10, Service Arch 2023-07-24, Service Arch 2023-08-07
Participants:
Linked BF Score: 112

 Description   

In getAllResponsesOrFirstErrorWithCancellation, we accept a vector of futures and a cancellation token. We return a future that is resolved either once all the input futures resolve successfully or as soon as one of them resolves with an error.

 

When any input future resolves with an error, we use the cancellation token to cancel the work that produces results for the other futures. However, instead of waiting for this cancellation to take effect and for all input futures to resolve (which they should do promptly after cancellation), we short-circuit and ready the output future with a Cancelled status. As a result, when the future returned by the function becomes ready, code that produces results for the input futures may still be scheduled to run without having observed the cancellation. If that code captures references to variables whose lifetime is guaranteed only until the result future of getAllResponsesOrFirstErrorWithCancellation is readied, it can access destroyed objects.

 

The safe contract for this function should be: "once the output future is resolved, all input futures have resolved, with either cancellation/error or success", which would resolve the lifetime issue. 
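
The contract above can be illustrated with a minimal standalone sketch. This is not the server's implementation: it uses blocking std::future polling rather than the continuation-based futures in async_rpc, and Result, CancelFlag, Gathered, gatherAllOrFirstError and demo are hypothetical names introduced only for this example. The point it shows is that the first error triggers cancellation but the function keeps draining the remaining inputs before it returns.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Status/StatusWith-style result.
struct Result {
    bool ok;
    std::string value;  // payload on success, reason on failure
};

// Hypothetical stand-in for a cancellation source/token pair.
using CancelFlag = std::atomic<bool>;

struct Gathered {
    bool ok = true;
    std::string firstError;           // meaningful only when ok == false
    std::vector<std::string> values;  // filled only when ok == true
};

// Returns only after EVERY input future has resolved. On the first error it
// requests cancellation and then keeps waiting for the remaining inputs.
Gathered gatherAllOrFirstError(std::vector<std::future<Result>> inputs, CancelFlag& cancel) {
    Gathered out;
    std::vector<std::optional<Result>> done(inputs.size());
    std::size_t remaining = inputs.size();
    while (remaining > 0) {
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            if (done[i]) continue;  // already consumed
            if (inputs[i].wait_for(std::chrono::milliseconds(1)) != std::future_status::ready)
                continue;
            done[i] = inputs[i].get();
            --remaining;
            if (!done[i]->ok && out.ok) {
                out.ok = false;
                out.firstError = done[i]->value;
                cancel.store(true);  // ask the other workers to stop
                // Deliberately no early return: keep draining the inputs.
            }
        }
    }
    if (out.ok)
        for (auto& r : done) out.values.push_back(r->value);
    return out;
}

// Returns how many workers are still running once the gather has returned.
int demo() {
    CancelFlag cancel{false};
    std::atomic<int> stillRunning{0};  // lives in this frame; workers hold a reference

    auto slowWorker = [&cancel, &stillRunning]() -> Result {
        ++stillRunning;
        while (!cancel.load()) std::this_thread::sleep_for(std::chrono::milliseconds(1));
        --stillRunning;
        return {false, "cancelled"};
    };
    auto failingWorker = []() -> Result { return {false, "boom"}; };

    std::vector<std::future<Result>> inputs;
    inputs.push_back(std::async(std::launch::async, slowWorker));
    inputs.push_back(std::async(std::launch::async, failingWorker));
    inputs.push_back(std::async(std::launch::async, slowWorker));

    Gathered g = gatherAllOrFirstError(std::move(inputs), cancel);
    assert(!g.ok);
    assert(g.firstError == "boom");
    return stillRunning.load();
}
```

Because gatherAllOrFirstError returns only after every input has resolved, no worker can still be touching stillRunning or cancel when demo's frame is torn down. Returning immediately after cancel.store(true) would reproduce the hazard described in this ticket.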



 Comments   
Comment by Githook User [ 26/Jul/23 ]

Author: Alex Li <alex.li@mongodb.com> (username: lia394126)

Message: SERVER-78524 Fix dangling continuations in async_rpc_util::getAllResponsesOrFirstErrorWithCancellation
Branch: master
https://github.com/mongodb/mongo/commit/de26bdebda54b7f4a7cb92bb0b74094a09bc537d
