[SERVER-64948] Simplify get() interface on Futures Created: 25/Mar/22  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Matt Diener (Inactive) Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

 Description   

In working on SERVER-62560, we've determined that the rules around when our futures are and are not `valid()` can be difficult to reason about. After many long conversations internal to the Service Architecture team, it's pretty clear that Futures are confusing. This is also supported by tasks like SERVER-53538, and frequent questions that get posed about Futures. In order to scale to a larger team, a case can be made to narrow the valid usage patterns of Futures and eliminate nuance.

The goal would be to take on all of the challenges below. The ideal end result would be a Future implementation with a single get() that consumes the Future 100% of the time.

Challenge 1:

`folly::Future` and `std::future` both offer a much more limited getters on futures. Both of those libraries specify `valid() == false` as a postcondition on `get()`. It seems the spirit of `get()` from the perspective of those libraries is to consume the future.

Our implementation of Future is currently different from those mentioned above insofar as we allow for get() and wait to be cancelled via interruptible. Cancellations currently allow for a future to be used as if `get()` was never called if that `get()` operation is cancelled. This is specifically described to be an intentional use case in `future_impl.h`, which describes `kWaitingOrHaveChildren -> kHaveCallback` to be a valid state transition. Cancellable `get()` makes sense for situations like shutdown, but usage of the Future after get means we cannot enforce a simple contract like folly and std.

Can we ask people to use a `wait()` then `get()` pattern if they want to set up continuations after cancellation?

The consequence of this would be us moving from the Future's shared state pointer at the beginning of get() instead of moving from it at the time the value becomes available. `get()` will reliably have a postcondition of `valid() == false`. This creates 2 more challenges detailed below.

 

Challenge 2:

We offer 2 implementations of get() that do not invalidate the Future. lvalue ref get() and const ref get() both keep the future valid. They simply peek into the shared state and provide access to the value stored within. This is something that can be rationalized, but this will inevitably lead to confusion. We don't want everyone at Mongo to have to understand the implementation details of Futures in order to use them effectively. Can we move to a single implementation of get() that consumes the Future? Why do we offer these peeking gets when other implementations of the concept don't feel the need?

 

The fix can go along the lines of moving from:

auto val = fut.get();
return fut;

to:

auto val = fut.get();
return Future::makeReady(val);

 

For ExecutorFutures, we might need to add something to the interface that allows us to get the executor out of the initial future so it can be forwarded to the new one. I.e.: 

auto val = execFut.get();
return Future::makeReady(val).thenRunOn(execFut.getExecutor());
// or return ExecutorFuture(execFut.getExecutor(), val);

 

Challenge 3:

We are pretty confident that existing code would break the rules that would be changed by addressing Challenge 1. We need to implement some sort of sanitizer that detects code that uses future-y types after get. 

It sounds like clang-tidy requires a recompile to add extra logic. Are there alternatives available that let us use the AST to flag errors? Regardless of whether such tooling exists, we would want to at least do one pass to make sure the code is in a good state when we push out this new contract.

Acceptance Criteria:

  • There is a single get() implementation that guarantees valid() == false as a postcondition.
  • We have some mechanism/tooling available for detecting instances of incorrect usage of Futures after they are invalid.
  • We eliminate all instances of incorrect usage of Futures after valid() == false.

Generated at Thu Feb 08 06:01:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.