[SERVER-71573] Ability to terminate early the workflow of a DDL Coordinator Created: 23/Nov/22  Updated: 01/Dec/22  Resolved: 01/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Antonio Fuschetto Assignee: Antonio Fuschetto
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding EMEA 2022-12-12
Participants:

 Description   

The Sharding DDL Coordinator should support the ability to gracefully terminate the workflow of a coordinator without running all the subsequent phases.

The typical use case can be described as:

  • Suppose a workflow consists of 3 phases, say A, B and C.
  • Phase A determines that, under a specific condition, the execution of the workflow can be interrupted with Status::OK.
  • Phases B and C must then be skipped.

Today there are several ways to implement this use case; however, it would be useful for the infrastructure of the Sharding DDL Coordinator to offer a facility that supports the case in a standard way.

Ideally, it could be sufficient to set a control flag (e.g., _completeWorkflow = true) to gracefully terminate the current workflow before completing the current phase.
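
The control-flag idea can be sketched outside the server code base as follows. This is only an illustration under stated assumptions: ToyCoordinator, completeWorkflow, run and runExample are hypothetical names, not part of the Sharding DDL Coordinator API, and the flag is checked after each phase returns rather than inside it.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a coordinator; NOT the ShardingDDLCoordinator API.
struct ToyCoordinator {
    bool completeWorkflow = false;      // control flag that a phase may set
    std::vector<std::string> executed;  // names of the phases that actually ran

    using Phase = std::function<void(ToyCoordinator&)>;

    // Runs the phases in order and stops, still reporting success, as soon as
    // one of them sets the flag.
    bool run(const std::vector<std::pair<std::string, Phase>>& phases) {
        for (const auto& [name, phase] : phases) {
            executed.push_back(name);
            phase(*this);
            if (completeWorkflow) {
                break;  // skip the remaining phases
            }
        }
        return true;  // plays the role of Status::OK in this toy model
    }
};

// Phase A sets the flag when there is nothing left to do, so B and C are skipped.
inline std::vector<std::string> runExample(bool nothingToDo) {
    ToyCoordinator coordinator;
    coordinator.run({
        {"A", [nothingToDo](ToyCoordinator& c) { c.completeWorkflow = nothingToDo; }},
        {"B", [](ToyCoordinator&) {}},
        {"C", [](ToyCoordinator&) {}},
    });
    return coordinator.executed;
}
```

In this sketch, runExample(true) runs only phase A, while runExample(false) runs A, B and C.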

If a good solution already exists using the current API, describe it on this ticket.



 Comments   
Comment by Antonio Fuschetto [ 01/Dec/22 ]

I had an internal discussion with tommaso.tocci@mongodb.com and jordi.serra-torrens@mongodb.com about the use case described above, and we evaluated two possible solutions that don't require changes to the current infrastructure.

Solution 1: Return a dedicated error code

It consists of returning a specific error code (e.g., ErrorCodes::RequestAlreadyFulfilled) to successfully terminate the workflow, so that the error handler (that is, onError) handles it by simply returning Status::OK().

This solution has already been used in another context; however, it suffers from a potential problem: if the dedicated error code were returned by any function used by the coordinator (e.g., as a command response), it would unintentionally cause the workflow to exit successfully.
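
To make the pitfall concrete, here is a self-contained toy model of Solution 1. It is a sketch under assumptions, not the server implementation: ToyCode, ToyError, runWorkflow, earlyExit and helperSurfacingSameCode are hypothetical names, exceptions stand in for the error propagation of the real future chain, and only the RequestAlreadyFulfilled name is taken from the ticket.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical error codes; only the RequestAlreadyFulfilled name comes from the ticket.
enum class ToyCode { OK, RequestAlreadyFulfilled, InternalError };

struct ToyError : std::runtime_error {
    ToyCode code;
    explicit ToyError(ToyCode c) : std::runtime_error("toy error"), code(c) {}
};

// Stand-in for the error handler: the dedicated code is translated into success,
// any other code is passed through unchanged.
inline ToyCode onError(const ToyError& e) {
    return e.code == ToyCode::RequestAlreadyFulfilled ? ToyCode::OK : e.code;
}

// Runs the phases in order; `completed` counts the phases that finished.
inline ToyCode runWorkflow(const std::vector<std::function<void()>>& phases, int& completed) {
    completed = 0;
    try {
        for (const auto& phase : phases) {
            phase();
            ++completed;
        }
    } catch (const ToyError& e) {
        return onError(e);
    }
    return ToyCode::OK;
}

// Intended use: a phase deliberately asks for early termination.
inline void earlyExit() {
    throw ToyError(ToyCode::RequestAlreadyFulfilled);
}

// Pitfall: an unrelated helper (e.g. one relaying a command response) surfaces
// the same code, and the handler cannot tell the two cases apart.
inline void helperSurfacingSameCode() {
    throw ToyError(ToyCode::RequestAlreadyFulfilled);
}
```

If the first phase calls earlyExit(), the workflow returns ToyCode::OK with no completed phases, as intended. If a later phase calls helperSurfacingSameCode(), the workflow also returns ToyCode::OK even though the remaining phases never ran, which is the problem described above.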

Solution 2: Fork the future chain

It consists of splitting the workflow, based on a condition, into two subchains, one of which is empty (representing the early termination). Since this is considered the preferred solution given the current public API, a snippet that can be used as a reference for the future follows.

ExecutorFuture<void> MyCoordinator::_runImpl(std::shared_ptr<executor::ScopedTaskExecutor> executor, const CancellationToken& token) noexcept {
    return ExecutorFuture<void>(**executor)
        .then([this, executor, token, anchor = shared_from_this()] {
            ...
            if (condition) {
                // Successfully terminate the workflow
                return ExecutorFuture<void>(**executor);
            }
 
            return restOfTheChain(executor, token);
        });
}
 
ExecutorFuture<void> MyCoordinator::restOfTheChain(std::shared_ptr<executor::ScopedTaskExecutor> executor, const CancellationToken& token) noexcept {
    return ExecutorFuture<void>(**executor)
        .then(_buildPhaseHandler(Phase::kPhase2, ...))
        .then(_buildPhaseHandler(Phase::kPhase3, ...));
}

Generated at Thu Feb 08 06:19:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.