[SERVER-60953] TenantMigrationDonorService::getDurableState should return a future Created: 22/Oct/21  Updated: 29/Oct/23  Resolved: 15/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Matt Broadstone Assignee: Didier Nadeau
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-61785 API for detecting conflicts among Ten... Closed
Backwards Compatibility: Fully Compatible
Sprint: Server Serverless 2021-11-15, Server Serverless 2021-11-29
Participants:

 Description   

This method has a side-effect of waiting for the fulfillment of a promise indicating the state document has been inserted and majority-committed. This pollutes the TMDS API by requiring an opCtx be passed in rather than letting the caller decide where to block on the future. We should consider returning a Future<DurableState> instead.



 Comments   
Comment by Githook User [ 15/Nov/21 ]

Author:

{'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}

Message: SERVER-60953 Return an optional when getDurableState is called
Branch: master
https://github.com/mongodb/mongo/commit/7d7ffa22ea8d3979d7a5c1623c8a1683424393c4

Comment by Esha Maharishi (Inactive) [ 05/Nov/21 ]

didier.nadeau, that sounds good to me. My understanding is the assert is "tenant id's are different, or durable state exists and migration is aborted and garbage collectable." (I think your comment here is missing the "durable state exists", but I saw you added this in the PR.)

Comment by Didier Nadeau [ 05/Nov/21 ]

Following some reflection (and yesterday's discussion with matt.broadstone ) I believe the idea of an optional seems better :

std::optional<DurableState> getDurableState();
// Return {} before a durable state exists 

The instance conflict logic does the following :

...
auto durableState = typedInstance->getDurableState(opCtx);
uassert(ErrorCodes::ConflictingOperationInProgress,
                str::stream() << "tenant " << tenantId << " is already migrating",
                typedInstance->getTenantId() != tenantId || (durableState.state == TenantMigrationDonorStateEnum::kAborted &&                     durableState.expireAt));
..

This will fail if :

  • The tenantId are the same AND
  • The state is not aborted OR expireAt does not exist

In the case of a tenant migration that recently started (before a durable state exists), this would block until we get the first durable state (most likely kUninitialized or kAbortingIndexBuilds), and then it would fail (as the state is not kAborted). Presumably the caller would then try again later once the state is aborted.

It seems we can simply cut the wait and fail right away if a durableState does not exist because, in that scenario, the instance is in kUninitialized state. In that case too the caller can try again later. Using a boost::optional seems logical as, at the beginning, it will simply be empty which allow us to fail right away.

I also added a getInitialFuture that will be fulfilled at the same time that durableState gets its first value, allowing users to wait on the first durable state if needed.

matt.broadstone esha.maharishi  let me know if you think that wouldn't work !

Comment by Matt Broadstone [ 01/Nov/21 ]

After discussion we decided that we should move the _initialDonorStateDurablePromise waiting outside of the getDurableState method, and instead make a top-level future that callers can wait on. We'll need to make sure that we explicitly wait for the initial donor state in both the donorStartMigration and donorAbortMigration commands (at least).

Comment by Esha Maharishi (Inactive) [ 26/Oct/21 ]

SERVER-60752 made PrimaryOnlyService::constructInstance take an 'existingInstances' parameter so that each service can use custom logic to check for conflicting instances before constructing the new instance.

SERVER-60752 also made constructInstance take an OperationContext parameter, because part of TenantMigrationDonorService's logic for checking for conflict involved calling TenantMigrationDonorService::Instance::getDurableState, which takes an OperationContext because it waits on a promise.

However, constructInstance should not wait on any promises, since it's called under the PrimaryOnlyService's mutex.

This ticket could fix this by adding two variants of getDurableState: a{{ Future<DurableState> getDurableStateFuture}} and a boost::optional<DurableState> getDurableState. constructInstance could use the second and throw ConflictingOperationInProgress if it returns boost::none.

Generated at Thu Feb 08 05:51:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.