[SERVER-50055] [optimization] Make mongod internally retry an operation that was blocking during a tenant migration critical section if the migration aborts
Created: 31/Jul/20  Updated: 06/Dec/22  Resolved: 17/Nov/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Do Votes: 0
Labels: pm-1791_milestone-H
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding
Participants:

 Comments   
Comment by Esha Maharishi (Inactive) [ 17/Nov/20 ]

We decided at today's 6-week design review not to pursue this optimization for the first release of serverless.

For non-transaction operations, the retry will be pushed up to the proxy, which should be a short round-trip (compared to being pushed back to the client).

For transactions, the retry of the whole transaction will be pushed up to the client.

This is considered acceptable because migrations should rarely abort, so this optimization would rarely be exercised.
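
As a rough illustration of the split described above, here is a minimal Python sketch of proxy-side handling. Every name in it (`MigrationAborted`, `proxy_dispatch`, `send`, `max_retries`) is hypothetical and chosen for illustration; none of it reflects the actual proxy or mongod code. It assumes the server surfaces a distinguishable error when a migration aborts.

```python
class MigrationAborted(Exception):
    """Hypothetical error the server returns when the migration aborts."""


def proxy_dispatch(send, command, in_transaction, max_retries=3):
    """Forward a command to the server, retrying at the proxy where allowed.

    send:           hypothetical callable that sends the command to mongod
    in_transaction: True if the command is part of a multi-statement transaction
    """
    for attempt in range(max_retries + 1):
        try:
            return send(command)
        except MigrationAborted:
            # A transaction cannot be resumed mid-way in this sketch, so the
            # error propagates and the client retries the whole transaction.
            if in_transaction:
                raise
            # Non-transaction operations are retried here, which costs only a
            # proxy-to-server round trip, until the retry budget runs out.
            if attempt == max_retries:
                raise
```

The retry cap is an assumption of the sketch; the ticket does not say whether or how the proxy bounds its retries.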

Comment by Esha Maharishi (Inactive) [ 04/Aug/20 ]

We decided to put this on pause since it is an optimization and may or may not be worth the work and extra code complexity.

The pros are:

  • The proxy does not need to handle retrying if the migration aborts
  • If the migration aborts, operations that were blocked are retried without an extra round trip back to the proxy

The cons are:

  • Both the server and proxy would have retry loops, increasing the system's complexity and test surface
  • The server may need separate retry loops for batched writes and other commands, increasing the code complexity

The optimization only helps if the migration aborts while in the critical section, which is a narrow case.
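
For concreteness, the server-side retry being weighed here could look roughly like the Python sketch below. This is not mongod code: `MigrationConflict`, `MigrationCommitted`, `MigrationBlocker`, and `run_with_internal_retry` are hypothetical names, and the sketch assumes an operation signals that it hit the critical section by raising an exception, then waits for the migration's outcome.

```python
import threading
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"


class MigrationConflict(Exception):
    """Hypothetical: the operation hit the migration's critical section."""


class MigrationCommitted(Exception):
    """Hypothetical: the tenant's data now lives on the recipient."""


class MigrationBlocker:
    """Hypothetical stand-in for the state operations block on."""

    def __init__(self):
        self._decided = threading.Event()
        self._outcome = None

    def decide(self, outcome):
        self._outcome = outcome
        self._decided.set()

    def wait_for_outcome(self, timeout=None):
        # Blocks the calling operation until the migration commits or aborts.
        if not self._decided.wait(timeout):
            raise TimeoutError("migration outcome not decided in time")
        return self._outcome


def run_with_internal_retry(operation, blocker, timeout=None):
    """Run the operation; if it was blocked and the migration aborts, retry locally."""
    while True:
        try:
            return operation()
        except MigrationConflict:
            if blocker.wait_for_outcome(timeout) is Outcome.COMMITTED:
                # Nothing to retry on this node; the caller must go elsewhere.
                raise MigrationCommitted("retry against the recipient")
            # Aborted: loop and re-run without a round trip to the proxy.
```

A single loop like this glosses over the second con above: batched writes would presumably need to resume from the first unapplied write rather than re-run the whole batch, which is where a separate retry loop would come in.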

We should revisit whether the optimization is worth it when adding support for transactions, to see if it would prevent needing to abort the entire transaction and push the retry all the way back to the client (not just back to the proxy).
