[SERVER-36312] Re-enable atClusterTime selection algorithm on mongos Created: 26/Jul/18  Updated: 06/Dec/22  Resolved: 18/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Jack Mulrow Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Won't Fix Votes: 0
Labels: ShardedTxn:FutureOptimizations, max-triage, pm-564
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-37549 Compute atClusterTime for sharded tra... Closed
Gantt Dependency
has to be done before SERVER-37549 Compute atClusterTime for sharded tra... Closed
has to be done after SERVER-35707 Figure out the transaction abort stat... Closed
Related
is related to SERVER-33053 Allow unsharded connections to trigge... Backlog
is related to SERVER-37568 computeAtClusterTimeForOneShard shoul... Closed
Assigned Teams:
Sharding NYC
Participants:

 Description   

The snapshot transactions' atClusterTime selection algorithm developed as part of the Global Point in Time reads project for 4.0 works as follows:

  1. Perform routing using the latest available routing table on MongoS
  2. Using the per-shard majority committed timestamp, select the smallest timestamp across the targeted shards (this is in order to ensure none of the targeted shards will perform a no-op write)
  3. Perform routing at the timestamp selected in the previous step and if this results in the same set of shards, use the selected timestamp. Otherwise use the latest available timestamp on the logical clock

This algorithm was disabled through SERVER-34326 (and deleted by SERVER-34475) due to the small window of history that the storage engine supports. Now that SERVER-31767 has increased the window of timestamps available for atClusterTime reads, the above selection algorithm can be re-enabled.

In addition, using the majority committed timestamp goes against the performance goals of the speculative snapshot optimization and can also lead to problems on shards with enableMajorityReadConcern=false (where such shards may never provide a snapshot at the selected atClusterTime). Because of this, the algorithm should be changed to use the last applied opTime timestamps of the targeted shards.



 Comments   
Comment by Ratika Gandhi [ 18/Nov/21 ]

Closing this. We will reopen this ticket if we hear complaints about stalls from the field. 

Comment by Kaloian Manassiev [ 14/Jan/19 ]

Due to the multi-version routing table and because document histories are not carried as part of migration, the cluster time selection algorithm in question requires two phases:

  • Perform routing with the latest available cluster time in order to obtain a subset of shards
  • Select the most optimal timestamp for the produced subset of shards
  • Perform routing again with the timestamp selected in the previous step (which can potentially yield a different set of shards)

The value of this optimization is unclear versus the complexity it introduces and because of this we will not implement it.

Comment by Jack Mulrow [ 11/Oct/18 ]

If this work is started after PM-1191 has written integration tests for single replica set sharded transactions with enableReadConcernMajority=false, then it will need to be done in tandem with SERVER-37549, to avoid picking atClusterTimes based on a lagged majority commit point that shards may not be able to provide snapshots at.

Comment by Gregory McKeon (Inactive) [ 02/Oct/18 ]

When doing this, we need to make sure we're actually using the routing information from the latest cluster time, which we ended up selecting (rather than the one which was latest at the time the command ran).

Generated at Thu Feb 08 04:42:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.