[SERVER-49568] Implement detection system for noticing a change to resharding state in config.collections on shards
Created: 16/Jul/20  Updated: 29/Oct/23  Resolved: 02/Sep/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Task
Priority: Major - P3
Reporter: Blake Oler
Assignee: Blake Oler
Resolution: Fixed
Votes: 0
Labels: PM-234-M1, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-09-07
Participants:

 Description   

We are under the constraint that we can't take locks, and accordingly can't do writes, during the catalog cache refresh.

This ticket is to figure out a way to either temporarily capture the config.collections data, or signal that we need to re-do a find later, then finally take action at this location in the shard version recovery process.

There are multiple ways this can be designed – whoever does this ticket can create something more generic, a way to signal actions that should be taken after a refresh, or it can be in-line logic that leads to a resharding util helper function that can process the state information.

No matter how this is designed, we must assert that the state and metadata changes are locally written before we exit the linked location. That is, on exit of recoverRefreshShardVersion, we should have always written any necessary state changes to config.localReshardingOperations.

We may have to make any async tasks happen in an onCommit handler. From Max in Slack:

the async task might need to be started in an onCommit() handler on the recovery unit then because it is possible for the storage transaction to abort (WUOW rollback not replication rollback)
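
A minimal sketch of the onCommit idea, assuming a mock recovery unit rather than the server's real class: `MockRecoveryUnit` and `persistStateAndScheduleTask` are hypothetical names used only for illustration. The point is that the follow-up task is registered as a commit callback, so an aborted storage transaction never starts it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a storage recovery unit; not the server's class.
class MockRecoveryUnit {
public:
    // Register work that must run only if the storage transaction commits.
    void onCommit(std::function<void()> callback) {
        _commitCallbacks.push_back(std::move(callback));
    }

    // Commit path: run each registered callback once, in registration order.
    void commit() {
        auto callbacks = std::move(_commitCallbacks);
        _commitCallbacks.clear();
        for (auto& cb : callbacks) {
            cb();
        }
    }

    // Abort path (write unit of work rollback): drop callbacks without running them.
    void abort() {
        _commitCallbacks.clear();
    }

private:
    std::vector<std::function<void()>> _commitCallbacks;
};

// Hypothetical helper: the local state write would happen inside the storage
// transaction, and the async task is only scheduled for the commit path.
// `tasksStarted` stands in for actually launching the task.
void persistStateAndScheduleTask(MockRecoveryUnit& ru, int& tasksStarted) {
    ru.onCommit([&tasksStarted] { ++tasksStarted; });
}
```

With this shape, calling `abort()` after `persistStateAndScheduleTask` leaves `tasksStarted` unchanged, while `commit()` increments it once.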



 Comments   
Comment by Githook User [ 02/Sep/20 ]

Author: Blake Oler (BlakeIsBlake) <blake.oler@mongodb.com>

Message: SERVER-49568 Thread the CollectionType's ReshardingFields through to the CatalogCache refresh
Branch: master
https://github.com/mongodb/mongo/commit/52f77e8edb3e422d3329915933c0633a3c09786e

Comment by Blake Oler [ 05/Aug/20 ]

Randolph gave his LGTM over Slack.

Comment by Blake Oler [ 05/Aug/20 ]

max.hirschhorn and janna.golden to approve this approach.

Comment by Blake Oler [ 05/Aug/20 ]

Proposed implementation:

Member data changes:

  • Add TypeCollectionReshardingFields as a boost::optional to CollectionAndChangedChunks
  • Add TypeCollectionReshardingFields as a boost::optional inside RoutingTableHistory

Process changes:

When doing a catalog cache refresh...

  1. When we retrieve the config.collections entry from the config server, pass the resharding fields into the CollAndChangedChunks struct.
  2. Pass resharding fields into the RoutingTableHistory object when constructing a new RoutingTableHistory as part of refresh.

When updating as part of forceShardFilteringMetadataRefresh

  1. Retrieve the current RoutingTableHistory from the CollectionShardingState after we force refresh here, at which point we will have the resharding fields to act upon. This is the same place where Marcos acts on a shard version change to indicate that a collection should be created (line 158).
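
The proposal above could be sketched roughly as follows. This is a simplified, self-contained illustration and not the server's code: the structs are stand-ins with made-up fields (`uuid`, `state`), `std::optional` is used in place of `boost::optional` to avoid the dependency, and `makeRoutingTable` and `needsReshardingAction` are hypothetical helper names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in; field names are illustrative assumptions.
struct ReshardingFields {
    std::string uuid;   // hypothetical operation identifier
    std::string state;  // hypothetical state name
};

// Stand-in for what the refresh retrieves from the config server.
struct CollectionAndChangedChunks {
    std::string ns;
    std::optional<ReshardingFields> reshardingFields;  // new optional member
};

// Stand-in for the routing table built during the refresh.
class RoutingTableHistory {
public:
    RoutingTableHistory(std::string ns, std::optional<ReshardingFields> fields)
        : _ns(std::move(ns)), _reshardingFields(std::move(fields)) {}

    const std::optional<ReshardingFields>& getReshardingFields() const {
        return _reshardingFields;
    }

private:
    std::string _ns;
    std::optional<ReshardingFields> _reshardingFields;  // new optional member
};

// Refresh path: carry the resharding fields from the config.collections
// entry into the newly constructed routing table.
RoutingTableHistory makeRoutingTable(const CollectionAndChangedChunks& refreshed) {
    return RoutingTableHistory(refreshed.ns, refreshed.reshardingFields);
}

// Forced-refresh path: after the refresh, check whether there is resharding
// state to act upon (locks and writes are possible at this point).
bool needsReshardingAction(const RoutingTableHistory& rt) {
    return rt.getReshardingFields().has_value();
}
```

Keeping the fields optional means collections that are not being resharded pass through the refresh unchanged, and the forced-refresh caller only has to check for presence.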

Comment by Kaloian Manassiev [ 21/Jul/20 ]

Per in-person discussion with blake.oler, there should not be any need for this work, because we actually can take locks as part of shardVersion recovery and/or advancement.
