[SERVER-71060] Design a general solution for concurrent access problem in ShardingDDLCoordinator Created: 03/Nov/22  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Enrico Golfieri Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: 12/12
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Catalog and Routing
Participants:
Story Points: 2

 Description   

As part of SERVER-70580, a quick fix was implemented to prevent a data race when accessing getDatabaseVersion.

The solution saves, as const members of the ShardingDDLCoordinator, the parts of the StateDoc that do not change across phases or after a step-down recovery. Those members can be safely accessed by reference from different threads.
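A minimal sketch of the const-member idea, for illustration only. ImmutableMetadata, ConstMetadataCoordinator and their fields are hypothetical names, not the actual ShardingDDLCoordinator code; the assumption is that the immutable part of the state document is copied once at construction and never reassigned.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the part of the state document that never changes.
struct ImmutableMetadata {
    std::string nss;
    int databaseVersion;
};

// Hypothetical coordinator that captures the immutable metadata at construction.
class ConstMetadataCoordinator {
public:
    explicit ConstMetadataCoordinator(ImmutableMetadata metadata)
        : _metadata(std::move(metadata)) {}

    // Returning a reference is safe from any thread: _metadata is const,
    // so no writer can exist after construction and no lock is needed.
    const ImmutableMetadata& metadata() const {
        return _metadata;
    }

private:
    const ImmutableMetadata _metadata;
};
```

Because the member is declared const, the compiler rejects any later assignment to it, which is what makes the lock-free reference getter safe.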

At the moment, the design is the following:

  • Getter methods return references wherever possible, as an optimisation.
  • Any public method can be called by external threads. Each of those methods requires locking, returning copies, or accessing only const members to prevent data races.
  • Any private/protected method is called only by the coordinator itself, which ensures serialisation.

However, given the design of the class, the problem might present itself again.

In the future we might need to expose other information, either by making private or protected methods public or simply by adding new public methods.

The engineers have to make sure that:

  1. A lock plus return by copy is implemented whenever the data accessed might change.
  2. Getters return a reference only for const members.
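The two rules above can be sketched as follows. This is an illustrative example under assumptions, not the real class: LockedStateCoordinator, phase(), advanceTo() and the string-typed phase are all hypothetical names chosen for the sketch.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical coordinator; every name here is illustrative.
class LockedStateCoordinator {
public:
    LockedStateCoordinator(std::string nss, std::string phase)
        : _nss(std::move(nss)), _phase(std::move(phase)) {}

    // Rule 2: a reference is returned only because _nss is a const member.
    const std::string& nss() const {
        return _nss;
    }

    // Rule 1: _phase can change, so take the lock and return a copy.
    // Returning a reference here would let the caller read the string
    // after the lock is released, while a writer may be modifying it.
    std::string phase() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _phase;
    }

    // The writer takes the same mutex as the public getter.
    void advanceTo(std::string newPhase) {
        std::lock_guard<std::mutex> lk(_mutex);
        _phase = std::move(newPhase);
    }

private:
    const std::string _nss;
    mutable std::mutex _mutex;  // mutable so const getters can lock it
    std::string _phase;         // guarded by _mutex
};
```

Nothing in this shape stops a future getter from returning a reference to _phase, which is the gap the ticket describes: the rules hold by convention only.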

The current design does not express this clearly, so any small change can introduce a data race.

Note: the solution should also be back-ported.


Generated at Thu Feb 08 06:17:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.