[SERVER-78026] Make ClusterServerParameter base class explicitly aware of OperationContext Created: 13/Jun/23  Updated: 26/Oct/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: car-qw, oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-78632 Exception in QuerySettingsManager loc... Closed
is related to SERVER-78027 clang-tidy check that we don't check ... Closed
Assigned Teams:
Catalog and Routing
Sprint: Security 2023-07-10, Security 2023-07-24, Security 2023-08-07, Security 2023-08-21, Security 2023-09-04, Security 2023-09-18, Security 2023-10-02, Security 2023-10-16
Participants:
Story Points: 3

 Description   

Cluster parameters need an operation context since they do writes and to take locks.
Things which touch on-disk state should always accept an OperationContext in their arguments to indicate they do possibly blocking yet interruptible work



 Comments   
Comment by Varun Ravichandran [ 17/Oct/23 ]

Since Sharding EMEA now owns cluster server parameters, I'm reassigning this over to their backlog.

I think the rationale for this ticket is certainly valid for cluster server parameters, specifically. However, startup server parameters would be a harder case to tackle since OperationContext is often not available at that the time that those parameters are initialized. Since cluster server parameters are built on top of existing server parameters, we are forced to consider the implications of passing OperationContext to all kinds of server parameters even though CSPs are the only ones that would benefit from this.

I'll defer to Sharding EMEA to make a decision on cost/benefit analysis and determine the best way forward.

cc: sergi.mateo-bellido@mongodb.com 

Comment by Max Hirschhorn [ 21/Jun/23 ]

Since these are in-memory updates and executed within an OpObserver, they should not take database locks or perform any additional writes.

The QuerySettingsManager is written to be using a Lock::ResourceMutex as an interruptible reader-writer lock. It currently relies on calling Client::getCurrent()->getOperationContext() in append(), set(), and reset(). Either we should make the contract explicit ClusterServerParameters may depend on an OperationContext to synchronize the in-memory state or we need to develop a different style for writing such ClusterServerParameters.

Comment by Varun Ravichandran [ 20/Jun/23 ]

Cluster server parameters derive from the server parameter interface, causing them to have the same append(), set(), reset(), and validate() methods. Although CSPs involve writing data to disk, this is not expected to occur when any of those methods are called. The expected flow is the following:

  1. setClusterParameter is executed for some parameter. This directly calls ServerParameter::validate for the parameter being updated, which is not expected to take locks or be a blocking operation. If run on a mongos, it then calls _configsvrSetClusterParameter, which coordinates the writes across all shards and itself. These commands all have OperationContext available so that they can be interrupted.
  2. After the write is applied on a given node, the ClusterServerParameterOpObserver fires to perform the appropriate in-memory updates. At this stage, ServerParameter::set() and/or ServerParameter::reset() is expected to be called. Since these are in-memory updates and executed within an OpObserver, they should not take database locks or perform any additional writes.

Based on the above, I don't anticipate gaining significant value from introducing OperationContext* to the server parameter API. The blocking and interruptible portion of cluster parameter propagation is contained within setClusterParameter and _configsvrSetClusterParameter, and those have the OperationContexts available to be killed. 

Generated at Thu Feb 08 06:37:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.