[SERVER-53935] Noop write for afterClusterTime/atClusterTime on config servers could override opCtx's readConcernArgs Created: 20/Jan/21  Updated: 29/Jan/21  Resolved: 29/Jan/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Jordi Serra Torrens
Resolution: Won't Do Votes: 0
Labels: Sharding-EMEA, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-38855 Noop write for afterClusterTime/atClu... Closed
Operating System: ALL
Participants:

 Description   

As part of waitForReadConcernImpl, we could perform noop writes to bump up the clusterTime for afterClusterTime/atClusterTime. That eventually calls Shard::runCommand to run the appendOplogNote command. On config servers (I believe we use ShardLocal), this then calls ShardLocal::_runCommand, which runs appendOplogNote using DBDirectClient, and this could override the readConcernArgs on the opCtx. Overriding the readConcernArgs on the opCtx is dangerous because the subsequent waitUntilOpTimeForRead call relies on the readConcernArgs to wait properly before proceeding with the command. Moreover, this could also affect other assumptions we have about the readConcern for the initial command.
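
To make the hazard concrete, here is a minimal sketch of the sequence described above. It is a toy model, not server code: ReadConcernArgs and OperationContext are simplified stand-ins, and runNestedCommandOnSameOpCtx, clusterTimeToWaitFor and waitStillSeesClusterTime are hypothetical helpers. It assumes the nested command installs its own default read concern on the opCtx it shares with the caller.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-ins for illustration only; these are NOT the real MongoDB classes.
struct ReadConcernArgs {
    std::string level = "local";
    std::optional<long long> afterClusterTime;
};

struct OperationContext {
    ReadConcernArgs readConcernArgs;  // models the state decorating the opCtx
};

// Models a nested command dispatched on the caller's own opCtx (assumption):
// the nested command's default read concern replaces the caller's.
void runNestedCommandOnSameOpCtx(OperationContext& opCtx) {
    opCtx.readConcernArgs = ReadConcernArgs{};
}

// Models the later wait: returns the cluster time it would block on, if any.
std::optional<long long> clusterTimeToWaitFor(const OperationContext& opCtx) {
    return opCtx.readConcernArgs.afterClusterTime;
}

// Returns true only if the wait that follows the noop write still sees the
// afterClusterTime that the original command asked for.
bool waitStillSeesClusterTime(long long requested) {
    OperationContext opCtx;
    opCtx.readConcernArgs = ReadConcernArgs{"majority", requested};

    runNestedCommandOnSameOpCtx(opCtx);  // the noop write

    return clusterTimeToWaitFor(opCtx) == requested;
}
```

In this model waitStillSeesClusterTime(42) returns false: the requested cluster time is gone by the time the wait runs, so the wait would have nothing to block on.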



 Comments   
Comment by Jordi Serra Torrens [ 29/Jan/21 ]

We concluded that while the problem described in this bug is real, the DBDirectClient (which is what ShardLocal::_runCommand uses) shouldn't have been used in the first place.

shardRegistry::getConfigShard is returning a ShardLocal object on configsvr secondaries, so that's not suited for the appendOplogNote write. For shardsvrs, shardRegistry::getShard. I've filed SERVER-54102 to ensure that configsvrs perform the appendOplogNote against the primary.

As for louis.williams's proposal to make DBDirectClient use an AlternativeClientRegion, I've filed SERVER-54140.

Comment by Louis Williams [ 21/Jan/21 ]

Can we consider modifying DBDirectClient to use an AlternativeClientRegion? It seems like this would solve the problem more generally so that DBDirectClient can "stack", and OperationContext-decorated state is not implicitly passed down into a DBDirectClient. Instead, callers will need to explicitly transfer any state they wish for a client to inherit (e.g. readConcern, maxTimeMs).
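
A rough sketch of the explicit-transfer idea, again as a toy model rather than the actual DBDirectClient or AlternativeClientRegion code. InheritedState, runIsolated and noopWrite are hypothetical names introduced only for illustration. The assumption is that the nested command gets a fresh context and receives only the state the caller opts into.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Toy stand-ins for illustration only; not the real MongoDB types.
struct ReadConcernArgs {
    std::string level = "local";
    std::optional<long long> afterClusterTime;
};

struct OperationContext {
    ReadConcernArgs readConcernArgs;
    std::optional<int> maxTimeMS;
};

// Hypothetical opt-in list of state that the nested client should inherit.
struct InheritedState {
    bool readConcern = false;
    bool maxTimeMS = false;
};

using Command = std::function<void(OperationContext&)>;

// Runs `command` on a fresh child context so that nothing it changes leaks
// back into `parent`. Only the explicitly requested state is copied down.
// The child is returned so that callers can inspect what the command saw.
OperationContext runIsolated(const OperationContext& parent,
                             InheritedState inherit,
                             const Command& command) {
    OperationContext child;
    if (inherit.readConcern) {
        child.readConcernArgs = parent.readConcernArgs;
    }
    if (inherit.maxTimeMS) {
        child.maxTimeMS = parent.maxTimeMS;
    }
    command(child);
    return child;
}

// A nested command that installs its own default read concern (assumption).
void noopWrite(OperationContext& opCtx) {
    opCtx.readConcernArgs = ReadConcernArgs{};
}
```

With this shape, calling runIsolated(parent, inherit, noopWrite) leaves parent.readConcernArgs untouched whatever the nested command does, and the child sees the parent's maxTimeMS only when inherit.maxTimeMS is set.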
