[SERVER-53935] Noop write for afterClusterTime/atClusterTime on config servers could override opCtx's readConcernArgs Created: 20/Jan/21 Updated: 29/Jan/21 Resolved: 29/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Jordi Serra Torrens |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | Sharding-EMEA, sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
As part of waitForReadConcernImpl, we could perform noop writes to bump up the clusterTime for afterClusterTime/atClusterTime. That eventually calls Shard::runCommand to run the appendOplogNote command. On config servers (I believe we use ShardLocal), this then calls ShardLocal::_runCommand, which runs appendOplogNote using DBDirectClient. This could override the readConcernArgs on the opCtx. Overriding the readConcernArgs on the opCtx is dangerous because the subsequent waitUntilOpTimeForRead call relies on the readConcernArgs to wait properly before proceeding with the command. Moreover, this could also break other assumptions we make about the readConcern for the initial command. |
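The hazard described above can be illustrated with a toy model. This is a minimal sketch, not the server's code: `OperationContext`, `ReadConcernArgs`, `runNestedCommandOnSameOpCtx`, and `readConcernSeenByWait` are hypothetical stand-ins, and the value the read concern gets overwritten with ("local", no cluster time) is an assumption chosen for illustration only.

```cpp
#include <string>

// Toy stand-in for the read concern stored on an operation context.
// NOT the real MongoDB class; fields are simplified assumptions.
struct ReadConcernArgs {
    std::string level;          // e.g. "majority" or "local"
    bool hasAfterClusterTime;   // whether afterClusterTime was specified
    long long afterClusterTime; // simplified to a plain integer
};

// Toy stand-in for an opCtx carrying decorated state.
struct OperationContext {
    ReadConcernArgs readConcern;
};

// Models a direct client that runs a nested command on the caller's own
// opCtx. Setting up the nested command installs its own read concern on
// that shared opCtx, clobbering the outer command's value.
void runNestedCommandOnSameOpCtx(OperationContext& opCtx) {
    opCtx.readConcern = ReadConcernArgs{"local", false, 0};
}

// Models the outer flow: first the noop write, then a wait step that
// consults the read concern on the opCtx. Returns what the wait would see.
ReadConcernArgs readConcernSeenByWait(OperationContext& opCtx) {
    runNestedCommandOnSameOpCtx(opCtx); // the "noop write"
    return opCtx.readConcern;           // already overwritten at this point
}
```

In this model, an outer command that started with a "majority" read concern and an afterClusterTime would reach its wait step with the nested command's read concern instead, which is the failure mode the description warns about.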
| Comments |
| Comment by Jordi Serra Torrens [ 29/Jan/21 ] |
|
We concluded that while the problem described in this bug is real, DBDirectClient (which is what ShardLocal::_runCommand uses) shouldn't have been used in the first place. shardRegistry::getConfigShard returns a ShardLocal object on configsvr secondaries, which is not suited for the appendOplogNote write. For shardsvrs, shardRegistry::getShard. I've filed As for louis.williams's proposal to make DBDirectClient use an AlternativeClientRegion, I've filed SERVER-54140. |
| Comment by Louis Williams [ 21/Jan/21 ] |
|
Can we consider modifying DBDirectClient to use an AlternativeClientRegion? It seems like this would solve the problem more generally, so that DBDirectClient can "stack" and OperationContext-decorated state is not implicitly passed down into a DBDirectClient. Instead, callers would need to explicitly transfer any state they wish a client to inherit (e.g. readConcern, maxTimeMS).
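
The isolation idea in this comment can be sketched with a toy model. This is only an illustration under stated assumptions: `OperationContext` and `runNestedIsolated` are hypothetical names, not the real AlternativeClientRegion or DBDirectClient API, and the nested command's "local" read concern is an arbitrary example value.

```cpp
#include <string>

// Toy stand-in for per-operation state. NOT the real MongoDB class.
struct OperationContext {
    std::string readConcernLevel;
    long long maxTimeMS;
};

// Hypothetical helper: runs the nested command on a fresh context rather
// than on the caller's. Nothing is inherited implicitly; the caller opts in
// to each piece of state it wants passed down (here only maxTimeMS).
// Returns the nested context so its contents can be inspected.
OperationContext runNestedIsolated(const OperationContext& parent,
                                   bool inheritMaxTimeMS) {
    OperationContext nested{"local", 0}; // the nested command's own state
    if (inheritMaxTimeMS) {
        nested.maxTimeMS = parent.maxTimeMS; // explicit transfer
    }
    return nested; // parent is const here, so it cannot be modified
}
```

Because the nested command only ever sees its own context, the outer command's read concern survives the call, and any inherited setting is visible at the call site.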