[SERVER-64855] Resolve problem with using op session with direct client in POC Created: 23/Mar/22  Updated: 29/Oct/23  Resolved: 08/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam2, sharding-nyc-subteam2-catalog-poc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-63598 Umbrella ticket for minimal POC for o... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding NYC 2022-04-04, Sharding NYC 2022-04-18
Participants:
Story Points: 4

 Description   

Repro:

buildscripts/resmoke.py run --suite sharding --numShards=1 --numReplSetNodes=3 --catalogShard=any jstests/sharding/addshard1.js

This needs investigation, but it may be a common failure. See SERVER-35180 for why this safeguard was implemented to begin with.

Error:

[js_test:addshard1] s20021| {"t":{"$date":"2022-03-23T23:00:57.855+00:00"},"s":"D1", "c":"ASSERT",   "id":23074,   "ctx":"conn6","msg":"User assertion","attr":{"error":"Location50891: Invalid to set operation session info in a direct client","file":"src/mongo/util/future_impl.h","line":1087}}
[js_test:addshard1] s20021| {"t":{"$date":"2022-03-23T23:00:57.855+00:00"},"s":"D1", "c":"SHARDING", "id":22772,   "ctx":"conn6","msg":"Exception thrown while processing command","attr":{"db":"admin","headerId":238,"error":"Location50891: Invalid to set operation session info in a direct client"}}
[js_test:addshard1] s20021| {"t":{"$date":"2022-03-23T23:00:57.855+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn6","msg":"Slow query","attr":{"type":"command","ns":"testDB.foo","appName":"MongoDB Shell","command":{"shardcollection":"testDB.foo","key":{"a":1},"lsid":{"id":{"$uuid":"cfc58386-5173-412c-a8b9-c1092c53e1ff"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1648076457,"i":23}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Invalid to set operation session info in a direct client","errName":"Location50891","errCode":50891,"reslen":270,"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:37500","protocol":"op_msg","durationMillis":34}}
[js_test:addshard1] uncaught exception: Error: command { "shardcollection" : "testDB.foo", "key" : { "a" : 1 } } failed: {
[js_test:addshard1] 	"ok" : 0,
[js_test:addshard1] 	"errmsg" : "Invalid to set operation session info in a direct client",
[js_test:addshard1] 	"code" : 50891,
[js_test:addshard1] 	"codeName" : "Location50891",
[js_test:addshard1] 	"$clusterTime" : {
[js_test:addshard1] 		"clusterTime" : Timestamp(1648076457, 38),
[js_test:addshard1] 		"signature" : {
[js_test:addshard1] 			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[js_test:addshard1] 			"keyId" : NumberLong(0)
[js_test:addshard1] 		}
[js_test:addshard1] 	},
[js_test:addshard1] 	"operationTime" : Timestamp(1648076457, 38)
[js_test:addshard1] } :
[js_test:addshard1] _getErrorWithCode@src/mongo/shell/utils.js:24:13
[js_test:addshard1] ShardingTest/this.adminCommand@src/mongo/shell/shardingtest.js:415:15
[js_test:addshard1] @jstests/sharding/addshard1.js:73:3

A possible fix could be to strip the session if this is a local client from a shard to the config server in a catalog shard.
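
To make the idea concrete, here is a minimal sketch in Python of stripping session fields before a command goes through a local client. It is not server code: the names `strip_session_info` and `SESSION_FIELDS` are hypothetical, the command is modeled as a plain dictionary, and the assumed set of session fields is `lsid`, `txnNumber` and `autocommit`.

```python
# Hypothetical illustration only; none of these names exist in the MongoDB server.
# Assumption: these are the fields that make up "operation session info".
SESSION_FIELDS = ("lsid", "txnNumber", "autocommit")


def strip_session_info(command, is_direct_client):
    """Return a copy of `command`, without session fields if it goes through a direct client."""
    if not is_direct_client:
        # Remote clients keep their session info untouched.
        return dict(command)
    return {key: value for key, value in command.items() if key not in SESSION_FIELDS}


# Command shaped like the failing one in the log above.
cmd = {
    "shardcollection": "testDB.foo",
    "key": {"a": 1},
    "lsid": {"id": "cfc58386-5173-412c-a8b9-c1092c53e1ff"},
}

stripped = strip_session_info(cmd, is_direct_client=True)
untouched = strip_session_info(cmd, is_direct_client=False)
```

Whether dropping these fields is actually safe for a given operation is exactly the open question of this ticket; the sketch only shows the mechanics.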



 Comments   
Comment by Jack Mulrow [ 28/Mar/22 ]

andrew.shuvalov, IIRC those uasserts/invariants were added because DBDirectClient skips logic necessary for running retryable write / transaction commands (the commands that require "operation session info", i.e. lsid, txnNumber, autocommit, etc.). It does so because the retryable write / transaction machinery itself uses DBDirectClient, and that type inherits its opCtx from its spawning operation, which led to problems with the DBDirectClient operation recursively triggering that machinery.
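
A toy model of the safeguard described above, written in Python purely for illustration. The names `UserAssertion` and `initialize_session_info` are hypothetical and do not correspond to server functions; only the error code and message are taken from the log in the description.

```python
# Toy model of the guard; not MongoDB server code.
class UserAssertion(Exception):
    """Stand-in for a uassert failure carrying a numeric location code."""

    def __init__(self, code, message):
        super().__init__(f"Location{code}: {message}")
        self.code = code


def initialize_session_info(command, in_direct_client):
    """Reject session fields on a direct client; report whether session info is present."""
    has_session_info = any(
        field in command for field in ("lsid", "txnNumber", "autocommit")
    )
    if in_direct_client and has_session_info:
        # A direct client shares the opCtx of the operation that spawned it, so
        # setting session info here could re-enter the session machinery.
        raise UserAssertion(
            50891, "Invalid to set operation session info in a direct client"
        )
    return has_session_info


def try_direct(command):
    """Return the error code raised for a direct client, or None if accepted."""
    try:
        initialize_session_info(command, in_direct_client=True)
    except UserAssertion as error:
        return error.code
    return None
```

In this model, `try_direct({"shardcollection": "testDB.foo", "lsid": {}})` yields the error code, while the same command without `lsid` is accepted.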

Of your three workarounds, I believe 1) is the best at this time. We might be able to prove 2) is safe for some operations, but it feels error prone, and 3) would require removing a lot of conditional logic in the command processing / transaction layers, which could take a while. Regular shards already use full clients for operations against themselves via the ShardRemote type, so doing so for a catalog shard too seems reasonable to me.
