[SERVER-81259] updateOne without shard key does not handle WriteConcernErrors properly Created: 20/Sep/23  Updated: 05/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vishnu Kaushik Assignee: Jason Zhang
Resolution: Unresolved Votes: 0
Labels: sharding-nyc-subteam3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-81261 Handle writeConcernErrors for writes ... Blocked
Related
is related to SERVER-78311 mongos does not report writeConcernEr... Closed
is related to SERVER-81246 FLE WriteConcernError behavior unclear Closed
Assigned Teams:
Sharding NYC
Operating System: ALL
Sprint: Cluster Scalability 2023-11-13, Cluster Scalability 2023-11-27, Cluster Scalability 2023-12-11, Cluster Scalability 2023-12-25, Cluster Scalability 2024-1-8, Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19
Participants:

 Description   

When an update with multi: true encounters a WriteConcernError (WCE) on mongos, we get the following response. Note that the WCE appears in the writeConcernError field:

{
	"nModified" : 2,
	"n" : 2,
	"writeConcernError" : {
		"code" : 64,
		"codeName" : "WriteConcernFailed",
		"errmsg" : "Multiple errors reported :: UnsatisfiableWriteConcern: Not enough data-bearing nodes; Error details: { writeConcern: { w: 3, wtimeout: 0, provenance: \"clientSupplied\" } } at shard-rs0 :: and :: UnsatisfiableWriteConcern: Not enough data-bearing nodes; Error details: { writeConcern: { w: 3, wtimeout: 0, provenance: \"clientSupplied\" } } at shard-rs1",
		"errInfo" : {
			
		}
	},
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1695235981, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1695235981, 1)
}

On the other hand, with multi: false on a multi-shard cluster (which triggers the updateOne without shard key path), the WCE is placed in writeErrors instead, which is inconsistent with the multi: true response above:

{
	"nModified" : 0,
	"n" : 0,
	"writeErrors" : [
		{
			"index" : 0,
			"code" : 100,
			"errmsg" : "Write results unavailable from failing to target a host in the shard shard-rs1 :: caused by :: Command error committing internal transaction :: caused by :: Not enough data-bearing nodes; Error details: { writeConcern: { w: 3, wtimeout: 0, provenance: \"clientSupplied\" } }"
		}
	],
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1695235903, 2),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1695235903, 2)
}
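
The inconsistency between the two responses above can be checked mechanically. The following is a minimal sketch, not server or driver code: locateWriteConcernError is a hypothetical helper, the sample objects are trimmed copies of the responses above, and treating code 100 as a write concern error code (UnsatisfiableWriteConcern) is an assumption of this sketch.

```javascript
// Error codes treated as write concern failures in this sketch.
// 64 is WriteConcernFailed (see the codeName in the first response);
// 100 is assumed here to be UnsatisfiableWriteConcern.
const WRITE_CONCERN_CODES = new Set([64, 100]);

// Hypothetical helper: report which field of a write response carries
// the write concern failure, if any.
function locateWriteConcernError(res) {
  if (res.writeConcernError) {
    return "writeConcernError";
  }
  const inWriteErrors = (res.writeErrors || []).some((e) =>
    WRITE_CONCERN_CODES.has(e.code)
  );
  return inWriteErrors ? "writeErrors" : "none";
}

// Trimmed copies of the two responses above.
const multiTrueResponse = {
  nModified: 2,
  n: 2,
  writeConcernError: { code: 64, codeName: "WriteConcernFailed" },
  ok: 1,
};
const multiFalseResponse = {
  nModified: 0,
  n: 0,
  writeErrors: [{ index: 0, code: 100 }],
  ok: 1,
};

console.log(locateWriteConcernError(multiTrueResponse));  // prints "writeConcernError"
console.log(locateWriteConcernError(multiFalseResponse)); // prints "writeErrors"
```

A consistent implementation would presumably return "writeConcernError" for both responses.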

In addition, when another server error occurs alongside the WCE, such as a TypeMismatch, the WCE is hidden and only the TypeMismatch is reported:

{
	"nModified" : 0,
	"n" : 0,
	"writeErrors" : [
		{
			"index" : 0,
			"code" : 14,
			"errmsg" : "Write results unavailable from failing to target a host in the shard shard-rs0 :: caused by :: Cannot apply $inc to a value of non-numeric type. {_id: 1000.0} has the field 'a' of non-numeric type array"
		}
	],
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1695236285, 4),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1695236285, 2)
}
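
A regression test for this masking could assert that a write concern failure known to occur is reported somewhere in the response. The sketch below is illustrative only: isWriteConcernErrorMasked is a hypothetical helper, the sample objects are trimmed copies of responses in this ticket, and treating code 100 as a write concern error code is an assumption.

```javascript
// Codes treated as write concern failures in this sketch (64 is
// WriteConcernFailed; 100 is assumed to be UnsatisfiableWriteConcern).
const WC_CODES = new Set([64, 100]);

// Hypothetical check: true when a write concern failure was expected
// but the response reports it in neither writeConcernError nor writeErrors.
function isWriteConcernErrorMasked(res, writeConcernExpectedToFail) {
  if (!writeConcernExpectedToFail) {
    return false;
  }
  const codes = (res.writeErrors || []).map((e) => e.code);
  const reportedSomewhere =
    Boolean(res.writeConcernError) || codes.some((c) => WC_CODES.has(c));
  return !reportedSomewhere;
}

// Trimmed copy of the response above: only the TypeMismatch (code 14) appears.
const typeMismatchOnly = {
  nModified: 0,
  n: 0,
  writeErrors: [{ index: 0, code: 14 }],
  ok: 1,
};
// Trimmed copy of the earlier multi: false response: the WCE is misplaced
// in writeErrors but still visible.
const wceInWriteErrors = {
  nModified: 0,
  n: 0,
  writeErrors: [{ index: 0, code: 100 }],
  ok: 1,
};

console.log(isWriteConcernErrorMasked(typeMismatchOnly, true)); // prints true
console.log(isWriteConcernErrorMasked(wceInWriteErrors, true)); // prints false
```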

See SERVER-81246, a ticket for a similar bug in FLE, which also uses internal transactions.

