[SERVER-80174] mongos should be able to handle top-level error responses received from mongod for bulkWrite command Created: 16/Aug/23  Updated: 06/Dec/23  Resolved: 22/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Kaitlin Mahar Assignee: Kaitlin Mahar
Resolution: Fixed Votes: 0
Labels: milestone-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-79506 Get all bulkWrite js core tests runni... Closed
is depended on by SERVER-80729 Add targeted chunk migration tests fo... Closed
Related
related to SERVER-81382 Complete TODO listed in SERVER-80174 Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Sprint: Repl 2023-09-04, Repl 2023-09-18, Repl 2023-10-02
Participants:

 Description   

Right now our logic to process batch command responses for bulkWrite on mongos assumes that, if we have a response, it strictly matches the form of BulkWriteCommandReply.

However, mongod can instead return a top-level error response, e.g. when a transient transaction error occurs, and mongos currently hits parsing errors when it tries to read such a response as a BulkWriteCommandReply. For example:

{"errorLabels":["TransientTransactionError"],"ok":0,"errmsg":"sharding status of collection test.coll is not currently known and needs to be recovered","code":13388,"codeName":"StaleConfig","ns":"test.coll","vReceived":{"e":{"$oid":"000000000000000000000000"},"t":{"$timestamp":{"t":0,"i":0}},"v":{"$timestamp":{"t":0,"i":0}}},"shardId":"shard-rs0","$clusterTime":{"clusterTime":{"$timestamp":{"t":1692221464,"i":25}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1692221464,"i":19}},"$topologyTime":{"$timestamp":{"t":1692221457,"i":4}},"operationTime":{"$timestamp":{"t":1692221464,"i":25}}}

Additionally, the logic for batch insert/update/delete has some special casing for errors with the TransientTransactionError label, which we don't appear to have yet for bulkWrite.
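
As a rough illustration of detecting that label (a Python sketch, not the batch write code itself): the helper name `has_transient_txn_label` is hypothetical, and what the router should do once the label is found is deliberately left out, since this ticket does not spell it out.

```python
from typing import Any, Dict

TRANSIENT_TXN_LABEL = "TransientTransactionError"


def has_transient_txn_label(reply: Dict[str, Any]) -> bool:
    """True if a failed reply is labeled as a transient transaction error.

    Hypothetical helper for illustration only.
    """
    # errorLabels is optional, so fall back to an empty list.
    labels = reply.get("errorLabels") or []
    return not reply.get("ok") and TRANSIENT_TXN_LABEL in labels


labeled = {"ok": 0, "code": 13388, "errorLabels": ["TransientTransactionError"]}
unlabeled = {"ok": 0, "code": 13388}

print(has_transient_txn_label(labeled))
print(has_transient_txn_label(unlabeled))
```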

These issues reproduce when running bulk_write_update_cursor.js or bulk_write_delete_cursor.js in the sharded_multi_stmt_txn_jscore_passthrough suite. They also appear, less consistently, in other bulkWrite core tests and txn passthroughs where StaleConfig errors can occur.

It would be good to add some targeted tests around handling of these reply types along with fixing this. 



 Comments   
Comment by Githook User [ 06/Dec/23 ]

Author:

{'name': 'Kaitlin Mahar', 'email': 'kaitlin.mahar@mongodb.com', 'username': 'kmahar'}

Message: SERVER-81382 Complete todo from SERVER-80174 to unskip test

GitOrigin-RevId: 577515bedffdcd158453b070c434379ddb305f80
Branch: master
https://github.com/mongodb/mongo/commit/122bf7bb44a50dc5de7cc7465f1a43bfaca5c882

Comment by Githook User [ 22/Sep/23 ]

Author:

{'name': 'Kaitlin Mahar', 'email': 'kaitlin.mahar@mongodb.com', 'username': 'kmahar'}

Message: SERVER-80174 Add mongos logic to handle top-level error responses from mongod for bulkWrite
Branch: master
https://github.com/mongodb/mongo/commit/26cd6a5821430bce23c4b85660ca7adebd8ef748
