[SERVER-44593] In retryable write that changes shard key value and owning shard, write concern failure results in non-WC error code
Created: 13/Nov/19  Updated: 05/Aug/21  Resolved: 05/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.3 Desired

Type: Bug
Priority: Major - P3
Reporter: Kevin Pulo
Assignee: Blake Oler
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-45360 Bring shell retryable error codes to ... Closed
Related
related to SERVER-44289 Retryable writes that change shard ke... Closed
Operating System: ALL
Sprint: Sharding 2019-12-16, Sharding 2019-12-30, Sharding 2020-01-13
Participants:

 Description   

Presumably this error is returned because a write concern (WC) timeout is treated as a retryable error, so the write is retried?

assert: write failed with error: {
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
            "code" : 217,
            "errmsg" : "Cannot retry a retryable write that has been converted into a transaction"
    }
}
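The retry sequence suspected above can be sketched as a small simulation in plain JavaScript (runnable with Node.js). Everything in it is hypothetical: `makeMockMongos` and `runWithOneRetry` are illustrative names, not mongo shell or server functions, and the mock only models the two replies seen in this ticket: a write concern timeout (WriteConcernFailed, code 64) on the first attempt, and code 217 when the same lsid/txnNumber is retried after the write was converted into a transaction.

```javascript
// Hypothetical simulation (plain JavaScript, runnable with Node.js); not MongoDB code.
const WRITE_CONCERN_FAILED = 64;
const INCOMPLETE_TRANSACTION_HISTORY = 217;

// Mock mongos: the first attempt of a write that changes the owning shard is run
// as a transaction; any retry of the same lsid/txnNumber is rejected with 217.
function makeMockMongos() {
  const converted = new Set(); // "lsid:txnNumber" keys already run as transactions
  return function runUpdate(lsid, txnNumber) {
    const key = `${lsid}:${txnNumber}`;
    if (converted.has(key)) {
      return {
        ok: 1,
        writeErrors: [{
          index: 0,
          code: INCOMPLETE_TRANSACTION_HISTORY,
          errmsg: "Cannot retry a retryable write that has been converted into a transaction",
        }],
      };
    }
    converted.add(key);
    // The write commits, but the paused tagged secondary never acknowledges it.
    return {ok: 1, writeConcernError: {code: WRITE_CONCERN_FAILED, errInfo: {wtimeout: true}}};
  };
}

// Retries once, reusing lsid and txnNumber, if the write concern error code is "retryable".
function runWithOneRetry(runUpdate, lsid, txnNumber, retryableCodes) {
  const first = runUpdate(lsid, txnNumber);
  const wce = first.writeConcernError;
  if (wce && retryableCodes.has(wce.code)) {
    return runUpdate(lsid, txnNumber);
  }
  return first;
}

// Out-of-date policy: WriteConcernFailed counts as retryable, so the 64 is replaced by 217.
const stale = runWithOneRetry(makeMockMongos(), "lsid-1", 1, new Set([WRITE_CONCERN_FAILED]));
console.log(stale.writeErrors[0].code); // 217

// Updated policy: no retry, so the caller sees the write concern timeout itself.
const updated = runWithOneRetry(makeMockMongos(), "lsid-1", 1, new Set());
console.log(updated.writeConcernError.code); // 64
```

The only difference between the two runs is whether code 64 is in the retryable set, which is the distinction the later comments on this ticket turn on.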

(function() {
'use strict';

load("jstests/libs/write_concern_util.js");
load("jstests/sharding/libs/update_shard_key_helpers.js");

// Shard rs0 has three nodes; only the third carries tag1, so the custom "tagged"
// write concern mode cannot be satisfied without that node's acknowledgement.
const st = new ShardingTest({mongos: 1, shards: {
    rs0: {
        nodes: [{}, {}, {rsConfig: {tags: {tag1: "value1"}}}],
        settings: {getLastErrorModes: {tagged: {tag1: 1}}}
    },
    rs1: {nodes: 3}
}});
const wc = {w: "tagged", wtimeout: 6000};
const kDbName = 'db';
const mongos = st.s0;
const shard0 = st.shard0.shardName;
const shard1 = st.shard1.shardName;
const ns = kDbName + '.foo';

assert.commandWorked(mongos.adminCommand({enableSharding: kDbName}));
st.ensurePrimaryShard(kDbName, shard0);

// Retryable-writes session: the shell attaches lsid and txnNumber to each write.
let session = st.s.startSession({retryWrites: true});
let sessionDB = session.getDatabase(kDbName);

let docsToInsert =
    [{"x": 4, "a": 3}, {"x": 78}, {"x": 100}, {"x": 300, "a": 3}, {"x": 500, "a": 6}];

// Shard on {x: 1}, insert the documents, and move part of the key range to shard1.
shardCollectionMoveChunks(st, kDbName, ns, {"x": 1}, docsToInsert, {"x": 100}, {"x": 300});
cleanupOrphanedDocs(st, ns);

// Pause replication on the tagged secondary so the "tagged" write concern times out.
stopServerReplication(st.rs0.nodes[2]);

// Changing x from 4 to 1000 changes both the shard key value and the owning shard.
let res = sessionDB.foo.update({x: 4}, {$set: {x: 1000}}, {writeConcern: wc});
// Expected: the write succeeds and only the write concern times out.
// Actually fails with 217 (IncompleteTransactionHistory).
assert.commandWorkedIgnoringWriteConcernErrors(res);
checkWriteConcernTimedOut(res);

// Same scenario via findAndModify, with an explicit lsid and txnNumber.
res = sessionDB.runCommand({
    findAndModify: 'foo',
    query: {x: 78},
    update: {$set: {x: 250}},
    lsid: {id: UUID()},
    txnNumber: NumberLong(1),
    writeConcern: wc,
});
// Expected: the write succeeds and only the write concern times out.
// Actually fails with 217 (IncompleteTransactionHistory).
assert.commandWorkedIgnoringWriteConcernErrors(res);
checkWriteConcernTimedOut(res);

restartServerReplication(st.rs0.nodes[2]);

mongos.getDB(kDbName).foo.drop();

st.stop();
})();
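The two assertions in the repro can be approximated over plain result objects, which makes the gap between the expected and observed replies explicit. This is a hypothetical sketch in plain JavaScript: `classifyWriteResult` is an illustrative name, not the real `assert.commandWorkedIgnoringWriteConcernErrors` or `checkWriteConcernTimedOut` (which may check more fields), and the `expected` reply shape is an assumption, while `observed` is copied from the error output in the description.

```javascript
// Hypothetical approximation of the repro's two checks; not the real shell helpers.
const WRITE_CONCERN_FAILED = 64;

function classifyWriteResult(res) {
  // Shell WriteResult objects carry writeError; raw command replies carry writeErrors.
  const writeError = res.writeError || (res.writeErrors && res.writeErrors[0]);
  if (writeError) return {outcome: "writeFailed", code: writeError.code};
  if (res.ok === 0) return {outcome: "commandFailed", code: res.code};
  const wce = res.writeConcernError;
  if (!wce) return {outcome: "ok", code: 0};
  const timedOut = wce.code === WRITE_CONCERN_FAILED && Boolean(wce.errInfo && wce.errInfo.wtimeout);
  return {outcome: timedOut ? "writeConcernTimedOut" : "writeConcernError", code: wce.code};
}

// Assumed shape of the reply the repro expects (other fields omitted).
const expected = {writeConcernError: {code: WRITE_CONCERN_FAILED, errInfo: {wtimeout: true}}};
// Reply actually observed, from the error output in the description.
const observed = {
  nMatched: 0, nUpserted: 0, nModified: 0,
  writeError: {code: 217, errmsg: "Cannot retry a retryable write that has been converted into a transaction"},
};

console.log(classifyWriteResult(expected).outcome); // writeConcernTimedOut
console.log(classifyWriteResult(observed).outcome); // writeFailed
```

A write error takes precedence over a write concern error here because the repro's first assertion would already reject such a reply before the timeout check runs.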



 Comments   
Comment by Max Hirschhorn [ 05/Aug/21 ]

I think we can close SERVER-44593 and SERVER-45360. The mongo shell behavior hasn't been an issue for any of our stepdown suites.

Comment by Blake Oler [ 05/Aug/21 ]

max.hirschhorn, now that we don't plan to keep working on the shell for the purposes of driver parity, should we close this as won't fix?

Comment by Blake Oler [ 03/Jan/20 ]

After doing some digging, the issue here is that the shell should not automatically retry when it receives the code WriteConcernFailed. The shell is out of date with the drivers' current specifications, which were explicitly updated to exclude WriteConcernFailed from the list of retryable errors. Once we fix this, we will expect the writes to fail. Blocking this on SERVER-45360.
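A retryability check aligned with the updated drivers' specification might be sketched as follows, in plain JavaScript. The code list is an assumption: a subset of the errors the retryable writes specification names, with WriteConcernFailed (64) deliberately excluded. `ErrorCodes` here is a local table and `isRetryableWriteReply` is a hypothetical name; neither is the shell's actual implementation, whose change is tracked in SERVER-45360.

```javascript
// Sketch only: the code list below is an assumed subset of the retryable errors
// named in the drivers' retryable writes specification, not the shell's actual list.
const ErrorCodes = Object.freeze({
  HostUnreachable: 6,
  HostNotFound: 7,
  WriteConcernFailed: 64, // deliberately NOT retryable
  NetworkTimeout: 89,
  ShutdownInProgress: 91,
  PrimarySteppedDown: 189,
  SocketException: 9001,
  NotMaster: 10107,
  InterruptedAtShutdown: 11600,
  InterruptedDueToReplStateChange: 11602,
  NotMasterNoSlaveOk: 13435,
  NotMasterOrSecondary: 13436,
});

// Every code in the table except WriteConcernFailed.
const RETRYABLE_CODES = new Set(
  Object.entries(ErrorCodes)
    .filter(([name]) => name !== "WriteConcernFailed")
    .map(([, code]) => code));

// True if the reply's top-level error or write concern error should trigger a retry.
function isRetryableWriteReply(reply) {
  if (reply.ok === 0 && RETRYABLE_CODES.has(reply.code)) return true;
  const wce = reply.writeConcernError;
  return Boolean(wce) && RETRYABLE_CODES.has(wce.code);
}

// A wtimeout surfaces as WriteConcernFailed and is returned to the caller as-is...
console.log(isRetryableWriteReply({
  ok: 1, writeConcernError: {code: ErrorCodes.WriteConcernFailed, errInfo: {wtimeout: true}},
})); // false
// ...whereas a shutdown reported as a write concern error is still retried.
console.log(isRetryableWriteReply({ok: 1, writeConcernError: {code: ErrorCodes.ShutdownInProgress}})); // true
```

Under this policy the repro's writes would surface the write concern timeout instead of being retried into code 217.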

Generated at Thu Feb 08 05:06:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.