[SERVER-78311] mongos does not report writeConcernError in presence of writeErrors for insert command Created: 21/Jun/23  Updated: 24/Jan/24  Resolved: 21/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.6, 6.0.14

Type: Bug Priority: Major - P3
Reporter: Craven Huynh Assignee: Brett Nawrocki
Resolution: Fixed Votes: 0
Labels: c2c, sharded-cluster, sharding-nyc-subteam1

Issue Links:
Backports
Depends
is depended on by SERVER-80103 Mongos WriteConcernError Behavior Dif... Needs Scheduling
is depended on by TOOLS-3455 Investigate changes in SERVER-78311: ... Needs Triage
is depended on by COMPASS-7133 Investigate changes in SERVER-78311: ... Closed
Documented
is documented by DOCS-16334 Investigate changes in SERVER-78311: ... In Progress
Problem/Incident
Related
related to SERVER-73553 Ensure mongos create command returns ... Open
related to SERVER-81259 updateOne without shard key does not ... Open
related to SERVER-81246 FLE WriteConcernError behavior unclear Closed
is related to SERVER-84081 FLE2 write error hides write concern ... Needs Scheduling
is related to SERVER-76954 Support write concern and handle writ... Closed
Assigned Teams:
Sharding NYC
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v7.0, v6.0, v5.0, v4.4
Steps To Reproduce:

1 - Launch a sharded cluster with one shard backed by a 3-node replica set.

2 - Connect with mongosh to the shard's primary mongod and configure every secondary mongod as a delayed member so that replication to them lags (instructions: https://www.mongodb.com/docs/v6.0/core/replica-set-delayed-member)

3 - Connect with mongosh to mongos and switch to the "test" database.

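4 - Insert a new document with w: "majority" and a 1000 ms wtimeout. Because the secondaries are delayed, mongos returns the expected writeConcernError: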

[direct: mongos] test> db.runCommand({"insert": "coll", "documents": [{_id: 3}], writeConcern: { w: "majority", j: true, wtimeout: 1000 }})
Uncaught:
MongoWriteConcernError: waiting for replication timed out; Error details: { wtimeout: true, writeConcern: { w: "majority", j: true, wtimeout: 1000, provenance: "clientSupplied" } } at shard01
Additional information: {}
Result: {
  n: 1,
  writeConcernError: {
    code: 64,
    codeName: 'WriteConcernFailed',
    errmsg: 'waiting for replication timed out; Error details: { wtimeout: true, writeConcern: { w: "majority", j: true, wtimeout: 1000, provenance: "clientSupplied" } } at shard01',
    errInfo: {}
  },
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1687374253, i: 2 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1687374253, i: 2 })
}

5 - Rerunning the same insert command verbatim yields a duplicate key writeError, but the writeConcernError that should accompany it is missing:

[direct: mongos] test> db.runCommand({"insert": "coll", "documents": [{_id: 3}], writeConcern: { w: "majority", j: true, wtimeout: 1000 }})
{
  n: 0,
  writeErrors: [
    {
      index: 0,
      code: 11000,
      errmsg: 'E11000 duplicate key error collection: test.coll index: _id_ dup key: { _id: 3 }',
      keyPattern: { _id: 1 },
      keyValue: { _id: 3 }
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1687374345, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1687374338, i: 1 })
}
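Step 2 can be scripted from the shard primary. The sketch below is a minimal, hypothetical helper (`withDelayedSecondaries` is not part of mongosh) that rewrites a replica set config document so that every member except the primary becomes a hidden, priority-0 delayed member. It assumes MongoDB 5.0 or later, where the delay field is named `secondaryDelaySecs`; older releases use `slaveDelay`.

```javascript
// Hypothetical helper, not a mongosh built-in.
// Returns a copy of a replica set config document (the shape returned by
// rs.conf()) in which every member other than `primaryHost` is turned into a
// hidden, priority-0 delayed member. The input object is not modified.
function withDelayedSecondaries(cfg, primaryHost, delaySecs) {
  return {
    ...cfg,
    members: cfg.members.map((member) =>
      member.host === primaryHost
        ? { ...member } // leave the primary as-is
        : {
            ...member,
            priority: 0, // delayed members must never become primary
            hidden: true, // hide them from client reads
            secondaryDelaySecs: delaySecs, // MongoDB 5.0+ field name (assumption: 5.0+ cluster)
          }
    ),
  };
}
```

In mongosh on the shard primary it might be used as `rs.reconfig(withDelayedSecondaries(rs.conf(), db.hello().primary, 3600))`. Any delay comfortably longer than the 1000 ms wtimeout used in steps 4 and 5 should make w: "majority" time out.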

Sprint: Sharding NYC 2023-07-10, Sharding NYC 2023-07-24, Sharding NYC 2023-08-07, Sharding NYC 2023-08-21, Sharding NYC 2023-09-04
Participants:
Story Points: 4

 Description   

mongos does not report a writeConcernError in the presence of writeErrors.

This behavior is unexpected because it differs from that of mongod, which reports both writeErrors and a writeConcernError in the same reply:

shard01 [direct: primary] test> db.runCommand({"insert": "coll", "documents": [{_id: 3}], writeConcern: { w: "majority", j: true, wtimeout: 1000 }})
Uncaught:
MongoWriteConcernError: waiting for replication timed out
Additional information: {
  wtimeout: true,
  writeConcern: {
    w: 'majority',
    j: true,
    wtimeout: 1000,
    provenance: 'clientSupplied'
  }
}
Result: {
  n: 0,
  electionId: ObjectId("7fffffff0000000000000001"),
  opTime: { ts: Timestamp({ t: 1687374618, i: 1 }), t: Long("1") },
  writeErrors: [
    {
      index: 0,
      code: 11000,
      errmsg: 'E11000 duplicate key error collection: test.coll index: _id_ dup key: { _id: 3 }',
      keyPattern: { _id: 1 },
      keyValue: { _id: 3 }
    }
  ],
  writeConcernError: {
    code: 64,
    codeName: 'WriteConcernFailed',
    errmsg: 'waiting for replication timed out',
    errInfo: {
      wtimeout: true,
      writeConcern: {
        w: 'majority',
        j: true,
        wtimeout: 1000,
        provenance: 'clientSupplied'
      }
    }
  },
  ok: 1,
  lastCommittedOpTime: Timestamp({ t: 1687371016, i: 1 }),
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1687374623, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1687374618, i: 1 })
}
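Since mongod places writeErrors and writeConcernError side by side in one reply, a client that inspects only one field can miss a failure. As a minimal sketch, assuming the raw reply shape printed above, the hypothetical `summarizeWriteReply` helper (not a driver or mongosh API) reports both independently:

```javascript
// Hypothetical helper: summarizes a raw write command reply (the document
// returned by db.runCommand for insert/update/delete) without letting one
// kind of error hide the other.
function summarizeWriteReply(reply) {
  const writeErrors = Array.isArray(reply.writeErrors) ? reply.writeErrors : [];
  const wce = reply.writeConcernError || null;
  return {
    commandOk: reply.ok === 1,
    applied: reply.n ?? 0,
    writeErrorCodes: writeErrors.map((e) => e.code),
    // Reported on its own, even when writeErrors are also present.
    writeConcernErrorCode: wce ? wce.code : null,
    clean: reply.ok === 1 && writeErrors.length === 0 && wce === null,
  };
}
```

Fed the mongos reply from step 5, it returns `writeConcernErrorCode: null`, whereas the mongod reply above yields code 64 alongside the 11000 write error; that gap is the information loss this ticket describes.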



 Comments   
Comment by Githook User [ 24/Jan/24 ]

Author:

{'name': 'Brett Nawrocki', 'email': 'brett.nawrocki@mongodb.com', 'username': 'brettnawrocki'}

Message: SERVER-78311 Mongos no longer suppresses writeConcernErrors

Previously, mongos would not report writeConcernErrors for ordered
writes that had writeErrors. Mongos now should report writeConcernErrors
in the presence of writeErrors regardless of whether the batch is
ordered or unordered.

(cherry picked from commit b8f1dc4dcfdf24bff149fc1bd3aa951455c9b801)

GitOrigin-RevId: 867a8d1230316a2899731731c4e5a4d8dfd60522
Branch: v6.0
https://github.com/mongodb/mongo/commit/42463b1b7e87cdfedb4a7e9927a4cf2715654895
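The commit message describes a change in how mongos merges shard replies for ordered batches. The following is a deliberately simplified, hypothetical model in JavaScript, not the actual mongos C++ code: `mergeShardReplies` and its `legacySuppression` flag are invented for illustration. It ignores write-error index translation and per-shard batching, and simply keeps the first shard's write concern error code.

```javascript
// Hypothetical model of a router merging per-shard write replies.
// legacySuppression = true mimics the pre-fix behavior from the commit
// message: for ordered batches with writeErrors, writeConcernErrors are dropped.
function mergeShardReplies(replies, { ordered, legacySuppression = false }) {
  let n = 0;
  const writeErrors = [];
  const wces = [];
  for (const { shard, reply } of replies) {
    n += reply.n ?? 0;
    for (const err of reply.writeErrors ?? []) writeErrors.push(err);
    if (reply.writeConcernError) wces.push({ shard, ...reply.writeConcernError });
  }
  const result = { n, ok: 1 };
  if (writeErrors.length > 0) result.writeErrors = writeErrors;
  const suppress = legacySuppression && ordered && writeErrors.length > 0;
  if (!suppress && wces.length > 0) {
    // Simplification: keep the first code and tag each message with its shard.
    result.writeConcernError = {
      code: wces[0].code,
      errmsg: wces.map((e) => `${e.errmsg} at ${e.shard}`).join("; "),
    };
  }
  return result;
}
```

With `legacySuppression` set and `ordered: true`, a shard reply containing both a duplicate key error and a write concern timeout loses its writeConcernError; without the flag, or for an unordered batch, both are returned, matching the behavior the commit describes.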

Comment by Githook User [ 22/Jan/24 ]

Author:

{'name': 'Brett Nawrocki', 'email': '90278537+brettnawrocki@users.noreply.github.com', 'username': 'brettnawrocki'}

Message: SERVER-78311 Mongos no longer suppresses writeConcernErrors (#18240)

GitOrigin-RevId: 71c0f9087bce14afded1cbe43d4f855e7e15e4dc
Branch: v7.0
https://github.com/mongodb/mongo/commit/1adb8b3cc565e4d5d87c64cbded6847edba2d98f

Comment by Rohan Sharan [ 21/Sep/23 ]

Hi rachita.dhawan@mongodb.com, any ideas on when the backports will be picked up? This isn't totally urgent, but I want to make sure they are on your radar.

Comment by Githook User [ 21/Aug/23 ]

Author:

{'name': 'Brett Nawrocki', 'email': 'brett.nawrocki@mongodb.com', 'username': 'brettnawrocki'}

Message: SERVER-78311 Mongos no longer suppresses writeConcernErrors

Previously, mongos would not report writeConcernErrors for ordered
writes that had writeErrors. Mongos now should report writeConcernErrors
in the presence of writeErrors regardless of whether the batch is
ordered or unordered.
Branch: master
https://github.com/mongodb/mongo/commit/b8f1dc4dcfdf24bff149fc1bd3aa951455c9b801

Comment by Rachita Dhawan [ 20/Jul/23 ]

Thanks for explaining. We will go through it in our triage tomorrow.

Comment by Rohan Sharan [ 20/Jul/23 ]

We don't know all of the scenarios in which errors might be obscured by this bug. If it obscures the wrong error, it could lead to data inconsistency. The workaround is for a specific case of this bug, and it feels very brittle. Also, since this is clearly a SERVER bug, we don't want to implement a partial workaround that needs to be removed later (and it won't even solve this issue generally).

I think on the basis of this potentially leading to data inconsistency, we would like this to be looked at sooner rather than later. Also, Lingzhi ran into this issue, so I'm sure the fix will help others as well.

cc craven.huynh@mongodb.com 

Comment by Rachita Dhawan [ 19/Jul/23 ]

rohan.sharan@mongodb.com I recently reached out to Craven about the expected timeline for this, and it didn't seem like mongosync was expecting it soon. I also noticed a REP workaround ticket for this. Could you tell me how urgent this ticket is and why?

Comment by Rohan Sharan [ 19/Jul/23 ]

rachita.dhawan@mongodb.com can you provide an update on the progress? This could potentially be problematic for mongosync, and we'd like it to go out as soon as is possible.

Generated at Thu Feb 08 06:38:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.