[SERVER-27067] Some Commands do not wait for write concern for no-op writes Created: 16/Nov/16  Updated: 20/Dec/23  Resolved: 26/Jul/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.0-rc3
Fix Version/s: 3.4.11, 3.5.11

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Judah Schvimer
Resolution: Done Votes: 0
Labels: bkp, writeconcern
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-25765 Commands should wait for write concer... Closed
related to SERVER-30648 Set GlobalLockAcquisitionTracker afte... Closed
related to SERVER-33475 Retried findAndModify doesn't properl... Closed
related to SERVER-33727 Do not wait for write concern if opTi... Closed
related to SERVER-40965 createIndexes should set the latest o... Closed
is related to SERVER-35387 Ensure retried commitTransaction comm... Closed
is related to SERVER-84081 FLE2 write error hides write concern ... Needs Scheduling
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4, v3.2
Sprint: Repl 2017-01-23, Repl 2017-02-13, Repl 2017-04-17, Repl 2017-05-08, Repl 2017-05-29, Repl 2017-07-31
Participants:
Linked BF Score: 0

 Description   

The wait for write concern code returns early if the ReplClientInfo is not set:

https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/write_concern.cpp#L223-L226

This means that if a command that normally performs a write ends up as a no-op, it does not wait for write concern.

Example case:

  1. Client sends drop command to mongos.
  2. Drop command times out waiting for write concern at mongod.
  3. Mongos treats the write concern timeout as a retriable error under Shard::RetryPolicy::kIdempotent.
  4. Mongos retries the drop command against mongod.
  5. Since the collection is already dropped, the retry gets an "ns does not exist" error.
  6. If mongos used a new connection for the retry, mongod does not wait for replication at all, because that connection's client has no opTime set. If the same connection is reused, mongod waits for the write concern as expected.

Some commands already wait for write concern correctly even when they end up doing a no-op write, because they call repl::ReplClientInfo::setLastOpToSystemLastOpTime (for example, findAndModify, write commands, and applyOps).
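
The behavior described above can be illustrated with a small toy model. This is only a sketch and not the server code: `Node`, `Client`, `drop`, and `wait_for_write_concern` are hypothetical names, optimes are plain integers, and the `set_last_op_on_noop` flag stands in for a command calling setLastOpToSystemLastOpTime on the no-op path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Client:
    """Per-connection state; last_op stands in for the client's opTime."""
    last_op: Optional[int] = None


@dataclass
class Node:
    """Toy primary. Optimes are plain integers in this model."""
    system_last_op: int = 0
    replicated_up_to: int = 0  # highest optime the secondaries have acknowledged
    collections: set = field(default_factory=set)


def wait_for_write_concern(node: Node, client: Client) -> str:
    # Early return modeled on the ticket: no client opTime means no wait.
    if client.last_op is None:
        return "skipped"
    if node.replicated_up_to >= client.last_op:
        return "satisfied"
    return "timed out"


def drop(node: Node, client: Client, name: str, set_last_op_on_noop: bool = False):
    if name in node.collections:
        node.collections.remove(name)
        node.system_last_op += 1
        client.last_op = node.system_last_op
        result = "dropped"
    else:
        # No-op path: nothing is written, so the client's opTime is untouched
        # unless the command explicitly copies the system's last optime.
        if set_last_op_on_noop:
            client.last_op = node.system_last_op
        result = "ns does not exist"
    return result, wait_for_write_concern(node, client)


node = Node(collections={"c"})
conn1 = Client()
first = drop(node, conn1, "c")              # replication lags, so the wait times out
retry_new_conn = drop(node, Client(), "c")  # new connection: the wait is skipped
retry_same_conn = drop(node, conn1, "c")    # same connection: still waits
retry_fixed = drop(node, Client(), "c", set_last_op_on_noop=True)
```

In this model the retry on a new connection reports "ns does not exist" without ever waiting, even though the original drop has not replicated, which is the gap this ticket describes.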



 Comments   
Comment by Githook User [ 30/Nov/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 Remove unnecessary references to setLastOpToSystemLastOpTime

(cherry picked from commit 50cbe6d35a3b61ce674eb0d8fae173f70f1dddd5)
Branch: v3.4
https://github.com/mongodb/mongo/commit/10d74315247887467f1234b7566265ac749d5f69

Comment by Githook User [ 30/Nov/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 sets client last op to system last optime if the global exclusive lock was taken
Branch: v3.4
https://github.com/mongodb/mongo/commit/90228aceac05cffdd3b6392653b0b44492068ee6

Comment by Githook User [ 30/Nov/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 decorates operation context with boolean when global lock is taken in X or IX mode

(cherry picked from commit 5dda752d6cc8ed1f75a564b03b7357592f488d93)
Branch: v3.4
https://github.com/mongodb/mongo/commit/b02fdd26ea4b3d238c33d045cd767f22c591face

Comment by Githook User [ 26/Jul/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 Remove unnecessary references to setLastOpToSystemLastOpTime
Branch: master
https://github.com/mongodb/mongo/commit/50cbe6d35a3b61ce674eb0d8fae173f70f1dddd5

Comment by Githook User [ 17/Jul/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 sets client last op to system last optime if the global exclusive lock was taken
Branch: master
https://github.com/mongodb/mongo/commit/27cf9fd7b31f043af913da135385367126f5691b

Comment by Githook User [ 29/Jun/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27067 decorates operation context with boolean when global lock is taken in X or IX mode
Branch: master
https://github.com/mongodb/mongo/commit/5dda752d6cc8ed1f75a564b03b7357592f488d93

Comment by Spencer Brody (Inactive) [ 08/Jun/17 ]

Current implementation plan:

  1. Start decorating the operation context with a boolean any time the global lock is taken in X or IX mode.
  2. When executing any command that supports write concern, take note of the last op on the client before and after the command runs. If the last op on the client doesn't change but there is an opctx decoration indicating the command took a global X or IX lock, then set the last op on the client to the last op on the system.
  3. Wait for write concern for the last op on the client.
  4. Remove unnecessary references to setLastOpToSystemLastOpTime throughout the code. Leave the special last client op handling in the batch write commands.

Note that this solution won't automatically handle the case where a single command does multiple writes and the last one is a no-op. In that case we could end up waiting on an optime that is too early to cover the no-op write. Currently, the only place this can happen is in multi-writes in the batch write commands, which already have special logic to handle this case.
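
Steps 1 to 3 of the plan can be sketched as a toy model. This is an illustration under stated assumptions, not the server implementation: `OpCtx`, `Client`, `Node`, `run_with_write_concern`, and the three sample commands are hypothetical names, optimes are plain integers, and the `took_global_write_lock` flag stands in for the operation context decoration set when the global lock is taken in X or IX mode.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OpCtx:
    """Hypothetical per-operation state; the flag models the decoration."""
    took_global_write_lock: bool = False


@dataclass
class Client:
    last_op: Optional[int] = None


@dataclass
class Node:
    system_last_op: int = 0


Command = Callable[[Node, Client, OpCtx], None]


def run_with_write_concern(node: Node, client: Client, command: Command) -> Optional[int]:
    """Run a command and return the optime to wait on (None means no wait)."""
    opctx = OpCtx()
    before = client.last_op
    command(node, client, opctx)
    # The command intended to write (took the lock) but the client's last op
    # did not move, so treat it as a no-op write and wait on the system's last op.
    if client.last_op == before and opctx.took_global_write_lock:
        client.last_op = node.system_last_op
    return client.last_op


def noop_drop(node: Node, client: Client, opctx: OpCtx) -> None:
    opctx.took_global_write_lock = True  # lock taken, nothing written


def insert(node: Node, client: Client, opctx: OpCtx) -> None:
    opctx.took_global_write_lock = True
    node.system_last_op += 1
    client.last_op = node.system_last_op


def read_only(node: Node, client: Client, opctx: OpCtx) -> None:
    pass  # no write lock, no write
```

With this shape, a no-op drop on a fresh connection yields the system's last optime to wait on, while a command that never takes the write lock leaves the client's last op unset. The multi-write limitation noted above is not modeled here.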

Generated at Thu Feb 08 04:14:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.