[SERVER-41469] Enforce w:1 for creation of transactions table on step-up Created: 03/Jun/19  Updated: 29/Oct/23  Resolved: 01/Jul/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.2.0-rc4, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Vesselina Ratcheva (Inactive) Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-29495 Change DBDirectClient to access the q... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2
Sprint: Repl 2019-07-01, Repl 2019-07-15
Participants:
Linked BF Score: 9

 Description   

We create the transactions table on step-up via a DBDirectClient call. That call inherits the default writeConcern, which is a problem if the user has changed it from w:1. In that case, the call waits for that write concern to be satisfied immediately, while still holding locks (particularly the RSTL in mode X, taken by the step-up hook). We do not want this, because it can block other operations, including servicing the find commands that secondaries issue for replication.
We should instead create the transactions table via the storage interface, as that will give us w:1 in all cases.
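
To illustrate the hazard described above, here is a toy model in Python. It is not MongoDB server code: the lock, the event, and the function name are all hypothetical stand-ins, and the short timeout stands in for a wait that the real server would not bound. It shows that a replication acknowledgement cannot arrive while the writer holds the exclusive lock that the secondary's fetch needs, whereas a w:1-style write that does not wait releases the lock and lets replication proceed.

```python
import threading


def step_up_create_table(wait_for_replication: bool, timeout: float = 0.2) -> dict:
    """Toy model of creating a table on step-up while holding an exclusive lock.

    All names are illustrative assumptions, not MongoDB APIs.
    """
    rstl = threading.Lock()         # stand-in for the RSTL held in mode X
    replicated = threading.Event()  # set once the "secondary" has fetched the write

    def secondary_fetch() -> None:
        # Servicing the secondary's find needs the lock the primary is holding.
        with rstl:
            replicated.set()

    with rstl:
        fetcher = threading.Thread(target=secondary_fetch)
        fetcher.start()
        if wait_for_replication:
            # Non-w:1 default: wait for the acknowledgement while still holding
            # the lock. The timeout only exists so this sketch terminates.
            acked_under_lock = replicated.wait(timeout)
        else:
            # w:1: the local write is enough, so there is nothing to wait for.
            acked_under_lock = None

    # Lock released: the secondary can now be served.
    fetcher.join()
    return {
        "acked_while_holding_lock": acked_under_lock,
        "replicated_after_release": replicated.is_set(),
    }


if __name__ == "__main__":
    print(step_up_create_table(wait_for_replication=True))
    print(step_up_create_table(wait_for_replication=False))
```

In the waiting case the acknowledgement never arrives while the lock is held and only shows up after release. Without the artificial timeout, that wait would never end, which is the deadlock this ticket describes.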



 Comments   
Comment by Githook User [ 19/Jul/19 ]

Author: Jason Chan (jason.chan@10gen.com, username jasonjhchan)

Message: SERVER-41469 Enforce w:1 for creation of transactions table on step-up

(cherry picked from commit a351f48ad122ca59ed45e5df877ef398c099c938)
Branch: v4.2
https://github.com/mongodb/mongo/commit/56852d83dd92ab47063672bd48d94e4439f8c23e

Comment by Githook User [ 01/Jul/19 ]

Author: Jason Chan (jason.chan@10gen.com, username jasonjhchan)

Message: SERVER-41469 Enforce w:1 for creation of transactions table on step-up
Branch: master
https://github.com/mongodb/mongo/commit/a351f48ad122ca59ed45e5df877ef398c099c938

Comment by Vesselina Ratcheva (Inactive) [ 11/Jun/19 ]

judah.schvimer This is indeed a deadlock. The entire replica set can actually fail to make progress because of this, as the primary would not be able to service the finds required for secondaries to replicate the table, while waiting on exactly that table to be replicated. While this can only happen in the relatively niche use case where users modify the default write concern, the fix here is very straightforward and I think we should do it sooner.
This is really part of a bigger problem with DBDirectClient best described in SERVER-29495, but I think we should still make this fix.

Comment by Judah Schvimer [ 11/Jun/19 ]

vesselina.ratcheva, what is the user-visible bug here? Is this a deadlock? If it's a bug then it doesn't feel like "Tech Debt".

Comment by Vesselina Ratcheva (Inactive) [ 03/Jun/19 ]

While there hasn't been a 4.0 BF, this is also possible on that version, the only real difference being that we take Global X instead of RSTL X (which does not exist yet).

Comment by Judah Schvimer [ 03/Jun/19 ]

vesselina.ratcheva, is this a 4.0 bug as well?
