[SERVER-39372] Make secondary lock acquisition for DDL operations consistent with behavior on primary Created: 04/Feb/19  Updated: 29/Oct/23  Resolved: 04/Mar/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Bug Priority: Major - P3
Reporter: Xiangyu Yao (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39169 Add testing-only support for doing sn... Closed
Related
related to SERVER-40723 Deadlock between S lock acquisition o... Closed
related to SERVER-41090 remove noop oplog entry DB X lock acq... Closed
is related to SERVER-39096 Prepared transactions and DDL operati... Closed
is related to SERVER-37199 Yield locks of transactions in second... Closed
is related to SERVER-38588 Hybrid index builds do not work when ... Closed
is related to SERVER-39139 Remove testing support for secondary ... Closed
is related to SERVER-39321 Re-enable the CheckReplDBHashInBackgr... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2019-02-25, Storage NYC 2019-03-11
Participants:
Linked BF Score: 9

 Description   

Currently, DDL operations except createIndexes acquires Global X lock on the secondary. This causes a problem described in the comments of SERVER-39096.



 Comments   
Comment by Githook User [ 01/Mar/19 ]

Author:

{'name': 'Xiangyu Yao', 'username': 'xy24', 'email': 'xiangyu.yao@mongodb.com'}

Message: SERVER-39372 Make secondary lock acquisition for DDL operations consistent with behavior on primary
Branch: master
https://github.com/mongodb/mongo/commit/9b596d485e5be0cb65bef50343ab98db9ed09c2a

Comment by Geert Bosch [ 13/Feb/19 ]

Not by itself at this point, as index creation still takes DB locks. We're working on making those locks more granular separately. So this ticket is necessary but not sufficient.

Comment by Max Hirschhorn [ 11/Feb/19 ]

Max Hirschhorn, will the above approach solve the secondary snapshot reads needs of SERVER-39321?

judah.schvimer, I haven't really been following along so I would defer to Geert.

Comment by Judah Schvimer [ 11/Feb/19 ]

max.hirschhorn, will the above approach solve the secondary snapshot reads needs of SERVER-39321?

Comment by Geert Bosch [ 08/Feb/19 ]

That goes exactly in the direction that we want to go, and may indeed be the simplest approach to fixing this issue. I'm in favor.

Comment by Eric Milkie [ 08/Feb/19 ]

I am in favor of it; I would like geert.bosch's opinion as well before proceeding.

Comment by Judah Schvimer [ 08/Feb/19 ]

Ok, so the final proposal is this ticket, plus removing the unsynchronized strong lock from index builds, plus SERVER-39235? I am on board with that if no one else sees a problem.

Comment by Eric Milkie [ 08/Feb/19 ]

Only replication two-phase-drops does this, and that I believe is only in use for enableReadConcernMajority:false. SERVER-39235 is planned as a quickwin to eliminate use of replication two-phase drops.

Comment by Judah Schvimer [ 08/Feb/19 ]

milkie, that sounds good to me. Are there other unsynchronized lock acquisitions? Does two phase drops do this?

Comment by Eric Milkie [ 08/Feb/19 ]

A fourth option is to remove the unsynchronized strong lock from the index builds. It is currently expected to be needed in order to make a durable catalog change indicating the change of phase from collection scan to drain mode. Instead, I believe we can use the presence of the vote document to determine which phase we are in; writing the vote document needs nothing more than intent locks. To round out this solution, in addition to this change, we would also remove the global lock from the replication applier code path for commands.

Comment by Judah Schvimer [ 08/Feb/19 ]

I think that if only the unsynchronized lock acquisitions are compatibleFirst then we are guaranteed to make progress once the final synchronized lock acquisition queues up and prevents new reads from jumping it in line.

Comment by Eric Milkie [ 08/Feb/19 ]

I'm not sure option 3 is viable – it would mean that a user could indefinitely delay an index build with a heavy read workload on the same collection, and eventually it could even stall replication on that node.

Comment by Judah Schvimer [ 07/Feb/19 ]

I personally am fine with all three solutions and see no problems, though investigation is necessary into any other unsynchronized lock acquisitions.

My preference order, given my understanding that the implementation for all three fall on the Storage Team, is 2,1,3. That said, that is almost certainly in reverse order of difficulty.

Comment by Judah Schvimer [ 07/Feb/19 ]

I spoke with geert.bosch about this and we think we have 3 options going forward:

  1. Complete SERVER-39096 and yield locks during prepare conflict retry. This is non-trivial due to the storage/query integration and undesirable to the Storage Team
  2. Do this ticket as scoped and synchronize the currently unsynchronized lock acquisitions in Simultaneous Index Builds. Any other unsynchronized locks in other commands likely also need to be synchronized. drop is the only one I can think of. This is potentially a very large change to Simultaneous Index Builds (and maybe two phase drops), but would be the ideal solution for the replication system. CC dianna.hohensee and milkie
  3. Do this ticket as scoped and make the unsynchronized lock acquisitions compatibleFirst on secondaries. This means that they will not block secondary oplog application from making progress, and in the worst case will occur when secondary oplog application goes to apply the final synchronization point of the Simultaneous Index Build. We also must consider other commands with unsychronized lock acquisitions and if the same solution can and needs to be applied. This solution seems like a good middle ground which allows the secondary to naturally acquire its locks when the primary acquired them without synchronizing them explicitly, but also guarantee forward progress (at least for index builds).

If you have any thoughts, preferences, or concerns, please weigh in.

Comment by Judah Schvimer [ 04/Feb/19 ]

Speaking with siyuan.zhou and tess.avitabile, we realized that Simultaneous Index Builds on secondaries may cause problems for this solution. They make it impossible to take locks identically on the primary and on the secondary because there are unsynchronized lock acquisitions in the middle of the index build. If one of those lock acquisitions blocks on an afterClusterTime read blocked on a prepared transaction, then a CRUD op on the same collection in oplog application could block behind that lock acquisition and prevent oplog application from making progress enough to commit the prepared transaction. CC geert.bosch

Generated at Thu Feb 08 04:51:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.