[SERVER-37336] Test that background index builds do not block on prepared transactions on secondaries Created: 26/Sep/18  Updated: 27/Oct/23  Resolved: 14/Dec/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Backlog - Replication Team
Resolution: Gone away Votes: 0
Labels: prepare_optional, prepare_testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-37199 Yield locks of transactions in second... Closed
is related to SERVER-38588 Hybrid index builds do not work when ... Closed
Assigned Teams:
Replication
Operating System: ALL
Sprint: Storage NYC 2018-11-05
Participants:

 Description   

A background index build on secondaries acquires a database lock in X mode at the beginning and releases it, scans the collection with an IX lock, and acquires an X lock again at the end. The first X lock is acquired by the background job, but the running command waits for it, so it won't conflict with transaction operations. The IX lock plays well with transactions. However, the final X lock will cause a problem on secondaries.

If the background index build completion is blocked by a prepared transaction, and replication application on the same database is blocked by the background index build completion, that's a deadlock. The problem is similar to the 2-phase drop locking issue we've seen in SERVER-34349.
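The cycle described above can be sketched as a small wait-for graph. This is only an illustration: the node names are made up for the example, and the third edge (the prepared transaction waiting on replication application to apply its commit or abort) is an assumption that the description implies but does not spell out.

```python
def find_cycle(waits_for):
    """Return the nodes forming a cycle in a wait-for graph, or None.

    `waits_for` maps each waiter to the single thing it is blocked on.
    """
    for start in waits_for:
        path, seen = [], set()
        node = start
        # Follow the chain of blockers until it ends or revisits a node.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # We came back to a node already on the path: that is the cycle.
            return path[path.index(node):]
    return None


# Hypothetical names for the three parties on a secondary.
waits_for = {
    # The final database X lock conflicts with the transaction's locks.
    "index_build_completion": "prepared_transaction",
    # Oplog application on the same database waits for the build to finish.
    "replication_application": "index_build_completion",
    # Assumption: the transaction is only resolved by a later oplog entry.
    "prepared_transaction": "replication_application",
}

cycle = find_cycle(waits_for)
print(cycle)
```

Removing any one of the three edges makes `find_cycle` return `None`, which is the shape of the fixes discussed in the comments: either the index build stops waiting on the transaction, or the transaction stops holding locks.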



 Comments   
Comment by Judah Schvimer [ 14/Dec/18 ]

Closing per siyuan.zhou's comment that any work and testing will be done as part of SERVER-38588.

Comment by Siyuan Zhou [ 13/Dec/18 ]

judah.schvimer, we discovered this issue by code inspection. After discussing offline with the storage team, as summarized in this comment, we believe SERVER-38588 can be fixed without blocking background index builds on active prepared transactions. Thus there is no work for this ticket. Because this is a race condition on secondaries, we would have to use fail points to coordinate the events in testing. Given that hybrid index builds will essentially make all index builds background, we will have pretty good test coverage for both primary and secondary. I think a specific test for this has a low priority.

Comment by Louis Williams [ 13/Dec/18 ]

There is a current issue with how hybrid, background indexes behave with prepared transactions, summarized in SERVER-38588, that makes me believe background index builds should block on active prepared transactions.

Comment by Eric Milkie [ 09/Nov/18 ]

We discovered that the try-lock idea isn't viable. Going to explore other alternative solutions; one solution is to have prepared transactions not hold any locks when on secondary nodes.

Comment by Eric Milkie [ 01/Nov/18 ]

I think we could avoid that by enqueuing a lock request for a limited amount of time, rather than the current tryLock behavior of just looking to see what mode the resource is currently locked in. Would that work?
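The difference between the two acquisition styles mentioned here can be illustrated with a toy example. This is a sketch under loud assumptions: a plain `threading.Lock` stands in for the database lock, the 50 ms hold time and 2 second timeout are arbitrary, and nothing here reflects the server's actual lock manager API.

```python
import threading

# Stand-in for the database lock: a plain mutex, not MongoDB's lock manager.
db_lock = threading.Lock()

# A hypothetical prepared transaction holds the lock and releases it 50 ms later.
db_lock.acquire()
release_timer = threading.Timer(0.05, db_lock.release)
release_timer.start()

# tryLock-style check: look once and give up immediately if the lock is held.
got_with_try = db_lock.acquire(blocking=False)

# Timed request: wait for up to 2 seconds for the holder to release.
got_with_timeout = db_lock.acquire(timeout=2.0)

release_timer.join()
if got_with_timeout:
    db_lock.release()

print(got_with_try, got_with_timeout)
```

Note that Python's `threading.Lock` makes no fairness guarantee. In a fair lock manager, a queued X request would typically hold back later IX requests for as long as it waits, which this mutex does not model.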

Comment by Geert Bosch [ 01/Nov/18 ]

I'm not so much concerned about the tryLock starving IX locks, but more about the opposite: on a busy system, there may always be open transactions and we'll be trying forever.

Comment by Siyuan Zhou [ 31/Oct/18 ]

I believe Eric's solution will work, assuming that spinning in tryLock for the X lock doesn't starve other IX locks.

Comment by Eric Milkie [ 10/Oct/18 ]

Note that I do not believe it would be possible to require users to somehow cease all background index builds across all nodes of a replica set before running setFCV. We would have to build some new machinery for that, and in addition it would be a pretty onerous upgrade requirement.

Comment by Eric Milkie [ 10/Oct/18 ]

After some discussion yesterday afternoon, we had an idea: we could change the index build’s acquisition of a DB X lock at the end of the build to spin in tryLock, thus avoiding a deadlock with any prepared transactions.
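A minimal sketch of the spin-in-tryLock idea, as a single-threaded toy model. The `DbLock` class, `finish_index_build`, and the oplog list are hypothetical names invented for the example; the real lock manager has more modes and queues requests, which this model deliberately leaves out.

```python
import time


class DbLock:
    """Toy database lock that tracks IX holders and a single X holder."""

    def __init__(self):
        self.ix_holders = 0
        self.x_held = False

    def try_lock_ix(self):
        if self.x_held:
            return False
        self.ix_holders += 1
        return True

    def unlock_ix(self):
        self.ix_holders -= 1

    def try_lock_x(self):
        # Succeeds only if nobody holds the lock in any mode; never queues.
        if self.x_held or self.ix_holders > 0:
            return False
        self.x_held = True
        return True

    def unlock_x(self):
        self.x_held = False


def finish_index_build(lock, between_attempts, max_attempts=1000, backoff_secs=0.0):
    """Spin in try_lock_x, letting other work run between attempts.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if lock.try_lock_x():
            try:
                return attempt  # the final index commit would happen here
            finally:
                lock.unlock_x()
        between_attempts()
        time.sleep(backoff_secs)
    return None


lock = DbLock()
lock.try_lock_ix()  # a prepared transaction holds the database in IX mode
pending_oplog = ["insert", "insert", "commitTransaction"]


def apply_next_oplog_entry():
    # Because the build never queues for X, oplog application keeps making progress.
    if pending_oplog and pending_oplog.pop(0) == "commitTransaction":
        lock.unlock_ix()


attempts = finish_index_build(lock, apply_next_oplog_entry)
print(attempts)
```

Because the build never waits in a queue, replication application can still reach the commit that releases the transaction's lock. The `max_attempts` bound is there because nothing in this scheme guarantees the X lock is ever obtained if some IX holder is always present.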

Comment by Gregory McKeon (Inactive) [ 08/Oct/18 ]

If you have in-progress background index builds using the 4.0 method, you could hit a deadlock once you upgrade to FCV 4.2 if they were still running and you run prepareTransaction. milkie or dianna.hohensee, would you consider not allowing a user to upgrade to FCV 4.2 while a background index build was in progress?
