[SERVER-37336] Test that background index builds do not block on prepared transactions on secondaries Created: 26/Sep/18 Updated: 27/Oct/23 Resolved: 14/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Backlog - Replication Team |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | prepare_optional, prepare_testing |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Sprint: | Storage NYC 2018-11-05 |
| Participants: | |
| Description |
|
A background index build on a secondary acquires a database lock in X mode and releases it at the beginning, scans the collection under an IX lock, and acquires an X lock again at the end. The first X lock is acquired by the background job, but the running command waits for it, so it won't conflict with transaction operations. The IX lock plays well with transactions. However, the final X lock will cause a problem on secondaries: if the background index build completion is blocked by a prepared transaction, and replication application on the same database is blocked by the background index build completion, that's a deadlock. The problem is similar to the 2-phase drop locking issue we've seen in |
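The deadlock described above can be sketched as a wait-for cycle. This is a toy model, not MongoDB code. The compatibility table is the standard multi-granularity one; the names `conflicts`, `find_cycle`, and `WAITS_FOR` are hypothetical; and the edge from the prepared transaction to the oplog applier is an assumption: it supposes that a prepared transaction on a secondary releases its locks only when replication applies its commit or abort.

```python
# Toy model (not MongoDB code): which held modes each requested mode can coexist with.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def conflicts(requested, held):
    """True if a request in mode `requested` must wait behind a lock held in mode `held`."""
    return held not in COMPATIBLE[requested]


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list if it leads back to `start`."""
    path = [start]
    node = waits_for.get(start)
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    return path + [node] if node == start else None


# Hypothetical actors on one database of a secondary.
WAITS_FOR = {
    "index_build": "prepared_txn",    # final X lock conflicts with the transaction's IX lock
    "oplog_applier": "index_build",   # replication application is blocked by the build's completion
    "prepared_txn": "oplog_applier",  # ASSUMPTION: only the applier can commit/abort the transaction
}

print(conflicts("X", "IX"))    # the final X lock waits behind a prepared transaction
print(conflicts("IX", "IX"))   # the collection scan under IX does not
print(find_cycle(WAITS_FOR, "index_build"))
```

Removing any one edge (for example, if prepared transactions held no locks on secondaries) makes `find_cycle` return `None`.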
| Comments |
| Comment by Judah Schvimer [ 14/Dec/18 ] |
|
Closing per siyuan.zhou's comment that any work and testing will be done as part of |
| Comment by Siyuan Zhou [ 13/Dec/18 ] |
|
judah.schvimer, we discovered this issue by code inspection. We discussed it offline with the storage team, as summarized in this comment, and we believe |
| Comment by Louis Williams [ 13/Dec/18 ] |
|
There is a current issue with how hybrid, background indexes behave with prepared transactions, summarized in |
| Comment by Eric Milkie [ 09/Nov/18 ] |
|
We discovered that the try-lock idea isn't viable. We are going to explore alternative solutions; one is to have prepared transactions not hold any locks when on secondary nodes.
| Comment by Eric Milkie [ 01/Nov/18 ] |
|
I think we could avoid that by enqueuing a lock request for a limited amount of time, rather than the current tryLock behavior of just looking to see what mode the resource is currently locked in. Would that work? |
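The difference between polling and a time-limited enqueued request can be sketched with a small tick-based simulation. This is purely illustrative and is not the server's lock manager: `first_x_grant`, its parameters, and the workload are all hypothetical. It assumes that a queued X request holds back IX requests that arrive after it, and that a polling request never does.

```python
def first_x_grant(txns, policy, x_arrives, horizon, timeout=0):
    """Return the tick at which the X lock is granted, or None if it never is.

    txns:    list of (start_tick, duration) pairs, each an IX holder.
    policy:  "poll"    -> X checks the lock every tick and never queues.
             "enqueue" -> X sits in the queue for `timeout` ticks starting at
                          `x_arrives`; IX requests arriving meanwhile wait behind it.
    """
    active = []             # end ticks of IX holders currently granted
    pending = sorted(txns)  # IX requests not yet admitted (copy; caller's list is untouched)
    for tick in range(horizon):
        active = [end for end in active if end > tick]  # drop finished holders
        x_queued = policy == "enqueue" and x_arrives <= tick < x_arrives + timeout
        if not x_queued:
            # No X request in the queue: admit every IX request that has arrived.
            while pending and pending[0][0] <= tick:
                _, duration = pending.pop(0)
                active.append(tick + duration)
        x_wants = tick >= x_arrives and (policy == "poll" or x_queued)
        if x_wants and not active:
            return tick
    return None


# Hypothetical busy workload: 3-tick transactions starting every 2 ticks, so they always overlap.
txns = [(start, 3) for start in range(0, 20, 2)]
polling = first_x_grant(txns, "poll", x_arrives=1, horizon=20)
queued = first_x_grant(txns, "enqueue", x_arrives=1, horizon=20, timeout=5)
gave_up = first_x_grant(txns, "enqueue", x_arrives=1, horizon=20, timeout=1)
```

With this workload the polling request is never granted, the request queued for up to 5 ticks is granted at tick 3, and the request queued for only 1 tick is withdrawn. While it is queued, the X request also holds back every new IX request, so the wait has to stay bounded.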
| Comment by Geert Bosch [ 01/Nov/18 ] |
|
I'm not so much concerned about the tryLock starving IX locks, but more about the opposite: on a busy system, there may always be open transactions and we'll be trying forever. |
| Comment by Siyuan Zhou [ 31/Oct/18 ] |
|
I believe Eric's solution will work, assuming that spinning in tryLock for the X lock doesn't starve other IX locks.
| Comment by Eric Milkie [ 10/Oct/18 ] |
|
Note that I do not believe it would be possible to require users to somehow cease all background index builds across all nodes of a replica set before running setFCV. We would have to build some new machinery for that, and in addition it would be a pretty onerous upgrade requirement. |
| Comment by Eric Milkie [ 10/Oct/18 ] |
|
After some discussion yesterday afternoon, we had an idea: we could change the index build’s acquisition of a DB X lock at the end of the build to spin in tryLock, thus avoiding a deadlock with any prepared transactions. |
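A minimal sketch of the spin-in-tryLock idea, assuming a toy lock that only tracks which modes are currently granted. `ToyDbLock`, `spin_for_x`, and the `between_attempts` hook are hypothetical names for illustration and do not correspond to the server's lock manager API. The point is that the spinner never sits in the lock queue, so other work (such as applying a transaction's commit) can still take and release locks between attempts.

```python
import time


class ToyDbLock:
    """Hypothetical stand-in for a database lock; records the modes currently granted."""

    def __init__(self):
        self.holders = []  # e.g. ["IX", "IX"]

    def try_lock_x(self):
        # Non-blocking: succeeds only if nobody holds the lock right now.
        if self.holders:
            return False
        self.holders.append("X")
        return True

    def unlock(self, mode):
        self.holders.remove(mode)


def spin_for_x(lock, max_attempts, pause_secs=0.0, between_attempts=None):
    """Retry try_lock_x without queuing; return the attempt that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        if lock.try_lock_x():
            return attempt
        if between_attempts is not None:
            between_attempts(attempt)  # hook so a caller can simulate other work progressing
        time.sleep(pause_secs)
    return None


# A prepared transaction holds IX; its commit is "applied" after the third failed attempt.
lock = ToyDbLock()
lock.holders.append("IX")


def commit_after_third(attempt):
    if attempt == 3:
        lock.unlock("IX")


granted_on = spin_for_x(lock, max_attempts=10, between_attempts=commit_after_third)
```

Here `granted_on` is 4. If the IX holder never goes away, `spin_for_x` returns `None` once `max_attempts` is exhausted.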
| Comment by Gregory McKeon (Inactive) [ 08/Oct/18 ] |
|
If you have in-progress background index builds using the 4.0 method, you could hit a deadlock once you upgrade to FCV 4.2 if they were still running and you run prepareTransaction. milkie or dianna.hohensee, would you consider not allowing a user to upgrade to FCV 4.2 while a background index build was in progress? |