[SERVER-45916] On primary, 2-phase index build cleanup writes an abortIndexBuild oplog entry under a stronger mode user collection lock X which can lead to 3 way deadlock with prepared transactions, step down and index build Created: 31/Jan/20 Updated: 29/Oct/23 Resolved: 17/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Sprint: | Execution Team 2020-04-06, Execution Team 2020-04-20, Execution Team 2020-05-04 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Consider the following sequence, |
| Comments |
| Comment by Githook User [ 17/Apr/20 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'} Message: |
| Comment by Louis Williams [ 16/Apr/20 ] |
|
Code review: https://mongodbcr.appspot.com/575110005/ |
| Comment by Suganthi Mani [ 31/Jan/20 ] |
|
I would expect something like this: if an index build on the old primary is aborted for any reason other than a killOp command, the newly elected primary will also abort it for the same reason and write an abortIndexBuild oplog entry. If that's the case, then when killOp successfully interrupts the parent createIndex thread, the parent thread should first abort the index build coordinator thread (i.e., set the aborted field to true) and then generate the abortIndexBuild oplog entry itself. For this to work correctly, we should also make sure the parent thread holds the RSTL in mode IX once the index build has started. (We already guarantee that by holding the RSTL here.) For index builds aborted because of erroneous data records, we can behave as we do for commit index build: the newly elected primary takes care of the abort and generates the abortIndexBuild oplog entry. |
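The proposal above amounts to a small decision rule about which node writes the abortIndexBuild oplog entry. Below is a minimal sketch of that rule in Python. It is illustrative only and is not MongoDB server code: the names `AbortReason`, `OplogWriter`, and `abort_oplog_writer` are hypothetical, and the RSTL is modeled as a plain boolean.

```python
from enum import Enum, auto


class AbortReason(Enum):
    """Why the index build was aborted (hypothetical names)."""
    KILL_OP = auto()      # killOp interrupted the parent createIndex thread
    DATA_ERROR = auto()   # the build hit erroneous data records


class OplogWriter(Enum):
    """Who is expected to write the abortIndexBuild oplog entry."""
    PARENT_CREATE_INDEX_THREAD = auto()
    NEWLY_ELECTED_PRIMARY = auto()


def abort_oplog_writer(reason: AbortReason, parent_holds_rstl_ix: bool) -> OplogWriter:
    """Return who should write the abort entry under the proposal in this comment."""
    if reason is AbortReason.KILL_OP:
        # A new primary never sees the killOp, so it would not abort on its own.
        # The parent thread must write the entry, and it may only do so while
        # holding the RSTL in mode IX (assumed here to block a step down).
        if not parent_holds_rstl_ix:
            raise RuntimeError("parent thread must hold the RSTL in mode IX")
        return OplogWriter.PARENT_CREATE_INDEX_THREAD
    # Any other failure is expected to recur on the newly elected primary,
    # which then aborts and writes the entry, as for commit index build.
    return OplogWriter.NEWLY_ELECTED_PRIMARY
```

For example, `abort_oplog_writer(AbortReason.DATA_ERROR, False)` selects the newly elected primary, while the killOp case raises unless the RSTL precondition holds.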
| Comment by Suganthi Mani [ 31/Jan/20 ] |
|
milkie's response to this problem is in the Google doc
|