[SERVER-38675] Do not check duplicate key constraints for index builds on secondaries Created: 17/Dec/18 Updated: 29/Oct/23 Resolved: 02/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Storage NYC 2018-12-31, Storage NYC 2019-01-14 | ||||
| Participants: | |||||
| Linked BF Score: | 6 | ||||
| Description |
|
If a background index build on a secondary finishes in the middle of a replication batch, while the index temporarily contains duplicate key violations, the build will fail. This did not use to be a problem, but now that we perform duplicate key checks at completion, this error is more likely to occur. The fix may involve taking the PBWM (ParallelBatchWriterMode) lock so that index build completion does not conflict with any replication batches. |
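The failure mode in the description can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not MongoDB code: it assumes that operations on different documents within one replication batch may be applied on a secondary in a different order than on the primary, and it models the collection as a plain dict. The names `has_duplicates`, `apply_op`, and `apply_batch` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def has_duplicates(docs, field):
    """Return True if two documents currently share the same value for `field`."""
    counts = Counter(doc[field] for doc in docs.values() if field in doc)
    return any(n > 1 for n in counts.values())


def apply_op(docs, op):
    """Apply one simplified oplog entry: (kind, _id, document)."""
    kind, _id, doc = op
    if kind == "insert":
        docs[_id] = doc
    elif kind == "delete":
        docs.pop(_id, None)


def apply_batch(docs, batch, check_after_each_op=False, field="a"):
    """Apply `batch` in the given order.

    With check_after_each_op=True, the uniqueness check may run in the middle
    of the batch (modelling an index build that completes mid-batch).
    Otherwise the check runs only once the whole batch has been applied.
    Returns True if no duplicate was observed at any check point.
    """
    for op in batch:
        apply_op(docs, op)
        if check_after_each_op and has_duplicates(docs, field):
            return False
    return not has_duplicates(docs, field)


# On the primary: delete {_id: 1, a: 5}, then insert {_id: 2, a: 5}.
# Assumed secondary ordering: the insert is applied before the delete.
initial = {1: {"a": 5}}
reordered_batch = [("insert", 2, {"a": 5}), ("delete", 1, None)]

mid_batch_ok = apply_batch(dict(initial), reordered_batch, check_after_each_op=True)
end_of_batch_ok = apply_batch(dict(initial), reordered_batch, check_after_each_op=False)
print(mid_batch_ok, end_of_batch_ok)
```

In this model the mid-batch check sees two documents with `a: 5` and reports a violation, while the same data is consistent once the batch has finished. That is why either serializing index build completion against batches or skipping the check on secondaries removes the spurious failure.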
| Comments |
| Comment by Githook User [ 02/Jan/19 ] |
|
Author: Louis Williams (louiswilliams, louis.williams@mongodb.com) Message: |
| Comment by Louis Williams [ 19/Dec/18 ] |
|
daniel.gottlieb, the operation holding the PBWM lock is the "multiSyncApply" thread, while the operation that calls TempRelease and awaitNoBgOpInProgress is a foreground index build running in a different "repl writer worker" thread. I think we should just not do constraint checking on secondaries. That will entirely prevent the problem of seeing duplicates on a secondary. |
| Comment by Daniel Gottlieb (Inactive) [ 19/Dec/18 ] |
|
louis.williams, there already seems to be machinery in repl where an operation that fails because a competing background operation is in progress will temporarily release its locks and wait. Looking at the code, I see a number of assertions that there are no background operations, but only one call (I think) that waits. When I looked at the rollback call that waits for background operations to complete, I saw that it acquired and released a global MODE_IS lock. I assumed that meant the rollback client did not hold the PBWM lock, but perhaps I was mistaken? Can you tell me which operation was holding the PBWM lock? |