[SERVER-38675] Do not check duplicate key constraints for index builds on secondaries Created: 17/Dec/18  Updated: 29/Oct/23  Resolved: 02/Jan/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.1.7

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2018-12-31, Storage NYC 2019-01-14
Participants:
Linked BF Score: 6

 Description   

If a background index build on a secondary finishes in the middle of a replication batch while the index temporarily contains duplicate key violations, the build will fail.

This did not use to be a problem, but now that we perform duplicate key checks at completion, this error is more likely to occur.

This may involve taking the PBWM lock so that index build completion does not conflict with any replication batches.
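
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: the index is modeled as a multiset of keys, the batch is assumed to be applied in an order where an insert lands before the delete that frees its key, and the names apply_ops, duplicate_keys, and finish_index_build are hypothetical. The is_primary flag mirrors the approach in this ticket's title, where the completion check is skipped on secondaries.

```python
from collections import Counter


def apply_ops(index_keys, ops):
    """Apply ('insert', key) / ('delete', key) ops to a multiset of index keys."""
    for kind, key in ops:
        if kind == "insert":
            index_keys[key] += 1
        else:
            index_keys[key] -= 1
            if index_keys[key] == 0:
                del index_keys[key]


def duplicate_keys(index_keys):
    """Return the keys that currently appear more than once."""
    return sorted(k for k, n in index_keys.items() if n > 1)


def finish_index_build(index_keys, is_primary):
    """Hypothetical completion step: only a primary enforces uniqueness."""
    if is_primary:
        dups = duplicate_keys(index_keys)
        if dups:
            raise ValueError(f"duplicate keys: {dups}")
    return "ready"


# On the primary: the document with key 1 is deleted, then a new one is inserted.
# Assumed order within the secondary's batch: the insert is applied first.
index_keys = Counter({1: 1})
batch = [("insert", 1), ("delete", 1)]

apply_ops(index_keys, batch[:1])
mid_batch_dups = duplicate_keys(index_keys)

# The index build completes mid-batch; on a secondary the check is skipped.
mid_batch_result = finish_index_build(index_keys, is_primary=False)

apply_ops(index_keys, batch[1:])
end_of_batch_dups = duplicate_keys(index_keys)
```

In this model the duplicate exists only between the two operations of the batch, so a completion check run at that moment would reject an index that is consistent once the batch ends.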



 Comments   
Comment by Githook User [ 02/Jan/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-38675 Do not check duplicate key constraints for index builds on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/93bec638a4b5acef4664006b47ade13788d64bf8

Comment by Louis Williams [ 19/Dec/18 ]

daniel.gottlieb the operation holding the PBWM lock is the "multiSyncApply" thread, while the operation that calls TempRelease and awaitNoBgOpInProgress is a foreground index build, but in a different "repl writer worker" thread.

I think we should just not do constraint checking on secondaries. That will entirely prevent the problem of seeing duplicates on a secondary.

Comment by Daniel Gottlieb (Inactive) [ 19/Dec/18 ]

louis.williams, there seems to already be machinery in repl where an operation that fails because of a competing background operation being in progress will temp release its locks and perform the waiting.

Looking at the code, I see a bunch of assertions that there are no background operations, but only one call (I think) that waits. When I looked at the rollback's call that waits for background operations to complete, I saw it grabbed/released a global MODE_IS lock. I assumed that meant the rollback client did not have the PBWM lock, but perhaps I was mistaken?

Can you provide which operation was holding the PBWM lock?

Generated at Thu Feb 08 04:49:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.