[SERVER-67685] Upgrade to 6.0 should not allow yielding while holding the ScopedRangeDeleterLock (6.0 only) Created: 30/Jun/22 Updated: 27/Oct/23 Resolved: 30/Jun/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.0-rc12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Allison Easton | Assignee: | Allison Easton |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Operating System: | ALL |
| Backport Requested: | v6.0 |
| Participants: | |
| Description |
|
Currently, the index scan for setting the orphan document count during upgrade to 6.0 takes the ScopedRangeDeleterLock (which acquires the global lock in IX, config.rangeDeletions in IX, and RangeDeleterCollLock::<collection UUID> in mode X). We then perform an index scan with the yield policy YIELD_AUTO. YIELD_AUTO yields after a WriteConflictException, after a certain amount of time has elapsed, or after a certain number of documents have been fetched. When yielding, we release and reacquire all locks, including those held by the ScopedRangeDeleterLock. A range deletion could therefore acquire the ScopedRangeDeleterLock in the middle of the index scan and remove orphan documents, corrupting the orphan document count. |
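The hazard described above can be illustrated with a minimal, self-contained simulation (this is not MongoDB server code; the function and variable names are purely illustrative). If the scan holds its lock throughout, the count matches the initial set of orphans; if the lock is released at a simulated yield point, a concurrent "range deletion" can remove documents mid-scan and the recorded count no longer reflects the state at scan start:

```python
# Sketch only: simulates how yielding mid-scan can corrupt an orphan-document
# count once the (simulated) range-deleter lock is released at a yield point.
# All names here are hypothetical, not real server identifiers.

def count_orphans(docs, yield_every=None, on_yield=None):
    """Index-scan-like counter. If yield_every is set, simulate a
    YIELD_AUTO-style yield after that many documents, during which
    the lock is treated as released and on_yield may mutate docs."""
    count = 0
    for i, doc in enumerate(list(docs)):   # snapshot the scan order
        if yield_every and i and i % yield_every == 0 and on_yield:
            on_yield(docs)                 # "lock released": deleter can run
        if doc in docs:                    # doc may have been deleted meanwhile
            count += 1
    return count

orphans = set(range(10))

# No yield: the count matches the real number of orphans.
assert count_orphans(set(orphans)) == 10

# Yield mid-scan: a simulated range deletion removes documents, so the
# count recorded by the scan undercounts what was present at the start.
docs = set(orphans)
result = count_orphans(docs, yield_every=4,
                       on_yield=lambda d: [d.discard(x) for x in (7, 8, 9)])
assert result < 10
```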
| Comments |
| Comment by Allison Easton [ 30/Jun/22 ] |
|
There were two areas of concern for yielding during the 6.0 upgrade. The first was setting the orphan counters in the range deletion documents (described in the summary). This turns out to be okay because we do not release resource mutexes when we yield. This means that even if a yield occurred, a range deletion could not acquire the resource mutex, so range deletions cannot run concurrently with the scan. The other area of concern was the asynchronous balancer stats registry initialization. In that location, we use a coarse range deletion lock (IX on the global lock and X on config.rangeDeletions). Therefore, if a yield occurred, we would trigger an invariant about holding X locks during a yield. However, this aggregation is executed via the DBDirectClient, which acquires the global lock again during the execution of the aggregation. Because of this, even though the yield policy defaults to YIELD_AUTO, no yield is ever triggered, thanks to the early exit for recursive global locking. |
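The "early exit for recursive global locking" mentioned above can be sketched as follows (illustrative only; the class, method, and threshold names are assumptions, not actual MongoDB server symbols). The idea is that a YIELD_AUTO-style policy refuses to yield while the global lock is held recursively, because releasing it would also drop the outer caller's hold:

```python
# Sketch: why a YIELD_AUTO-style policy never fires under a
# DBDirectClient-like recursive global lock acquisition.

class GlobalLock:
    """Toy global lock that only tracks its recursion depth."""
    def __init__(self):
        self.recursion_depth = 0
    def acquire(self):
        self.recursion_depth += 1
    def release(self):
        self.recursion_depth -= 1

def should_yield(lock, docs_fetched, yield_after=128):
    # Early exit: never yield while the global lock is held recursively,
    # since releasing it would also release the outer operation's lock.
    if lock.recursion_depth > 1:
        return False
    return docs_fetched >= yield_after

lock = GlobalLock()
lock.acquire()              # outer operation takes the global lock
lock.acquire()              # direct-client aggregation re-acquires it

# Recursive hold: the yield threshold is exceeded but no yield happens.
assert should_yield(lock, docs_fetched=1000) is False

lock.release()              # back to a single, non-recursive hold
assert should_yield(lock, docs_fetched=1000) is True
```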