[SERVER-44250] startIndexBuild oplog write and thread pool scheduling are not serialized between concurrent threads on primaries Created: 25/Oct/19 Updated: 29/Oct/23 Resolved: 13/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution Team 2019-11-18 |
| Participants: | |
| Linked BF Score: | 13 |
| Description |
|
Secondaries serialize all oplog commands, which means that the code in startIndexBuild to 1) write the "startIndexBuild" oplog entry and 2) schedule the task on the thread pool cannot race with other threads doing the same thing. On primaries, however, these two operations are not protected from running concurrently, so two concurrent threads can interleave them. This leads to the situation described below, where the thread pool size is only 1:
The following original description does not accurately describe the full problem: We limit the maximum number of index build worker threads to 10, but there is no high-level restriction on the number of active index build threads.
This is problematic for secondaries in the following scenario:
We should do one of the following:
|
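To illustrate the race in the description, here is a minimal Python sketch (not MongoDB code) in which a single mutex is held across both steps, so the order of the simulated oplog entries always matches the order in which tasks are handed to the pool. The names `IndexBuildStarterSketch`, `start_index_build`, and `run_demo` are hypothetical, and the sketch assumes the pool runs tasks in submission order, which is the kind of thread pool behavior a real fix would also depend on.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class IndexBuildStarterSketch:
    """Hypothetical stand-in for the primary's start path; not MongoDB code."""

    def __init__(self, pool_size=1):
        # One mutex covers both steps so no other thread can slip in between.
        self._start_mutex = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=pool_size)
        self.oplog = []      # order of simulated "startIndexBuild" entries
        self.scheduled = []  # order in which builds were handed to the pool

    def start_index_build(self, build_id):
        with self._start_mutex:
            # Step 1: write the (simulated) "startIndexBuild" oplog entry.
            self.oplog.append(build_id)
            # Step 2: schedule the build on the thread pool.
            self.scheduled.append(build_id)
            return self._pool.submit(self._run_build, build_id)

    def _run_build(self, build_id):
        # Placeholder for the actual index build work.
        return build_id

    def shutdown(self):
        self._pool.shutdown(wait=True)


def run_demo(num_builds=8, pool_size=1):
    """Start builds from concurrent threads and return both recorded orders."""
    starter = IndexBuildStarterSketch(pool_size)
    threads = [
        threading.Thread(target=starter.start_index_build, args=(i,))
        for i in range(num_builds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    starter.shutdown()
    return starter.oplog, starter.scheduled
```

Without the lock, one thread could finish step 1, be preempted, and let another thread complete both steps first, so the two orders could disagree. With a pool of size 1, that disagreement is what lets a build scheduled "out of order" occupy the only worker.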
| Comments |
| Comment by Githook User [ 13/Nov/19 ] |
|
Author: {'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'} Message: |
| Comment by Louis Williams [ 13/Nov/19 ] |
|
We're going to use a mutex for now to enable test coverage. It behaves correctly, but it depends on thread pool behavior that is subject to change in the future. The plan is to follow up with |
| Comment by Louis Williams [ 08/Nov/19 ] |
|
There are two ways I see of fixing this bug:
|