[SERVER-53454] Return an error future from ReshardingOplogFetcher::awaitInsert if the fetcher has been shut down Created: 18/Dec/20 Updated: 27/Oct/23 Resolved: 20/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | Max Hirschhorn |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-oplog-fetch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v5.0 | ||||||||
| Sprint: | Sharding 2021-07-12, Sharding 2021-07-26 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 23 | ||||||||
| Story Points: | 1 | ||||||||
| Description |
|
Reference is in the linked BF comments. The diff to get a repro is attached. This diff may be useful for writing a test that verifies this behavior has been fixed, and will definitely be useful for verifying locally that the fix works. |
| Comments |
| Comment by Max Hirschhorn [ 20/Jul/21 ] |
|
I took another look at The changes from 67ff845 as part of I filed |
| Comment by Max Hirschhorn [ 22/Dec/20 ] |
|
Looking at the TestAwaitInsertErrors test case in the attached diff, it seems Blake demonstrated an issue with referencing the moved-from ReshardingOplogApplier::_onInsertFuture when the caller violates the contract by calling awaitInsert() before the future returned by an earlier call to awaitInsert() has become ready. Do we know whether that is possible for ReshardingDonorOplogIterator to do? ReshardingOplogApplier::_scheduleNextBatch() only ever calls ReshardingDonorOplogIterator::getNext() once due to the current setting for the reshardingBatchLimitOperations server parameter. And each call to ReshardingOplogApplier::_scheduleNextBatch() is responsible for setting up the next call to ReshardingOplogApplier::_scheduleNextBatch() in sequence. It isn't clear to me how we'd have multiple calls to awaitInsert() outstanding at the same time. Could there be something else going on in the Evergreen failure? |
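For readers following the discussion, here is a minimal sketch of the behavior the ticket title asks for: an awaitInsert() that hands back an already-failed future once the fetcher has been shut down. It is not the server's code. ToyFetcher, onInsert(), shutdown(), failsWithError() and demoShutdown() are hypothetical names, and std::promise/std::future stand in for the server's own future types. The one-outstanding-call contract is assumed from the comment above.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for illustration only; not the real ReshardingOplogFetcher.
class ToyFetcher {
public:
    // Assumed contract: do not call again until the previously returned
    // future is ready. In this sketch a second call abandons the earlier one.
    std::future<void> awaitInsert() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_shutDown) {
            // Already shut down: return a future that is ready with an error
            // instead of one that would never be fulfilled.
            std::promise<void> failed;
            failed.set_exception(
                std::make_exception_ptr(std::runtime_error("fetcher has been shut down")));
            return failed.get_future();
        }
        _onInsertPromise = std::promise<void>();
        _waiting = true;
        return _onInsertPromise.get_future();
    }

    // Signals the current waiter, if any, that new data was inserted.
    void onInsert() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_waiting) {
            _waiting = false;
            _onInsertPromise.set_value();
        }
    }

    // Fails the current waiter, if any, and makes later calls fail fast.
    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutDown = true;
        if (_waiting) {
            _waiting = false;
            _onInsertPromise.set_exception(
                std::make_exception_ptr(std::runtime_error("fetcher has been shut down")));
        }
    }

private:
    std::mutex _mutex;
    std::promise<void> _onInsertPromise;
    bool _waiting = false;
    bool _shutDown = false;
};

// Returns true if waiting on the future surfaces an error.
inline bool failsWithError(std::future<void> fut) {
    try {
        fut.get();
        return false;
    } catch (const std::exception&) {
        return true;
    }
}

// Usage: a waiter outstanding at shutdown and a call made after shutdown
// both observe an error rather than blocking forever.
inline void demoShutdown() {
    ToyFetcher fetcher;
    std::future<void> pending = fetcher.awaitInsert();
    fetcher.shutdown();
    assert(failsWithError(std::move(pending)));
    assert(failsWithError(fetcher.awaitInsert()));
}
```

The sketch keeps a single promise slot, so it only models one outstanding waiter at a time, which matches the sequential calling pattern described in the comment.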