The secondary thread uses a ScopedTaskExecutor to schedule the callbacks for the actions it submits, so those continuations are canceled on stepdown when the secondary thread exits. However, we never clear _outstandingStreamingOps, which is decremented in those callbacks. If there are 20 outstanding operations across all collections at stepdown, then on the next stepup of that node the counter starts at 20 and counts up from there. With multiple stepdowns/stepups of the same node, the leaked count can accumulate until it hits kMaxOutstandingStreamingOperations, at which point the node will never issue any more defragmentation operations.
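The accumulation can be illustrated with a small standalone model. This is only a sketch, not the server code: ToyDefragPolicy, runTermThenStepDown and the cap value of 50 are hypothetical stand-ins, and dropping the pending callbacks without running them is an assumed approximation of what cancellation by the ScopedTaskExecutor does to the decrement.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical cap; the real kMaxOutstandingStreamingOperations value is not stated here.
constexpr int kToyMaxOutstandingOps = 50;

// Toy model of a policy whose counter is a member that survives stepdowns.
class ToyDefragPolicy {
public:
    // Returns false once the cap is reached, i.e. no further operation is issued.
    bool tryIssueOperation(std::vector<std::function<void()>>& pendingCallbacks) {
        if (_outstandingOps >= kToyMaxOutstandingOps) {
            return false;
        }
        ++_outstandingOps;
        // The decrement lives only in the completion callback.
        pendingCallbacks.push_back([this] { --_outstandingOps; });
        return true;
    }

    int outstandingOps() const {
        return _outstandingOps;
    }

private:
    int _outstandingOps = 0;
};

// Simulates one term as primary: try to issue opsPerTerm operations, then step
// down. The callbacks are dropped without running (assumed model of
// cancellation), so the counter is never decremented.
int runTermThenStepDown(ToyDefragPolicy& policy, int opsPerTerm) {
    std::vector<std::function<void()>> pendingCallbacks;
    int issued = 0;
    for (int i = 0; i < opsPerTerm; ++i) {
        if (policy.tryIssueOperation(pendingCallbacks)) {
            ++issued;
        }
    }
    pendingCallbacks.clear();  // stepdown: continuations are canceled, not run
    return issued;
}
```

In this model, each call to runTermThenStepDown(policy, 20) on the same policy object leaves the counter 20 higher, until the third call can issue only 10 operations and every later call issues none.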
The fix should be as simple as resetting _outstandingStreamingOps to 0 on step up. We should also look at the auto merger policy, which uses a similar approach to cap the outstanding merges. It likely has the same issue with _outstandingActions in the autoMerger, so that value should also be set to 0 on initialization.
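A minimal sketch of the proposed reset, using a standalone class rather than the real policy code. ToyCappedScheduler, onStepUp, tryAcquireSlot and releaseSlot are hypothetical names; the actual step-up and initialization hooks in the defragmentation policy and the autoMerger may look different.

```cpp
#include <atomic>
#include <cassert>

// Toy capped counter standing in for _outstandingStreamingOps / _outstandingActions.
class ToyCappedScheduler {
public:
    explicit ToyCappedScheduler(int maxOutstanding) : _maxOutstanding(maxOutstanding) {}

    // Hypothetical step-up / initialization hook: discard any count left over
    // from a previous term, whose callbacks were canceled and will never
    // decrement it.
    void onStepUp() {
        _outstanding.store(0);
    }

    // Reserves a slot if the cap has not been reached.
    bool tryAcquireSlot() {
        int current = _outstanding.load();
        while (current < _maxOutstanding) {
            // On failure, compare_exchange_weak reloads `current`, and the
            // loop condition re-checks it against the cap.
            if (_outstanding.compare_exchange_weak(current, current + 1)) {
                return true;
            }
        }
        return false;
    }

    // Called from the completion callback of an operation.
    void releaseSlot() {
        _outstanding.fetch_sub(1);
    }

    int outstanding() const {
        return _outstanding.load();
    }

private:
    const int _maxOutstanding;
    std::atomic<int> _outstanding{0};
};
```

The reset is only safe if no callback from the previous term can still run after it. That matches the cancellation behavior described above, but it is an assumption of this sketch; a late releaseSlot() after onStepUp() would drive the counter below zero.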