[SERVER-75278] Consider sleeping only if there are more to fetch in ReshardingOplogFetcher Created: 24/Mar/23  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Randolph Tan Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Cluster Scalability
Participants:
Linked BF Score: 5

 Description   

Consider moving this sleep, perhaps into the following then statement, and changing this line to something like this:

// original comment on why we're sleeping
return executor->sleepFor(Seconds{1}, cancelToken).then([...] {
  return _reschedule(std::move(executor), cancelToken, factory);
});



 Comments   
Comment by Max Hirschhorn [ 07/Jun/23 ]

Chatted with Randolph over Slack and I have a better understanding of what this ticket was intending to cover. The issue with the following code is that the sleep still executes even when moreToCome == false, in which case _reschedule() wouldn't send a new aggregate command to the donor shard anyway. Removing this sleep when moreToCome == false by restructuring the code would be very beneficial because the sleep in that case happens during the critical section of resharding. The recipient shard is unable to transition to kStrictConsistency until the future returned by the ReshardingOplogFetcher is ready.

.then([executor, cancelToken](bool moreToCome) {
    // Wait a little before re-running the aggregation pipeline on the donor's oplog. The
    // 1-second value was chosen to match the default awaitData timeout that would have been
    // used if the aggregation cursor was TailableModeEnum::kTailableAndAwaitData.
    return executor->sleepFor(Seconds{1}, cancelToken).then([moreToCome] {
        return moreToCome;
    });
})

Comment by Randolph Tan [ 07/Jun/23 ]

We use a sentinel oplog entry to indicate that we don't need to fetch more entries. What I'm proposing is to not sleep if we already know that moreToCome is false.
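A minimal standalone sketch of that proposal, for illustration only. It does not use the server's executor or future types: SleepFn and maybeSleepBeforeReschedule are hypothetical names, and the synchronous callback stands in for executor->sleepFor(Seconds{1}, cancelToken).

```cpp
#include <chrono>
#include <functional>

// Hypothetical stand-in for executor->sleepFor(); the real code is asynchronous
// and returns a future rather than blocking.
using SleepFn = std::function<void(std::chrono::seconds)>;

// Passes moreToCome through unchanged, but pauses first only when another
// aggregate command will actually be sent to the donor shard.
inline bool maybeSleepBeforeReschedule(bool moreToCome, const SleepFn& sleepFor) {
    if (moreToCome) {
        // Same 1-second delay as the existing code, chosen there to match the
        // default awaitData timeout.
        sleepFor(std::chrono::seconds{1});
    }
    // When moreToCome is false the fetcher is done, so return immediately and
    // avoid adding a second of delay during the critical section.
    return moreToCome;
}
```

With this shape, the moreToCome == false path completes without waiting, which is the case the ticket is concerned with.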

Comment by Max Hirschhorn [ 07/Jun/23 ]

Consider sleeping only if there are more to fetch in ReshardingOplogFetcher

I'm not sure how the ReshardingOplogFetcher can know whether there will be more data to fetch without sending a new aggregate command after its cursor has been exhausted. Ultimately we would want to use a tailable, awaitData cursor for this purpose and have the waiting happen on the donor shard side. I can imagine having the ReshardingOplogFetcher only sleep if the cursor returned no results at all when a new aggregate command was sent. In this manner, reestablishing a new cursor won't be done immediately back-to-back when there haven't been any new writes destined for the recipient shard.
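A rough standalone sketch of the "only sleep if the cursor returned no results" idea. Everything here is an assumption made for illustration: FetchResult, docsFetched, and backOffIfIdle are hypothetical names, and the blocking callback stands in for the executor's asynchronous sleep.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for executor->sleepFor().
using SleepFn = std::function<void(std::chrono::seconds)>;

// Hypothetical summary of one aggregate round trip to the donor shard.
struct FetchResult {
    std::size_t docsFetched;  // oplog entries returned by the newly opened cursor
    bool moreToCome;          // false once the final oplog entry has been seen
};

// Backs off only when the donor had nothing new to return and fetching is not
// yet finished; otherwise the caller can reschedule (or complete) right away.
inline bool backOffIfIdle(const FetchResult& result, const SleepFn& sleepFor) {
    if (result.moreToCome && result.docsFetched == 0) {
        sleepFor(std::chrono::seconds{1});
    }
    return result.moreToCome;
}
```

Under this policy a non-empty batch is followed by an immediate refetch, an empty batch is followed by a one-second pause, and a finished fetch never sleeps.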
