  1. Core Server
  2. SERVER-67650

Resharding recipient can return remainingOperationTimeEstimatedSecs=0 when the oplog applier hasn't caught up with the oplog fetcher

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 5.3.0, 5.0.0, 6.0.0-rc12
    • Fix Version/s: 6.0.2, 5.0.13
    • None
    • Fully Compatible
    • ALL
    • v5.0
    • 3

    Description

      If a recipient receives a _shardsvrReshardingOperationTime command right after it has transitioned to the "applying" state (i.e. oplogEntriesApplied = 0), 'remainingOperationTimeEstimatedSecs' would be calculated using 'bytesCopied' and 'bytesToCopy', and the elapsed time of the "cloning" state.

      It turns out that the start time of the "cloning" state is only initialized when a recipient transitions from the "create-collection" state to the "cloning" state. If the ReshardingRecipientService instance is created while the recipient is already in the "cloning" state (i.e. on restart or stepup), we skip the state transition, so the start time is left uninitialized. Consequently, 'remainingOperationTimeEstimatedSecs' would be 0, since the elapsed time for the cloning state would be 0. There is also no mechanism for persisting the start time and recovering it on stepup. Returning remainingOperationTimeEstimatedSecs=0 would cause the coordinator to think that it can start the critical section, and the resharding operation would then fail with ReshardingCriticalSectionTimeout if the recipient doesn't manage to enter the "strict-consistency" state within the timeout.

      The same bug exists for the start time for the "applying" state. 
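
      To illustrate the failure mode described above, here is a minimal Python sketch (not the server's C++ code). It assumes the estimate is a linear extrapolation from 'bytesCopied', 'bytesToCopy', and the elapsed cloning time; the exact formula the server uses is not stated in this ticket. All function names here are hypothetical, and the guarded variant is only one possible way to avoid reporting 0, not necessarily the fix that was shipped.

```python
from typing import Optional


def elapsed_secs(start_time: Optional[float], now: float) -> float:
    """Elapsed time of a state; an uninitialized start time reads as 0 (the bug)."""
    return 0.0 if start_time is None else now - start_time


def naive_estimate(bytes_copied: int, bytes_to_copy: int,
                   start_time: Optional[float], now: float) -> float:
    """Assumed formula: remaining bytes copied at the rate observed so far.

    Requires bytes_copied > 0.
    """
    remaining_bytes = max(bytes_to_copy - bytes_copied, 0)
    return elapsed_secs(start_time, now) * remaining_bytes / bytes_copied


def guarded_estimate(bytes_copied: int, bytes_to_copy: int,
                     start_time: Optional[float], now: float) -> Optional[float]:
    """Hypothetical guard: report 'unknown' (None) instead of 0 when the
    start time was never initialized or nothing has been copied yet."""
    if start_time is None or bytes_copied <= 0:
        return None
    return naive_estimate(bytes_copied, bytes_to_copy, start_time, now)


# Normal case: cloning started at t=100, it is now t=160, 300 of 1200 bytes copied.
normal = naive_estimate(300, 1200, start_time=100.0, now=160.0)

# After a restart or stepup the start time is uninitialized, so the naive
# estimate collapses to 0 even though 900 bytes remain.
after_stepup = naive_estimate(300, 1200, start_time=None, now=160.0)

# The guarded variant refuses to report an estimate in that situation.
guarded = guarded_estimate(300, 1200, start_time=None, now=160.0)
```

      In this sketch, a coordinator that treats 0 as "ready for the critical section" would be misled by the naive value, whereas a missing estimate can be told apart from a genuinely finished recipient.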

            People

              andrew.witten@mongodb.com Andrew Witten
              cheahuychou.mao@mongodb.com Cheahuychou Mao
              Votes: 0
              Watchers: 5
