  1. Core Server
  2. SERVER-79771

Make Resharding Operation Resilient to NetworkInterfaceExceededTimeLimit


Details

    • Sharding NYC
    • Fully Compatible
    • v7.0, v6.0, v5.0
    • Sharding NYC 2023-09-04

    Description

      Pasting Max's findings:

      The problematic area is in https://github.com/mongodb/mongo/blob/r5.0.19/src/mongo/db/s/resharding/resharding_oplog_fetcher.cpp#L202-L203. When the code was written, it was likely assumed that because the function returns a StatusWith<> result, it would not throw an exception. It appears, however, that the function can also throw. When it does, the exception propagates out as an error instead of being swallowed and retried via the return true.
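
      To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the server's types: FetchResult, NetworkTimeout, shouldRetryNaive and shouldRetryDefensive are hypothetical names invented for illustration. It only shows how a caller that inspects a returned status can still be bypassed by a thrown exception.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a StatusWith<>-style result; not the MongoDB type.
struct FetchResult {
    bool ok;
    std::string reason;
};

// Hypothetical exception type representing a network timeout.
struct NetworkTimeout : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns true when the caller should retry the fetch.
// Naive version: only the returned status is inspected, so a thrown
// exception escapes and the "return true" retry path is never reached.
bool shouldRetryNaive(const std::function<FetchResult()>& fetch) {
    FetchResult r = fetch();
    return !r.ok;
}

// Defensive version: a thrown timeout is handled like a failed status.
bool shouldRetryDefensive(const std::function<FetchResult()>& fetch) {
    try {
        FetchResult r = fetch();
        return !r.ok;
    } catch (const NetworkTimeout&) {
        return true;  // swallow the transient error and ask for a retry
    }
}
```

      With a fetch callable that throws NetworkTimeout, shouldRetryNaive lets the exception escape to its caller, while shouldRetryDefensive returns true.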

      The ReshardingRecipientService should also retry on transient errors in the NetworkTimeoutError category in every retry loop. Since the change will be made in resharding_future_util.h, this improvement should affect all code that uses resharding::withAutomaticRetry.
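
      The idea of retrying by error category can be sketched as follows. This is not the implementation of resharding::withAutomaticRetry: ErrorCategory, CategorizedError, isTransient, retryOnTransient and the maxAttempts bound are all assumptions made for illustration, and backoff between attempts is omitted.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical error categories; the real server has its own taxonomy.
enum class ErrorCategory { NetworkTimeout, Other };

// Hypothetical exception carrying a category.
struct CategorizedError : std::runtime_error {
    ErrorCategory category;
    CategorizedError(ErrorCategory c, const char* msg)
        : std::runtime_error(msg), category(c) {}
};

// Hypothetical predicate: which categories are considered safe to retry.
inline bool isTransient(ErrorCategory c) {
    return c == ErrorCategory::NetworkTimeout;
}

// Calls op until it succeeds. Non-transient errors are rethrown at once;
// transient errors are retried until maxAttempts calls have been made.
template <typename T>
T retryOnTransient(const std::function<T()>& op, int maxAttempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const CategorizedError& e) {
            if (!isTransient(e.category) || attempt >= maxAttempts) {
                throw;
            }
            // Transient error with attempts remaining: loop and retry.
        }
    }
}
```

      Putting the category check in one shared helper means every caller picks up the new retry behaviour without changes at each call site, which is the effect the ticket expects from changing resharding_future_util.h.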


          People

            abdul.qadeer@mongodb.com Abdul Qadeer