  1. Core Server
  2. SERVER-55648

Mongos doesn't return top-level batch-write error in case of shutdown

    • Fully Compatible
    • ALL
    • v4.0
    • To reproduce the error, apply the provided patch (r4.2.12 - 5593fd8e33b60c7580) and run:

      buildscripts/resmoke.py run --suite=sharding jstests/sharding/insert_with_mongos_shutdown.js
      
    • 85

      Batch write operations can return either a top-level error:

      {ok: 0, code: 91, message: "Server is shutting down"}

      or a nested array of writeErrors:

      {ok: 1, writeErrors: [ { index: 0, code: 91, message: "Server is shutting down" } ]}
      

      Because our current retryable-writes spec is vague about how to handle a batch write response that contains writeErrors, drivers only retry on top-level errors in the response and never inspect the writeErrors array for retryable errors.

      The problem is that if a mongos is shut down in the middle of a batch write, it may return a response with a nested writeErrors array instead of a top-level error, and drivers will not retry it.
      In this case the batch write fails with a retryable error that is retried neither by mongos nor by the driver.
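
      The gap described above can be illustrated with a minimal sketch. It is not driver or server code: the helper names findRetryableError and driverWouldRetry are hypothetical, the set of retryable codes is reduced to the single code 91 quoted in this ticket, and the response objects are the two shapes shown earlier.

```javascript
// Hypothetical helpers for illustration only; not part of any driver or the server.
// Assumption: only code 91 (the shutdown code quoted in this ticket) is treated
// as retryable here. A real driver follows the list in the retryable-writes spec.
const RETRYABLE_CODES = new Set([91]);

// Reports where a retryable error appears: "top-level", "nested", or "none".
function findRetryableError(response) {
  if (response.ok === 0 && RETRYABLE_CODES.has(response.code)) {
    return "top-level";
  }
  const writeErrors = response.writeErrors || [];
  if (writeErrors.some((e) => RETRYABLE_CODES.has(e.code))) {
    return "nested";
  }
  return "none";
}

// Models the behavior described in this ticket: only top-level errors are retried.
function driverWouldRetry(response) {
  return findRetryableError(response) === "top-level";
}

const topLevel = {ok: 0, code: 91, message: "Server is shutting down"};
const nested = {
  ok: 1,
  writeErrors: [{index: 0, code: 91, message: "Server is shutting down"}],
};

console.log(findRetryableError(topLevel), driverWouldRetry(topLevel)); // top-level true
console.log(findRetryableError(nested), driverWouldRetry(nested));     // nested false
```

      Under these assumptions the second response carries the same retryable code as the first, yet nothing retries it.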

      I suspect this is the same underlying issue as SERVER-53624, but that ticket is specific to MongoDB versions greater than 4.4, because mongos has attached retryable error labels only since v4.4.

            Assignee:
            luis.osta@mongodb.com Luis Osta (Inactive)
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Votes:
            0
            Watchers:
            13