Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-53624

4.4 mongos does not attach RetryableWriteError label to shutdown errors

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.0
    • Sprint:
      Sharding 2021-05-17, Sharding 2021-05-31
    • Case:
    • Linked BF Score:
      49

      Description

      4.4 mongos does not attach RetryableWriteError label to shutdown errors. This a bug because it means that drivers will not retry writes when mongos shuts down.

      To reproduce:

      1. start a sharded cluster with two mongoses on ports 27017 and 27018.
      2. run reproMongosShutdownNoErrorLabelBug.py

      The repro script runs 4 threads which each run a retryable findAndModify which takes 5 seconds to complete (artificially delayed with $where). While the 4 retryable writes are ongoing, another thread causes one of the mongoses to shutdown via shutdown: 1, force: true. The expected behavior should be that mongos returns a Shutdown error with the RetryableWriteError error label which indicates that the client can retry the operation. Instead the mongos returns a Shutdown error without any labels so the driver does not retry:

      2021-01-06 13:43:31,536 [INFO] Thread-5:37 - Command findAndModify with request id 114807987 on server ('localhost', 27017) failed in 2000641 microseconds. Failure:
      {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      

      The full script output is shown below:

      $ python3.9 reproMongosShutdownNoErrorLabelBug.py
      2021-01-06 13:43:29,215 [INFO] MainThread:68 - Pymongo version: 4.0.dev0
      2021-01-06 13:43:29,218 [INFO] MainThread:23 - Command buildinfo with request id 1457850878 started on server ('localhost', 27017). Full command:
      {"buildinfo": {"$numberInt": "1"}, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "admin", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:29,219 [INFO] MainThread:29 - Command buildinfo with request id 1457850878 on server ('localhost', 27017) succeeded in 229 microseconds. Reply:
      {"version": "4.4.3", "gitVersion": "913d6b62acfbb344dde1b116f4161360acd8fd13", "modules": ["enterprise"], "allocator": "system", "javascriptEngine": "mozjs", "sysInfo": "deprecated", "versionArray": [{"$numberInt": "4"}, {"$numberInt": "4"}, {"$numberInt": "3"}, {"$numberInt": "0"}], "openssl": {"running": "Apple Secure Transport"}, "buildEnvironment": {"distmod": "", "distarch": "x86_64", "cc": "/Applications/Xcode10.2.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang: Apple LLVM version 10.0.1 (clang-1001.0.46.3)", "ccflags": "-isysroot /Applications/Xcode10.2.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.13 -target darwin17.0.0 -arch x86_64 -fno-omit-frame-pointer -fno-strict-aliasing -fasynchronous-unwind-tables -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-compare -Wno-tautological-constant-compare -Wno-tautological-unsigned-zero-compare -Wno-tautological-unsigned-enum-zero-compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-unused-lambda-capture -Wno-exceptions -Wunguarded-availability -fstack-protector-strong -fno-builtin-memcmp", "cxx": "/Applications/Xcode10.2.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++: Apple LLVM version 10.0.1 (clang-1001.0.46.3)", "cxxflags": "-Woverloaded-virtual -Werror=unused-result -Wpessimizing-move -Wredundant-move -Wno-undefined-var-template -Wno-instantiation-after-specialization -fsized-deallocation -Wunused-exception-parameter -stdlib=libc++ -std=c++17", "linkflags": "-Wl,-syslibroot,/Applications/Xcode10.2.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.13 -target darwin17.0.0 -arch x86_64 -Wl,-bind_at_load -Wl,-fatal_warnings -fstack-protector-strong -stdlib=libc++ -Wl,-rpath,@loader_path/../lib -Wl,-rpath,@loader_path/../Frameworks", "target_arch": "x86_64", "target_os": "macOS", "cppdefines": "SAFEINT_USE_INTRINSICS 0 PCRE_STATIC NDEBUG BOOST_THREAD_VERSION 5 BOOST_THREAD_USES_DATETIME BOOST_SYSTEM_NO_DEPRECATED BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS BOOST_ENABLE_ASSERT_DEBUG_HANDLER BOOST_LOG_NO_SHORTHAND_NAMES BOOST_LOG_USE_NATIVE_SYSLOG BOOST_LOG_WITHOUT_THREAD_ATTR ABSL_FORCE_ALIGNED_ACCESS"}, "bits": {"$numberInt": "64"}, "debug": false, "maxBsonObjectSize": {"$numberInt": "16777216"}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:29,219 [INFO] MainThread:69 - Mongos version: 4.4.3
      2021-01-06 13:43:29,220 [INFO] MainThread:23 - Command insert with request id 1458777923 started on server ('localhost', 27017). Full command:
      {"insert": "test", "ordered": true, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "documents": [{"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}}, {"_id": {"$oid": "5ff62f01c381ca475a8f06ec"}}]}
      2021-01-06 13:43:29,530 [INFO] MainThread:29 - Command insert with request id 1458777923 on server ('localhost', 27017) succeeded in 310134 microseconds. Reply:
      {"n": {"$numberInt": "2"}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:29,530 [INFO] MainThread:71 - Topology: <TopologyDescription id: 5ff62f01c381ca475a8f06ea, topology_type: Sharded, servers: [<ServerDescription ('localhost', 27017) server_type: Mongos, rtt: 0.000423346000000005>, <ServerDescription ('localhost', 27018) server_type: Mongos, rtt: 0.0007042669999999945>]>
      2021-01-06 13:43:29,532 [INFO] Thread-3:23 - Command findAndModify with request id 2007237709 started on server ('localhost', 27017). Full command:
      {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.3098804251543492"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "czsq48x7SGmRM5JCcCUXgA==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:29,534 [INFO] Thread-2:23 - Command findAndModify with request id 74243042 started on server ('localhost', 27018). Full command:
      {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.078846960311998"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "txnNumber": {"$numberLong": "2"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:29,535 [INFO] Thread-5:23 - Command findAndModify with request id 114807987 started on server ('localhost', 27017). Full command:
      {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.32786975581722355"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "AhqvyEnDS8qXeRpUm69JaQ==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:29,535 [INFO] Thread-4:23 - Command findAndModify with request id 1137522503 started on server ('localhost', 27018). Full command:
      {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.5628606621243788"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "axIME8ZQQj+WtUrWovKREQ==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:31,531 [INFO] Thread-1:59 - Causing mongos shutdown via {shutdown:1, force:1}
       
      2021-01-06 13:43:31,534 [INFO] Thread-1:23 - Command shutdown with request id 16531729 started on server ('localhost', 27017). Full command:
      {"shutdown": {"$numberInt": "1"}, "force": true, "lsid": {"id": {"$binary": {"base64": "UzOkg3E+RjGR8Y0omOjFZw==", "subType": "04"}}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "admin", "$readPreference": {"mode": "primary"}}
      2021-01-06 13:43:31,536 [INFO] Thread-3:37 - Command findAndModify with request id 2007237709 on server ('localhost', 27017) failed in 2003442 microseconds. Failure:
      {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:31,536 [INFO] Thread-5:37 - Command findAndModify with request id 114807987 on server ('localhost', 27017) failed in 2000641 microseconds. Failure:
      {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:31,536 [ERROR] Thread-5:54 - Thread failed
      Traceback (most recent call last):
        File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 50, in find_and_modify
          client.test.test.find_one_and_replace({'$where': delay(5)}, {'a': random.random()})
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 3124, in find_one_and_replace
          return self.__find_and_modify(filter, projection,
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2974, in __find_and_modify
          return self.__database.client._retryable_write(
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1498, in _retryable_write
          return self._retry_with_session(retryable, func, s, None)
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1384, in _retry_with_session
          return self._retry_internal(retryable, func, session, bulk)
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1416, in _retry_internal
          return func(session, sock_info, retryable)
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2964, in _find_and_modify
          out = self._command(sock_info, cmd,
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 238, in _command
          return sock_info.command(
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command
          self._raise_connection_failure(error)
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command
          return command(self, dbname, spec, slave_ok,
        File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 159, in command
          helpers._check_command_response(
        File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 151, in _check_command_response
          raise NotMasterError(errmsg, response)
      pymongo.errors.NotMasterError: interrupted at shutdown, full error: {'ok': 0.0, 'errmsg': 'interrupted at shutdown', 'code': 11600, 'codeName': 'InterruptedAtShutdown', 'operationTime': Timestamp(1609969409, 8), '$clusterTime': {'clusterTime': Timestamp(1609969409, 8), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}}
      2021-01-06 13:43:31,537 [ERROR] Thread-3:54 - Thread failed
      Traceback (most recent call last):
        File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 50, in find_and_modify
          client.test.test.find_one_and_replace({'$where': delay(5)}, {'a': random.random()})
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 3124, in find_one_and_replace
          return self.__find_and_modify(filter, projection,
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2974, in __find_and_modify
          return self.__database.client._retryable_write(
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1498, in _retryable_write
          return self._retry_with_session(retryable, func, s, None)
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1384, in _retry_with_session
          return self._retry_internal(retryable, func, session, bulk)
        File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1416, in _retry_internal
          return func(session, sock_info, retryable)
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2964, in _find_and_modify
          out = self._command(sock_info, cmd,
        File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 238, in _command
          return sock_info.command(
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command
          self._raise_connection_failure(error)
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command
          return command(self, dbname, spec, slave_ok,
        File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 159, in command
          helpers._check_command_response(
        File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 151, in _check_command_response
          raise NotMasterError(errmsg, response)
      pymongo.errors.NotMasterError: interrupted at shutdown, full error: {'ok': 0.0, 'errmsg': 'interrupted at shutdown', 'code': 11600, 'codeName': 'InterruptedAtShutdown', 'operationTime': Timestamp(1609969409, 8), '$clusterTime': {'clusterTime': Timestamp(1609969409, 8), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}}
      2021-01-06 13:43:31,539 [INFO] Thread-1:37 - Command shutdown with request id 16531729 on server ('localhost', 27017) failed in 5153 microseconds. Failure:
      {"errmsg": "connection closed", "errtype": "AutoReconnect"}
      Exception in thread Thread-1:
      Traceback (most recent call last):
        File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 950, in _bootstrap_inner
          self.run()
        File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 888, in run
          self._target(*self._args, **self._kwargs)
        File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 60, in shutdown_mongos_after
          client.admin.command('shutdown', 1, force=True)
        File "/Users/shane/git/mongo-python-driver/pymongo/database.py", line 741, in command
          return self._command(sock_info, command, slave_ok, value,
        File "/Users/shane/git/mongo-python-driver/pymongo/database.py", line 629, in _command
          return sock_info.command(
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command
          self._raise_connection_failure(error)
        File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command
          return command(self, dbname, spec, slave_ok,
        File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 150, in command
          reply = receive_message(sock_info, request_id)
        File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 195, in receive_message
          _receive_data_on_socket(sock_info, 16, deadline))
        File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 292, in _receive_data_on_socket
          raise AutoReconnect("connection closed")
      pymongo.errors.AutoReconnect: connection closed
      2021-01-06 13:43:44,609 [INFO] Thread-2:29 - Command findAndModify with request id 74243042 on server ('localhost', 27018) succeeded in 15074637 microseconds. Reply:
      {"lastErrorObject": {"n": {"$numberInt": "1"}, "updatedExisting": true}, "value": {"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}, "a": {"$numberDouble": "0.32786975581722355"}}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969424, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969424, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:44,609 [INFO] Thread-2:52 - FIND_AND_MODIFY took: 15.078490972518921
      2021-01-06 13:43:54,616 [INFO] Thread-4:29 - Command findAndModify with request id 1137522503 on server ('localhost', 27018) succeeded in 25079911 microseconds. Reply:
      {"lastErrorObject": {"n": {"$numberInt": "1"}, "updatedExisting": true}, "value": {"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}, "a": {"$numberDouble": "0.078846960311998"}}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969434, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969434, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
      2021-01-06 13:43:54,616 [INFO] Thread-4:52 - FIND_AND_MODIFY took: 25.083528995513916
      

      I can reproduce this bug on all 4.4 versions and 4.9.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              cheahuychou.mao Cheahuychou Mao
              Reporter:
              shane.harvey Shane Harvey
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: