  1. Core Server
  2. SERVER-40685

Mongos often fails transactions that use "snapshot" with SnapshotTooOld

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 4.1.11
    • Component/s: None
    • ALL
    • Steps to reproduce:
      1. Start a sharded cluster
      2. Still to be determined: Do an unknown sequence of operations that causes the cluster to be in a "bad state". I was only able to repro the issue after running the entire driver test suite against the cluster.
      3. Run PYTHON-1796.py
    • Storage NYC 2019-05-20, Sharding 2019-05-06
    • 45

      Various driver transaction spec tests are failing on latest sharded clusters. The one commonality between the failures is that they all use read concern "snapshot" and they all fail with SnapshotTooOld on the first operation in the transaction. Was there a recent change in how "snapshot" works?

      Although this issue fails consistently in Evergreen, creating a standalone repro has proven difficult. On a fresh sharded cluster, the attached repro (PYTHON-1796.py) succeeds. However, if I first run the entire driver test suite against the cluster, the repro then fails consistently with SnapshotTooOld. Here's an example Evergreen failure from today:
      https://evergreen.mongodb.com/task/mongo_python_driver_tests_python_version_supports_openssl_110_test_ssl__auth_ssl~auth_ssl_python_version_requires_openssl_102_plus~3.7_test_latest_sharded_cluster_ea8941ef5d3f60a227cb89021ef7d65d7b06c6e1_19_04_16_21_00_56

      And here's the repro failing:

      $ python PYTHON-1796.py
      First transaction command failed: {'ok': 0.0, 'errmsg': 'Transaction 060cae38-7be3-4240-babe-65739200884c:1 was aborted on statement 0 due to: a non-retryable snapshot error :: caused by :: Encountered error from localhost:27019 during a transaction :: caused by :: Read timestamp Timestamp(1555453221, 10) is older than the oldest available timestamp.', 'code': 239, 'codeName': 'SnapshotTooOld', 'operationTime': Timestamp(1555453221, 11), '$clusterTime': {'clusterTime': Timestamp(1555453221, 11), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'errorLabels': ['TransientTransactionError']}
      Traceback (most recent call last):
        File "PYTHON-1796.py", line 13, in <module>
          coll.insert_one({'_id': i}, session=s)
        File "/home/ubuntu/mongo-python-driver/pymongo/collection.py", line 694, in insert_one
          session=session),
        File "/home/ubuntu/mongo-python-driver/pymongo/collection.py", line 608, in _insert
          bypass_doc_val, session)
        File "/home/ubuntu/mongo-python-driver/pymongo/collection.py", line 596, in _insert_one
          acknowledged, _insert_command, session)
        File "/home/ubuntu/mongo-python-driver/pymongo/mongo_client.py", line 1342, in _retryable_write
          return self._retry_with_session(retryable, func, s, None)
        File "/home/ubuntu/mongo-python-driver/pymongo/mongo_client.py", line 1295, in _retry_with_session
          return func(session, sock_info, retryable)
        File "/home/ubuntu/mongo-python-driver/pymongo/collection.py", line 591, in _insert_command
          retryable_write=retryable_write)
        File "/home/ubuntu/mongo-python-driver/pymongo/pool.py", line 579, in command
          unacknowledged=unacknowledged)
        File "/home/ubuntu/mongo-python-driver/pymongo/network.py", line 150, in command
          parse_write_concern_error=parse_write_concern_error)
        File "/home/ubuntu/mongo-python-driver/pymongo/helpers.py", line 155, in _check_command_response
          raise OperationFailure(msg % errmsg, code, response)
      pymongo.errors.OperationFailure: Transaction 060cae38-7be3-4240-babe-65739200884c:1 was aborted on statement 0 due to: a non-retryable snapshot error :: caused by :: Encountered error from localhost:27019 during a transaction :: caused by :: Read timestamp Timestamp(1555453221, 10) is older than the oldest available timestamp.
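
For readers unfamiliar with the reply format, here is a minimal sketch of how the fields in the failed command reply above could be read. It uses a trimmed copy of that reply as a plain dict and does not talk to a server. The helper names (`is_snapshot_too_old`, `is_transient`, `run_with_retries`) and the `attempt` callable are hypothetical and are not part of PyMongo or of PYTHON-1796.py.

```python
# Trimmed copy of the reply printed by the repro above (only the fields used here).
reply = {
    "ok": 0.0,
    "code": 239,
    "codeName": "SnapshotTooOld",
    "errorLabels": ["TransientTransactionError"],
}

SNAPSHOT_TOO_OLD = 239  # error code as it appears in the reply above


def is_snapshot_too_old(doc):
    """Return True when a command reply reports SnapshotTooOld."""
    return doc.get("code") == SNAPSHOT_TOO_OLD or doc.get("codeName") == "SnapshotTooOld"


def is_transient(doc):
    """Return True when the reply carries the TransientTransactionError label."""
    return "TransientTransactionError" in doc.get("errorLabels", [])


def run_with_retries(attempt, max_attempts=3):
    """Call attempt() until it succeeds, fails non-transiently, or attempts run out.

    `attempt` is a hypothetical callable that returns a reply dict; a real
    application would start a fresh transaction on every call.
    """
    last = None
    for _ in range(max_attempts):
        last = attempt()
        if last.get("ok") == 1.0 or not is_transient(last):
            break
    return last
```

On the cluster described in this ticket the repro failed on every run, so a retry loop like this would presumably exhaust its attempts rather than recover; the sketch only illustrates how the error code and label in the reply are meant to be interpreted.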
      

      I can reproduce this on the latest version:
      db version v4.1.10-94-ga654dcf
      git version: a654dcf592ea7ed65426a0de96b4079ff4fc6716

            Assignee:
            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)
            Reporter:
            shane.harvey@mongodb.com Shane Harvey