Python Driver / PYTHON-2305

Faulty change stream resume logic can result in changes being missed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.11
    • Component/s: None
    • Labels:
      None

      Description

      PyMongo's change stream resume logic is broken, which can cause changes to be missed under specific circumstances.

      What is wrong with the resume logic?
      PyMongo does not correctly cache the postBatchResumeToken included in the aggregate command response when firstBatch is empty.
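
For illustration, the token in question sits in the cursor sub-document of the aggregate command response. The sketch below uses a hand-written response dict (the token value and namespace are made up) and a hypothetical helper, token_after_empty_first_batch; it is not PyMongo's internal code and only covers the empty-firstBatch case described here.

```python
# Hand-written example of an aggregate reply for a change stream.
# The token value and namespace are invented for illustration.
response = {
    "cursor": {
        "id": 1234567890,
        "ns": "test.coll",
        "firstBatch": [],
        "postBatchResumeToken": {"_data": "825EFD0000000000012B0229296E04"},
    },
    "ok": 1.0,
}


def token_after_empty_first_batch(aggregate_response):
    """Hypothetical helper: pick the token to cache when firstBatch is empty.

    Returns None when the batch is non-empty (that case is out of scope
    for this sketch) or when the server sent no postBatchResumeToken.
    """
    cursor = aggregate_response["cursor"]
    if cursor["firstBatch"]:
        return None
    return cursor.get("postBatchResumeToken")


cached = token_after_empty_first_batch(response)
```

With this token cached, a later resume has a well-defined starting point even though no change document has been returned yet.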

      When does this become a problem?
      When a change stream that was started without resumeAfter, startAfter, or startAtOperationTime has to resume because the getMore run immediately after the aggregate returned an empty firstBatch fails. Consider the following sequence of events:

      • the driver runs an aggregate command to create a change stream; let's call this instant in time T1
      • the aggregate command response returns an empty firstBatch
      • the driver tries to iterate the change stream - since the firstBatch was empty, the driver runs a getMore to get more results from the server, and this getMore fails with some resumable error
      • the driver tries to resume the change stream - it has no startAfter, resumeAfter, or startAtOperationTime and it hasn't cached the postBatchResumeToken from the initial aggregate, so the change stream is re-created without any of these options set; let's call this instant in time T2

      Due to this bug, applications might miss events that occur between T1 and T2 since the resume does not have an appropriate resume token to use.
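
The sequence above can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions, not PyMongo's code: FakeServer, ChangeStream, ResumableError, and run are hypothetical names, and a resume token is modeled as a plain integer offset into the list of events.

```python
class ResumableError(Exception):
    """Stands in for any error the driver treats as resumable."""


class FakeServer:
    """Hypothetical in-memory stand-in for the server's change feed."""

    def __init__(self):
        self.events = []
        self.fail_next_get_more = False

    def aggregate(self, resume_after=None):
        # Without a resume token the stream starts "now" (end of the feed).
        start = len(self.events) if resume_after is None else resume_after
        return {"firstBatch": [], "postBatchResumeToken": start}

    def get_more(self, position):
        if self.fail_next_get_more:
            self.fail_next_get_more = False
            raise ResumableError("simulated network error")
        return {
            "nextBatch": self.events[position:],
            "postBatchResumeToken": len(self.events),
        }


class ChangeStream:
    """Hypothetical driver-side stream; the flag toggles the bug."""

    def __init__(self, server, cache_initial_token):
        self.server = server
        self.cache_initial_token = cache_initial_token
        self.resume_token = None
        self._open()

    def _open(self):
        reply = self.server.aggregate(resume_after=self.resume_token)
        # Models the server-side cursor position.
        self._cursor_pos = reply["postBatchResumeToken"]
        if self.cache_initial_token:
            self.resume_token = reply["postBatchResumeToken"]

    def next_batch(self):
        try:
            reply = self.server.get_more(self._cursor_pos)
        except ResumableError:
            self._open()  # the resume attempt, i.e. T2
            reply = self.server.get_more(self._cursor_pos)
        self._cursor_pos = self.resume_token = reply["postBatchResumeToken"]
        return reply["nextBatch"]


def run(cache_initial_token):
    server = FakeServer()
    stream = ChangeStream(server, cache_initial_token)  # T1
    server.events.append({"op": "insert", "_id": 1})    # between T1 and T2
    server.fail_next_get_more = True
    return stream.next_batch()


missed = run(cache_initial_token=False)  # buggy behaviour
seen = run(cache_initial_token=True)     # token from the aggregate is cached
```

In this model the buggy run returns an empty batch because the resumed stream starts at T2, while the run that caches the token from the initial aggregate still delivers the insert.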


      Original Description

      test_change_stream.TestAllScenarios.test_change_streams_change_streams_Test_consecutive_resume occasionally blocks forever, causing the test suite to time out:

       [2020/07/02 04:30:38.875]   test_change_streams_change_streams_Executing_a_watch_helper_on_a_Database_results_in_notifications_for_changes_to_all_collections_in_the_specified_database. (test_change_stream.TestAllScenarios) ... ok (0.092s)
       [2020/07/02 04:30:38.990]   test_change_streams_change_streams_Executing_a_watch_helper_on_a_MongoClient_results_in_notifications_for_changes_to_all_collections_in_all_databases_in_the_cluster. (test_change_stream.TestAllScenarios) ... ok (0.115s)
       [2020/07/02 04:59:05.895] Command stopped early: context canceled
       [2020/07/02 04:59:05.924]   test_change_streams_change_streams_Test_consecutive_resume (test_change_stream.TestAllScenarios) ...
       [2020/07/02 04:59:05.924] Running task-timeout commands.
       [2020/07/02 04:59:05.924] Running command 'shell.exec' (step 1 of 1)
      

      https://evergreen.mongodb.com/task/mongo_python_driver_tests_python_version_requires_openssl_102_plus_test_ssl__platform~ubuntu_16.04_auth_ssl~auth_ssl_python_version~3.8_test_4.4_replica_set_patch_4457714d1b1a9f2e0d3d8b73fb913d024e7512dc_5efd62223e8e866fa878aee1_20_07_02_04_27_16

      Seems to be caused by the changes in PYTHON-2143.


              People

              Assignee:
              prashant.mital Prashant Mital (Inactive)
              Reporter:
              shane.harvey Shane Harvey