Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.11
Affects Version/s: None
Component/s: None
Labels: None
Confidence Status: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Link: None
Goal Name(s): None

PyMongo's change stream resume logic is broken, which can cause changes to be missed under specific circumstances.

What is wrong with the resume logic?
PyMongo does not correctly cache the postBatchResumeToken included in the aggregate command response when firstBatch is empty.
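The caching rule that this issue says PyMongo skips can be sketched as a small pure function. This is a hypothetical helper, not PyMongo's actual code: `cache_resume_token` and its parameters are illustrative names, and it assumes it receives the `cursor` sub-document of an aggregate or getMore reply as a plain dict, after the whole batch has been handed to the application. Under that assumption, a postBatchResumeToken in the reply is the token to cache, including when the batch is empty.

```python
from typing import Any, Dict, List, Optional


def cache_resume_token(
    cursor: Dict[str, Any],
    batch_key: str,
    cached: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: the resume token to cache once a batch is consumed.

    `cursor` is the "cursor" sub-document of a command reply, and `batch_key`
    is "firstBatch" for aggregate or "nextBatch" for getMore.
    """
    batch: List[Dict[str, Any]] = cursor.get(batch_key, [])
    post_batch_token = cursor.get("postBatchResumeToken")
    if post_batch_token is not None:
        # Cache it even when the batch is empty: this is the case the issue
        # describes as mishandled for an empty firstBatch.
        return post_batch_token
    if batch:
        # No postBatchResumeToken: fall back to the _id of the last change.
        return batch[-1]["_id"]
    # Empty batch and no postBatchResumeToken: keep the previous token.
    return cached


# Usage: an aggregate reply with an empty firstBatch (token value is a placeholder).
reply = {"cursor": {"id": 42, "firstBatch": [], "postBatchResumeToken": {"_data": "placeholder"}}}
print(cache_resume_token(reply["cursor"], "firstBatch", None))  # {'_data': 'placeholder'}
```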

When does this become a problem?
This becomes a problem when a change stream that was started without resumeAfter, startAfter, or startAtOperationTime has to resume because a getMore fails, and that getMore was run immediately after the initial aggregate returned an empty firstBatch. Consider the following sequence of events:

1. The driver runs an aggregate command to create a change stream; let's call this instant in time T1.
2. The aggregate command response contains an empty firstBatch.
3. The driver tries to iterate the change stream. Since firstBatch was empty, the driver runs a getMore to fetch more results from the server, and the getMore fails with a resumable error.
4. The driver tries to resume the change stream. It has no startAfter, resumeAfter, or startAtOperationTime, and it has not cached the postBatchResumeToken from the initial aggregate, so the change stream is re-created without any of these options set; let's call this instant in time T2.
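The sequence above can be modelled in a few lines to show why the resume at T2 carries no starting point. This is a simplified, hypothetical model of the scenario as described in this issue, not PyMongo's implementation: `ChangeStreamState`, `resume_options`, and `run_scenario` are illustrative names, the token value is a placeholder, the failing getMore is not modelled, and it always resumes with resumeAfter (the real rules for startAfter are more involved).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

START_OPTIONS = ("resumeAfter", "startAfter", "startAtOperationTime")


@dataclass
class ChangeStreamState:
    """Hypothetical, simplified state a driver keeps for one change stream."""
    original_options: Dict[str, Any] = field(default_factory=dict)
    cached_token: Optional[Dict[str, Any]] = None


def resume_options(state: ChangeStreamState) -> Dict[str, Any]:
    """Start options for the aggregate that re-creates the stream on resume."""
    if state.cached_token is not None:
        return {"resumeAfter": state.cached_token}
    # No cached token: only the options the application originally passed.
    return {k: v for k, v in state.original_options.items() if k in START_OPTIONS}


def run_scenario(cache_empty_first_batch_token: bool) -> Dict[str, Any]:
    # Stream created with no resumeAfter/startAfter/startAtOperationTime (T1).
    state = ChangeStreamState()
    # The initial aggregate reply has an empty firstBatch (placeholder token).
    reply = {"cursor": {"id": 42, "firstBatch": [],
                        "postBatchResumeToken": {"_data": "T1-token"}}}
    if cache_empty_first_batch_token:
        # Fixed behaviour: cache the token even though firstBatch is empty.
        state.cached_token = reply["cursor"]["postBatchResumeToken"]
    # The getMore fails with a resumable error; the driver resumes (T2).
    return resume_options(state)


print(run_scenario(False))  # {}
print(run_scenario(True))   # {'resumeAfter': {'_data': 'T1-token'}}
```

With the token dropped, the resumed aggregate has no start option at all, so it begins at T2; with the token cached, it resumes from the point reached at T1.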

Because of this bug, applications might miss events that occur between T1 and T2: the resumed change stream has no resume token to start from, so it only reports changes made after T2.

Original Description

test_change_stream.TestAllScenarios.test_change_streams_change_streams_Test_consecutive_resume occasionally blocks forever, causing the test suite to time out:

 [2020/07/02 04:30:38.875]   test_change_streams_change_streams_Executing_a_watch_helper_on_a_Database_results_in_notifications_for_changes_to_all_collections_in_the_specified_database. (test_change_stream.TestAllScenarios) ... ok (0.092s)
 [2020/07/02 04:30:38.990]   test_change_streams_change_streams_Executing_a_watch_helper_on_a_MongoClient_results_in_notifications_for_changes_to_all_collections_in_all_databases_in_the_cluster. (test_change_stream.TestAllScenarios) ... ok (0.115s)
 [2020/07/02 04:59:05.895] Command stopped early: context canceled
 [2020/07/02 04:59:05.924]   test_change_streams_change_streams_Test_consecutive_resume (test_change_stream.TestAllScenarios) ...
 [2020/07/02 04:59:05.924] Running task-timeout commands.
 [2020/07/02 04:59:05.924] Running command 'shell.exec' (step 1 of 1)

https://evergreen.mongodb.com/task/mongo_python_driver_tests_python_version_requires_openssl_102_plus_test_ssl__platform~ubuntu_16.04_auth_ssl~auth_ssl_python_version~3.8_test_4.4_replica_set_patch_4457714d1b1a9f2e0d3d8b73fb913d024e7512dc_5efd62223e8e866fa878aee1_20_07_02_04_27_16

This seems to be caused by the changes in PYTHON-2143.


Attachment: python-2305-change-stream.log (24 kB), uploaded by Shane Harvey on Jul 02 2020 04:08:23 PM UTC

Issue link: is caused by PYTHON-2143 Do not repeatedly resume if getMore receives the same error (Closed)

Assignee: Prashant Mital (Inactive)
Reporter: Shane Harvey
Votes: 0
Watchers: 3

Created: Jul 02 2020 05:22:32 AM UTC
Updated: Oct 29 2023 02:29:35 AM UTC
Resolved: Jul 08 2020 09:51:42 PM UTC
