Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Assigned Teams: Query Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

After ~~SERVER-66631~~, the change_stream_multitenant_sharded_cluster_passthrough suite started failing intermittently, in different test cases each time. The failures were related to ChangeStreamHistoryLost.
Evergreen Link here.

To mitigate this issue, a sleep was added.

The current explanation for this problem is as follows:

We are creating the change collection on every primary node explicitly and independently by issuing the enablement command.
Each node's latest oplog timestamp might be different. For example, the latest oplog timestamp for node 1 might be Timestamp(133456788, 1), while for node 2 it could be Timestamp(123456789, 1).

As such, when we create the change collection on these nodes, the corresponding oplog entries would get Timestamp(133456788, 2) on node 1 and Timestamp(123456789, 2) on node 2. These also define the start timestamp of each change collection.

Since the timestamps differ between the two nodes, a getMore with Timestamp(133456788, 1) on node 2 will cause the change stream to fail.

Since there is no entity (like configSvr in the case of mongoS) that orchestrates the creation process, the differences in the timestamps on different nodes seem to be causing this situation.

Because the differences in timestamps between nodes are small (the test fixture spins up nodes quickly), the sleep gives the periodic no-op writer time to write a few oplog entries and advance the timestamp. The latest oplog timestamp is then later than the first entry of every change collection, which prevents the failures.
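
The ordering argument above can be sketched as a small model. This is only an illustration, assuming a read at a given timestamp fails on any node whose change collection starts after that timestamp. The function name, node names, and timestamp values are hypothetical and are not taken from the server code or the failing test.

```python
from typing import Dict, List, Tuple

# (seconds, increment) pair standing in for a BSON Timestamp. Python tuples
# compare lexicographically, which matches Timestamp ordering.
Timestamp = Tuple[int, int]


def nodes_missing_history(
    change_collection_start: Dict[str, Timestamp], read_ts: Timestamp
) -> List[str]:
    """Return the nodes whose change collection begins after read_ts."""
    return sorted(
        node for node, start in change_collection_start.items() if read_ts < start
    )


# Hypothetical values: each node created its change collection at its own time.
starts = {"node1": (100, 2), "node2": (105, 2)}

# A read at a timestamp taken from node1's clock predates node2's change collection.
print(nodes_missing_history(starts, (100, 3)))  # ['node2']

# After the sleep, no-op writes have advanced the clock past every start.
print(nodes_missing_history(starts, (110, 1)))  # []
```

In this model the sleep helps only because it lets the clock move past the latest start timestamp. It does not remove the underlying gap between the nodes' start timestamps.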

Note that there is already a ticket to enable change streams in mongoQ, SERVER-68341, and that should solve this problem. This ticket is about investigating the issue further and coming up with a better workaround (one that does not use sleep) for the time being.

is related to

SERVER-68341 Implement enable/disable command for mongoQ in serverless

Backlog

Assignee: [DO NOT USE] Backlog - Query Execution
Reporter: Rishab Joshi (Inactive)
Participants: [DO NOT USE] Backlog - Query Execution, Rishab Joshi
Votes: 0
Watchers: 2

Created: Aug 23 2022 09:38:31 AM UTC
Updated: Oct 16 2023 12:37:59 PM UTC
