Type: Question
Resolution: Works as Designed
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Sprint: QE 2021-08-09, QE 2021-08-23
Case:
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

When resuming a change stream, expensive queries are issued against the oplog using a `COLLSCAN`. Given that the oplog is naturally sorted, and given that the resume token encodes time information, why does Mongo need to start from the top of the oplog to find the change event document? Shouldn't a search on this sorted list be `O(log(n))`?

My understanding is that the oplog is mainly for replication, and there are no indexes on the oplog. However, it does follow a sort order, so it seems Mongo can take advantage of this for faster lookups for the resume token.

Is this just a limitation of the existing oplog architecture? Could this be supported in the future? What might help me understand better is: how are change stream `_id`s derived? Is the main reason for this limitation that we don't have `_id`s on the oplog? Even if that's the case, could we first identify the timestamp and then use that to find the respective op?
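
To make the idea in the question concrete, here is a minimal sketch of the kind of lookup being asked about: a binary search by timestamp over a time-ordered log, followed by a short forward scan to pick out the exact operation. This is only an illustration of the proposal, not how MongoDB resumes a change stream. The in-memory list, the `OplogEntry` type, the `op_id` field, and both function names are hypothetical stand-ins.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

# Assumption: a timestamp is a (seconds, increment) pair compared lexicographically.
Timestamp = Tuple[int, int]


class OplogEntry(NamedTuple):
    ts: Timestamp
    op_id: str  # hypothetical per-operation identifier, for illustration only


def first_index_at_or_after(oplog: Sequence[OplogEntry], ts: Timestamp) -> int:
    """Binary search: index of the first entry whose ts is >= the target ts."""
    lo, hi = 0, len(oplog)
    while lo < hi:
        mid = (lo + hi) // 2
        if oplog[mid].ts < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_resume_point(
    oplog: Sequence[OplogEntry], ts: Timestamp, op_id: str
) -> Optional[int]:
    """Locate the timestamp in O(log n), then scan entries sharing that timestamp."""
    i = first_index_at_or_after(oplog, ts)
    while i < len(oplog) and oplog[i].ts == ts:
        if oplog[i].op_id == op_id:
            return i
        i += 1
    return None  # no entry with this timestamp and identifier


oplog = [
    OplogEntry((1, 1), "a"),
    OplogEntry((1, 2), "b"),
    OplogEntry((2, 1), "c"),
    OplogEntry((2, 1), "d"),
    OplogEntry((3, 1), "e"),
]
resume_index = find_resume_point(oplog, (2, 1), "d")
```

The sketch relies on random access into a sorted sequence, which is exactly the property the question assumes a sorted oplog could offer; whether the real storage layout allows that is the open point being asked.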

Assignee: Bernard Gorman
Reporter: David Zhu
Participants: Bernard Gorman, David Zhu, Joseph Dani
Votes: 0
Watchers: 4

Created: Jul 08 2021 05:35:16 AM UTC
Updated: Oct 27 2023 01:52:20 PM UTC
Resolved: Aug 08 2021 03:17:35 AM UTC
