  1. Core Server
  2. SERVER-58473

Change stream startup with resumeToken uses COLLSCAN and is slow

    • Type: Question
    • Resolution: Works as Designed
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • QE 2021-08-09, QE 2021-08-23

      When resuming a change stream, expensive queries are issued against the oplog using a `COLLSCAN`. Given that the oplog is naturally sorted, and given that the resume token encodes time info, why does Mongo need to start from the top of the oplog to find the change event document? Shouldn't a search on this sorted list be `O(log(n))`?

      My understanding is that the oplog is mainly for replication, and there are no indexes on the oplog. However, it does follow a sort order, so it seems Mongo can take advantage of this for faster lookups for the resume token.

      Is this just a limitation of the existing oplog architecture? Could this be supported in the future? Maybe what would help me understand better is: how are change stream `_id`s derived? Is the main reason for this limitation that we don't have `_id`s on the oplog? Even if that's the case, could we first identify the timestamp and then use that to find the respective op?
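
The lookup asked about above can be illustrated with a minimal Python sketch. It is only a model of the question's premise: the in-memory `oplog` list, the `(seconds, increment)` timestamp tuples, and the `find_resume_index` helper are all hypothetical, and nothing here reflects how the server actually stores the oplog or resolves a resume token.

```python
# Hypothetical stand-in for an oplog: entries already sorted by "ts",
# modelled here as a (seconds, increment) tuple. Not MongoDB's real format.
oplog = [
    {"ts": (100, 1), "op": "i"},
    {"ts": (100, 2), "op": "u"},
    {"ts": (105, 1), "op": "d"},
    {"ts": (110, 1), "op": "i"},
]


def find_resume_index(entries, resume_ts):
    """Binary search for the first entry whose ts is >= resume_ts.

    Runs in O(log n) comparisons because the entries are sorted by ts.
    Returns len(entries) if every entry is older than resume_ts.
    """
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid]["ts"] < resume_ts:
            lo = mid + 1  # target lies strictly to the right of mid
        else:
            hi = mid      # mid may be the answer; keep it in range
    return lo
```

For example, `find_resume_index(oplog, (105, 1))` lands on the entry with that exact timestamp, while a timestamp that falls between two entries resolves to the next newer one.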

            Assignee:
            bernard.gorman@mongodb.com Bernard Gorman
            Reporter:
            david@vanta.com David Zhu
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: