  Core Server / SERVER-61999

Improve 'fullDocument' change stream oplog rewrite for update events

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Execution

      There are two improvements we can make here:

      Firstly, if a user filters on the fullDocument field, we currently allow all non-replacement update events to pass through from the oplog. This is because the fullDocument field is not populated until later in the pipeline for update events, and we therefore cannot apply predicates on it during the oplog scan.

      However, there is one exception: the _id field of fullDocument is available in the o2 field of the oplog event. As long as the filter is not an existence check or a comparison against null (the document may or may not be available when it is subsequently looked up), we can rewrite filters on fullDocument._id into the oplog for update events. Where it applies, this rewrite could give a worthwhile performance boost, since it reduces the number of expensive lookups we have to perform. Technically the user could apply the same predicate to documentKey._id instead, but fullDocument is a common filter, and it is not evident to users why filtering on documentKey._id would perform better than filtering on fullDocument._id.
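
      As a rough illustration of the proposed rule (not the server's implementation), the sketch below models filters as plain Python dicts. The function names are hypothetical, and it only handles a flat predicate on fullDocument._id. It refuses the rewrite whenever the condition contains $exists or a null anywhere, which is deliberately more conservative than strictly necessary.

```python
from typing import Any, Optional


def _mentions_null_or_exists(condition: Any) -> bool:
    """Return True if the condition contains $exists or a null anywhere."""
    if condition is None:
        return True
    if isinstance(condition, dict):
        return any(
            op == "$exists" or _mentions_null_or_exists(arg)
            for op, arg in condition.items()
        )
    if isinstance(condition, list):
        return any(_mentions_null_or_exists(item) for item in condition)
    return False


def rewrite_full_document_id(user_match: dict) -> Optional[dict]:
    """Hypothetical helper: map a fullDocument._id predicate onto o2._id.

    Returns an oplog filter for update entries, or None when the predicate
    cannot be rewritten and all update events must be allowed through.
    """
    if "fullDocument._id" not in user_match:
        return None
    condition = user_match["fullDocument._id"]
    if _mentions_null_or_exists(condition):
        # The post-image may be missing at lookup time, so existence and
        # null checks cannot be answered from the oplog entry alone.
        return None
    return {"op": "u", "o2._id": condition}


print(rewrite_full_document_id({"fullDocument._id": {"$gt": 5}}))
print(rewrite_full_document_id({"fullDocument._id": {"$exists": True}}))
```

      A None result here stands for the current behaviour, where every non-replacement update passes through the oplog scan.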

      Secondly, our current filter applies the user's predicate to insert and replace oplog entries as follows:

      {$or: [{op: "i"}, {op: "u"}], <predicate>}

      This means that the filter is being applied to all {op: "u"} entries, including non-replacement updates. This is OK from a correctness perspective, since we are returning every non-replacement update anyway, and this filter therefore cannot discard any relevant events - but for certain predicates the time spent doing this extra evaluation could be nontrivial. Additionally, if we do the first improvement mentioned above, then there may be cases where this filter allows updates through the oplog scan which could have been filtered out; this means that we will have to do an expensive lookup of the post-image document only to immediately discard the result.

      We should change this rewrite to the following, so that it only applies to actual replacements:

      {$or: [{op: "i"}, {op: "u", "o._id": {$exists: true}}], <predicate>}
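
      To make the distinction concrete, here is a small sketch that evaluates the {op: "u", "o._id": {$exists: true}} condition against sample oplog entries modelled as Python dicts. The function name and the sample entries are illustrative assumptions; in particular, the shape of the non-replacement update's o field is only indicative.

```python
def predicate_applies(entry: dict) -> bool:
    """Hypothetical check: should the user's fullDocument predicate be
    evaluated against this oplog entry during the scan?"""
    if entry.get("op") == "i":
        return True
    # A replacement stores the whole new document in "o", including _id;
    # a non-replacement update stores only a description of the change.
    return entry.get("op") == "u" and "_id" in entry.get("o", {})


# Illustrative oplog entries (field values are made up for this sketch).
insert_entry = {"op": "i", "o": {"_id": 1, "a": 1}}
replace_entry = {"op": "u", "o": {"_id": 1, "a": 2}, "o2": {"_id": 1}}
delta_entry = {"op": "u", "o": {"$v": 2, "diff": {"u": {"a": 3}}}, "o2": {"_id": 1}}

for name, entry in [("insert", insert_entry),
                    ("replace", replace_entry),
                    ("delta update", delta_entry)]:
    print(name, predicate_applies(entry))
```

      Under the proposed filter, only the first two entries would have the user's predicate evaluated against them; the delta update would be left to the separate update-event handling.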

            Assignee: backlog-query-execution ([DO NOT USE] Backlog - Query Execution)
            Reporter: Bernard Gorman (bernard.gorman@mongodb.com)
            Votes: 0
            Watchers: 2
