Improve event types and fields selection in change streams

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution

      The following is currently more of a rough idea than an actionable change request. The details still have to be worked out and agreed on by all stakeholders before it can be implemented.

      Currently there are several change stream flags that control which events and/or fields are emitted by change streams:

      • showExpandedEvents: controls whether additional (DDL) event types and additional fields (collectionUUID, updateDescription.disambiguatedPaths) are emitted.
      • showSystemEvents: controls whether events from the system namespace are emitted.
      • showCommitTimestamp (since v8.2.1): controls whether the commitTimestamp field is emitted.
      • showMigrationEvents: controls whether migration events are emitted.
      • showRawUpdateDescription: controls whether the raw update description from the oplog is emitted in the rawUpdateDescription field instead of the structured update description in the updateDescription field.

      Some of these flags are used only internally and are purposefully undocumented.
      Presumably all flags except showExpandedEvents have no value for end users and are currently only used internally or by tools such as mongosync.

      showExpandedEvents currently has two responsibilities: first, it enables additional events to be emitted, and second, it enables the emission of extra fields. It is currently not possible to only enable the extra events without activating the additional fields, or vice versa.

      In the future, there should be a better way for end users, internal callers and tools to configure:

      • what types of events to emit
      • what fields to emit inside these events

      This could be achieved by specifying the event types and the fields separately in the change stream configuration.

      The following is an idea of what this could look like:

      {
        $changeStream: {
          eventTypes: ["crud", "ddl", "system", "migration" /* , potentially other types */],
          fields: [ /* ... */ ]
        }
      }
      

      To make a change stream return all event types and all fields, there could also be an "all" value that selects every type or every field, e.g.:

      {
        $changeStream: {
          eventTypes: "all",
          fields: "all"
        }
      }
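      As a rough illustration of how the server might normalize such a specification, here is a minimal JavaScript sketch. It assumes the four event categories from the example above and a hypothetical list of selectable top-level fields; `normalizeChangeStreamSpec`, `KNOWN_EVENT_TYPES` and `KNOWN_FIELDS` are made-up names, not existing server code. "all" expands to every known value, and unknown names are rejected early.

```javascript
// Hypothetical event-type categories, taken from the proposal's example.
const KNOWN_EVENT_TYPES = ["crud", "ddl", "system", "migration"];

// Hypothetical set of selectable top-level fields; the real list is an open design question.
const KNOWN_FIELDS = [
  "_id", "operationType", "clusterTime", "wallTime", "ns", "documentKey",
  "fullDocument", "updateDescription", "collectionUUID", "commitTimestamp",
];

// Expands the "all" shorthand and rejects names that are not known.
function expandSelection(value, known, optionName) {
  if (value === "all") {
    return [...known];
  }
  if (!Array.isArray(value)) {
    throw new TypeError(`${optionName} must be "all" or an array of strings`);
  }
  for (const name of value) {
    if (!known.includes(name)) {
      throw new RangeError(`unknown ${optionName} value: ${name}`);
    }
  }
  // Deduplicate while keeping the caller's order.
  return [...new Set(value)];
}

// Normalizes a proposed { eventTypes, fields } specification.
function normalizeChangeStreamSpec(spec) {
  return {
    eventTypes: expandSelection(spec.eventTypes, KNOWN_EVENT_TYPES, "eventTypes"),
    fields: expandSelection(spec.fields, KNOWN_FIELDS, "fields"),
  };
}

// Usage: eventTypes expands to all four categories, fields stays as given.
const spec = normalizeChangeStreamSpec({ eventTypes: "all", fields: ["_id", "operationType", "ns"] });
```

      How an omitted eventTypes or fields option should default is part of the compatibility question below and is not handled in this sketch.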
      

      We could also predefine presets that bundle fields into compatibility groups, making it easy to select different behaviors; e.g. "minimal", "extended", etc. could each contain a predefined set of fields to emit.
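      One possible way to resolve such presets is sketched below in JavaScript, under the assumption that the "fields" option accepts a preset name, explicit field names, or a mix of both. The preset contents and the name `resolveFields` are hypothetical; deciding which fields belong to each group is exactly what would have to be agreed on.

```javascript
// Hypothetical presets; a Map avoids accidental lookups of Object.prototype keys.
const FIELD_PRESETS = new Map([
  ["minimal", ["_id", "operationType", "clusterTime", "ns", "documentKey"]],
  ["extended", [
    "_id", "operationType", "clusterTime", "wallTime", "ns", "documentKey",
    "fullDocument", "updateDescription", "collectionUUID",
  ]],
]);

// Resolves a "fields" value that may be a single preset name, or an array
// mixing preset names and explicit field names (e.g. ["minimal", "commitTimestamp"]).
// Validation of unknown field names is omitted in this sketch.
function resolveFields(fields) {
  const entries = typeof fields === "string" ? [fields] : fields;
  const result = new Set();
  for (const entry of entries) {
    // A preset expands to its field list; anything else is taken as a field name.
    const preset = FIELD_PRESETS.get(entry);
    for (const name of preset ?? [entry]) {
      result.add(name);
    }
  }
  return [...result];
}

// Usage: start from a preset and add one field on top.
const selected = resolveFields(["minimal", "commitTimestamp"]);
// → ["_id", "operationType", "clusterTime", "ns", "documentKey", "commitTimestamp"]
```

      Allowing presets and explicit fields to be combined would let callers start from a stable group and add individual fields on top; whether that flexibility is worth the extra validation rules is an open design choice.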

      Change streams could use the "fields" configuration to emit only the required fields in the transform stage.
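      To illustrate the benefit, here is a simplified JavaScript sketch of a transform step that runs only the builders for requested fields. The input is a hypothetical, heavily simplified oplog-like entry; `transformEntry` and `FIELD_BUILDERS` are illustrative names, and the real transform (including building the resume token in `_id`) is considerably more involved.

```javascript
// Builders for a few change event fields, computed from a simplified,
// hypothetical oplog-like entry of the shape { op, ns, ui, o, o2, ts, wall }.
const FIELD_BUILDERS = new Map([
  ["operationType", (e) => ({ i: "insert", u: "update", d: "delete" })[e.op]],
  ["ns", (e) => {
    const dot = e.ns.indexOf(".");
    return { db: e.ns.slice(0, dot), coll: e.ns.slice(dot + 1) };
  }],
  ["clusterTime", (e) => e.ts],
  ["wallTime", (e) => e.wall],
  ["collectionUUID", (e) => e.ui],
  // In this simplified model, updates carry the key in o2; inserts and deletes in o.
  ["documentKey", (e) => ({ _id: e.op === "u" ? e.o2._id : e.o._id })],
  // Only inserts carry the full document in this simplified model.
  ["fullDocument", (e) => (e.op === "i" ? e.o : undefined)],
]);

// Builds an event containing only the requested fields; builders for
// fields that were not requested are never called.
function transformEntry(entry, fields) {
  const event = {};
  for (const name of fields) {
    const build = FIELD_BUILDERS.get(name);
    if (build === undefined) continue; // unsupported field: skipped in this sketch
    const value = build(entry);
    if (value !== undefined) event[name] = value;
  }
  return event;
}

// Usage: only two builders run; ns, fullDocument, etc. are never computed.
const sample = { op: "i", ns: "shop.orders", ui: "uuid-1", o: { _id: 7, qty: 2 }, ts: 100, wall: 1700000000000 };
const event = transformEntry(sample, ["operationType", "documentKey"]);
```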

      An alternative would be to use $project to configure which fields to emit.
      The downside of using projections would be that inclusion projections require more typing than the "fields" proposal above. Additionally, projections are currently applied only at the end of a change stream pipeline, so any fields that are later projected out are needlessly built in the earlier stages and shipped through the pipeline.

      An open question for all of the above is how to remain backward compatible with the existing system of flags. A new release that introduces a new way of selecting change stream events and fields must remain compatible with the old system.
      The details of all this need to be worked out.
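      For backward compatibility, one direction that seems straightforward is translating the existing flags into the proposed form. The JavaScript sketch below assumes a hypothetical base field list and maps each flag as described in the flag list above; `fromLegacyFlags` and `BASE_FIELDS` are illustrative names, and whether the server would translate flags this way internally is an assumption.

```javascript
// Hypothetical fields emitted when no legacy flag is set; the exact list is an assumption.
const BASE_FIELDS = [
  "_id", "operationType", "clusterTime", "wallTime", "ns", "documentKey",
  "fullDocument", "updateDescription",
];

// Maps the existing boolean flags onto the proposed { eventTypes, fields } form.
function fromLegacyFlags(flags = {}) {
  const eventTypes = ["crud"];
  let fields = [...BASE_FIELDS];

  if (flags.showExpandedEvents) {
    // One flag, two effects: extra DDL events and extra fields.
    eventTypes.push("ddl");
    fields.push("collectionUUID", "updateDescription.disambiguatedPaths");
  }
  if (flags.showSystemEvents) eventTypes.push("system");
  if (flags.showMigrationEvents) eventTypes.push("migration");
  if (flags.showCommitTimestamp) fields.push("commitTimestamp");
  if (flags.showRawUpdateDescription) {
    // The raw oplog update description replaces the structured one and its sub-fields.
    fields = fields.filter((f) => f !== "updateDescription" && !f.startsWith("updateDescription."));
    fields.push("rawUpdateDescription");
  }
  return { eventTypes, fields };
}
```

      Writing the mapping out makes the coupling explicit: showExpandedEvents is the only flag that changes both lists. The harder part of the question, how to handle callers that mix old flags with the new options, is not addressed here.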

            Assignee:
            Sebastien Mendez
            Reporter:
            Jan Steemann
            Votes:
            0
            Watchers:
            5
