Core Server / SERVER-48694

Push down user-defined stages in a change stream pipeline where possible

    • Query Execution
    • Fully Compatible
    • Query 2020-06-29

      Hi,

      Testing the change stream, we measured the data sent from mongod to mongos (after accounting for a constant ~50% reduction that we attribute to MongoDB's transport compression). To further strengthen this analysis: we observe exactly the same data transfer per change event for watch operations that use filters as for those that do not.

      This leads us to the conclusion that the $project and $match stages are executed only at the mongos level, leading to high network usage between mongod and mongos.
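The ticket title asks for user-defined stages to be pushed down to the shards where possible. As a toy illustration only (not the server's actual logic), a pushdown split might look like the sketch below, assuming a hypothetical whitelist of stages that operate on one change event at a time and are therefore safe to run on each mongod before merging on mongos:

```javascript
// Toy sketch, NOT the server's implementation: split a change-stream
// pipeline into a part that could run on each shard (mongod) and a
// part that must run on the merging node (mongos).

// Hypothetical assumption: these stages act per-event and need no
// global ordering, so they could be pushed down to the shards.
const PUSHABLE = new Set(["$match", "$project", "$addFields", "$unset"]);

function splitPipeline(pipeline) {
  const shardPart = [];
  for (let i = 0; i < pipeline.length; i++) {
    const name = Object.keys(pipeline[i])[0];
    if (name === "$changeStream" || PUSHABLE.has(name)) {
      shardPart.push(pipeline[i]);
    } else {
      // The first non-pushable stage, and everything after it,
      // stays on mongos.
      return { shardPart, mergePart: pipeline.slice(i) };
    }
  }
  return { shardPart, mergePart: [] };
}
```

Under this toy model, the reporter's $project and $match stages would land in the shard part, which is the behavior the ticket requests.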

      Also, the change stream command reported in the mongos log does not show these stages:

      2020-06-10T12:55:53.090-0400 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-5-0] Initiating asynchronous command: RemoteCommand 1614253 -- target:localhost:37018 db:percona cmd:{ aggregate: "vinnie", pipeline: [ { $changeStream: { fullDocument: "default", $_resumeAfterClusterTime: { ts: Timestamp(1591808149, 1) } } } ], fromMongos: true, needsMerge: true, cursor: { batchSize: 0 }, lsid: { id: UUID("3d242a83-9a0b-4db4-b425-c62659753ae8"), uid: BinData(0, 78E672A9F1DF895A5FBE23C35A0004466CCD167529F1405D95628DE7416FF5FA) } }
      

      If we compare with the mongod:

      2020-06-10T12:58:16.834-0400 I COMMAND  [conn632] command percona.vinnie appName: "MongoDB Shell" command: getMore { getMore: 5980474694248498676, collection: "vinnie", lsid: { id: UUID("c48acddf-2ca9-495f-ade6-a8bebcfd3ffd") }, $clusterTime: { clusterTime: Timestamp(1591808293, 1), signature: { hash: BinData(0, 6F0DA9280F29644CC6B8A53F49251149171918E6), keyId: 6833738944455114778 } }, $db: "percona" } originatingCommand: { aggregate: "vinnie", pipeline: [ { $changeStream: { fullDocument: "default" } }, { $project: { updateDescription.updatedFields.OWNER_ID: 1.0, operationType: 1.0 } }, { $match: { operationType: "update" } } ], cursor: {}, lsid: { id: UUID("c48acddf-2ca9-495f-ade6-a8bebcfd3ffd") }, $clusterTime: { clusterTime: Timestamp(1591808285, 1), signature: { hash: BinData(0, CA55BFB9BD8CD11B917D276E553391EDA7C1B2B9), keyId: 6833738944455114778 } }, $db: "percona" } planSummary: COLLSCAN cursorid:5980474694248498676 keysExamined:0 docsExamined:0 numYields:2 nreturned:0 reslen:332 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_msg 0ms
      

      Here we can observe the $match and $project stages on the mongod. Both observations use the same watch cursor:

      watchCursor = db.getSiblingDB("percona").vinnie.watch(
          [
              { $project: { "updateDescription.updatedFields.OWNER_ID": 1, "operationType": 1 } },
              { $match: { "operationType": "update" } }
          ],
          // watch() takes a single options document; the original command
          // passed batchSize and maxAwaitTimeMS as extra arguments, which
          // the shell silently ignores.
          { fullDocument: "default", batchSize: 24, maxAwaitTimeMS: 50000 }
      );
      

      Can you please clarify? It is not possible to extract an explain() from a change stream, and the documentation is not clear about this; only through testing and code analysis is it possible to infer these results.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            vgrippa@gmail.com Vinicius Grippa
            Votes:
            1
            Watchers:
            8
