Core Server / SERVER-48694

Push down user-defined stages in a change stream pipeline where possible


    Details

    • Sprint:
      Query 2020-06-29

      Description

      Hi,

      While testing change streams, we observed the data transfer shown in the attached image from mongoD to mongoS (barring a constant 50% reduction that we have attributed to MongoDB transport compression). Further strengthening this analysis, we measured exactly the same data transfer per entry for watch operations that use filters as for those that do not.

      This leads us to the conclusion that the $project and $match stages are applied only at the mongoS level, leading to high network usage between mongoD and mongoS.

      Also, the change stream command recorded in the mongoS log does not show these stages:

      2020-06-10T12:55:53.090-0400 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-5-0] Initiating asynchronous command: RemoteCommand 1614253 -- target:localhost:37018 db:percona cmd:{ aggregate: "vinnie", pipeline: [ { $changeStream: { fullDocument: "default", $_resumeAfterClusterTime: { ts: Timestamp(1591808149, 1) } } } ], fromMongos: true, needsMerge: true, cursor: { batchSize: 0 }, lsid: { id: UUID("3d242a83-9a0b-4db4-b425-c62659753ae8"), uid: BinData(0, 78E672A9F1DF895A5FBE23C35A0004466CCD167529F1405D95628DE7416FF5FA) } }
      

      If we compare with the mongoD log:

      2020-06-10T12:58:16.834-0400 I COMMAND  [conn632] command percona.vinnie appName: "MongoDB Shell" command: getMore { getMore: 5980474694248498676, collection: "vinnie", lsid: { id: UUID("c48acddf-2ca9-495f-ade6-a8bebcfd3ffd") }, $clusterTime: { clusterTime: Timestamp(1591808293, 1), signature: { hash: BinData(0, 6F0DA9280F29644CC6B8A53F49251149171918E6), keyId: 6833738944455114778 } }, $db: "percona" } originatingCommand: { aggregate: "vinnie", pipeline: [ { $changeStream: { fullDocument: "default" } }, { $project: { updateDescription.updatedFields.OWNER_ID: 1.0, operationType: 1.0 } }, { $match: { operationType: "update" } } ], cursor: {}, lsid: { id: UUID("c48acddf-2ca9-495f-ade6-a8bebcfd3ffd") }, $clusterTime: { clusterTime: Timestamp(1591808285, 1), signature: { hash: BinData(0, CA55BFB9BD8CD11B917D276E553391EDA7C1B2B9), keyId: 6833738944455114778 } }, $db: "percona" } planSummary: COLLSCAN cursorid:5980474694248498676 keysExamined:0 docsExamined:0 numYields:2 nreturned:0 reslen:332 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_msg 0ms
      

      Here we can observe the $match and $project stages on the mongoD. Both logs correspond to the same change stream cursor:

      // Note: watch() takes (pipeline, options); batchSize and maxAwaitTimeMS
      // must go in the single options document, or they are silently ignored.
      watchCursor = db.getSiblingDB("percona").vinnie.watch(
          [
              { $project: { "updateDescription.updatedFields.OWNER_ID": 1, "operationType": 1 } },
              { $match: { "operationType": "update" } }
          ],
          { fullDocument: "default", batchSize: 24, maxAwaitTimeMS: 50000 }
      );
      
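      For reference, the watch() helper is shorthand for an aggregate command whose pipeline begins with a $changeStream stage. A minimal plain-JavaScript sketch (no server connection involved; pipelines transcribed from the commands and logs above) contrasting the pipeline the client submitted with the one mongoS forwarded to the shard per its log:

      ```javascript
      // Pipeline the client submitted via watch(): $changeStream first,
      // followed by the user-defined stages.
      const submitted = [
        { $changeStream: { fullDocument: "default" } },
        { $project: { "updateDescription.updatedFields.OWNER_ID": 1, operationType: 1 } },
        { $match: { operationType: "update" } },
      ];

      // Pipeline mongoS forwarded to the shard, per the mongoS log above:
      // only the $changeStream stage survived.
      const forwardedToShard = [
        { $changeStream: { fullDocument: "default" } },
      ];

      // Stages that were not pushed down, and therefore only run after the
      // full event stream has already crossed the network to mongoS.
      const notPushedDown = submitted
        .slice(forwardedToShard.length)
        .map((stage) => Object.keys(stage)[0]);

      console.log(notPushedDown); // → [ '$project', '$match' ]
      ```

      Pushing these two stages down to the shards is what this ticket requests: the $match alone would let each mongoD suppress non-update events before they are sent to mongoS.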

      Can you please clarify? It is not possible to extract an explain() from a change stream, and the documentation is not clear about this; only through testing and code analysis is it possible to infer these results.


      People

      Assignee:
      Backlog - Query Execution
      Reporter:
      VINICIUS GRIPPA (vgrippa@gmail.com)
      Votes:
      1
      Watchers:
      8
