[SERVER-46979] Improve changeStream performance relative to oplog queries Created: 19/Mar/20  Updated: 27/Oct/23  Resolved: 26/Mar/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: vinllen chen Assignee: Backlog - Triage Team
Resolution: Community Answered Votes: 0
Labels: change-streams-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Server Triage
Backwards Compatibility: Fully Compatible
Participants:

 Description   

I ran a test comparing reading the oplog directly against using a change stream, and it looks like change stream performance is not as good as reading the oplog directly. Here is my experiment:

The fetcher program runs on hostB while MongoDB runs on hostA; the ping latency between hostA and hostB is about 0.2–0.3 ms. There is no CPU/memory/IO/network bottleneck in my experiment.
The source MongoDB has 5 million oplog entries with a total size of 5.5 GB:

  • Change stream: 180 seconds, roughly 30,000 entries/s
  • Oplog query: 60 seconds, roughly 80,000+ entries/s
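For concreteness, the two read paths being compared can be sketched as command documents (a minimal illustration in Python dicts; the timestamp value and options are placeholders, not the actual test harness, which isn't shown in this ticket):

```python
# Sketch of the two read paths compared above (illustrative only; the real
# fetcher program and its exact options are not part of this ticket).

# Path 1: a plain tailable find over local.oplog.rs. The server streams
# raw oplog entries back with no per-entry rewriting.
oplog_query = {
    "find": "oplog.rs",
    "filter": {"ts": {"$gte": {"$timestamp": {"t": 0, "i": 1}}}},  # placeholder ts
    "tailable": True,
    "awaitData": True,
}

# Path 2: a change stream, i.e. an aggregation the server internally rewrites
# into an oplog $match plus a per-entry transformation stage.
change_stream_cmd = {
    "aggregate": 1,
    "pipeline": [{"$changeStream": {"allChangesForCluster": True}}],
    "cursor": {},
}
```

The extra per-entry transformation work in the second path is where the throughput difference measured above shows up.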

While monitoring the change stream fetch, CPU usage runs at about 60%. I think the gap is somewhat larger than I expected. Does this basically match your previous test results?
As I understand it, for a replica set, a change stream is split into two parts. The first is the oplog $match/$cursor stage that can be seen in the aggregate explain output. The second part is the transformation, which performs several steps:

  • unmarshal the oplog entry's BSON
  • allocate new memory and transform the parsed oplog entry into a change stream event
  • marshal the change stream event back into BSON
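Those three per-entry steps can be sketched as follows, with json standing in for BSON (purely illustrative; the server's real transformation is implemented in C++, and the field layout here is simplified):

```python
import json

# A fake oplog insert entry, serialized (json stands in for BSON here).
raw = b'{"op": "i", "ns": "test.coll", "o": {"_id": 1, "x": 42}}'

entry = json.loads(raw)                    # 1. unmarshal the oplog entry
event = {                                  # 2. allocate and build a new event
    "operationType": "insert",
    "ns": {"db": "test", "coll": "coll"},
    "fullDocument": entry["o"],
}
wire = json.dumps(event).encode()          # 3. marshal the event back out
```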

So I believe the main reason for the performance gap is the transformation step. Please let me know if I am wrong.

From my point of view, there is only one thread doing the oplog fetching and transformation, so increasing the number of threads could improve performance. But it would be a tradeoff, because it may hurt MongoDB server performance if too many threads are used for the transformation.
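The client-side version of this idea (one fetcher, several transform workers) can be sketched as below. Everything here is a dummy stand-in for illustration; note also that in CPython a CPU-bound transform would need processes rather than threads to actually run in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(entry):
    # Dummy stand-in for the unmarshal -> build event -> marshal step.
    op_names = {"i": "insert", "u": "update", "d": "delete"}
    return {"operationType": op_names[entry["op"]],
            "documentKey": {"_id": entry["o"]["_id"]}}

# Pretend these came from a single tailable oplog cursor (the fetcher thread).
raw_entries = [{"op": "i", "o": {"_id": n}} for n in range(1000)]

# Fan the per-entry transformation out to a small worker pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    events = list(pool.map(transform, raw_entries))
```

`pool.map` preserves input order, so the transformed events still come out in oplog order, which matters for anything replaying them downstream.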



 Comments   
Comment by 高 凡 [ 11/Mar/21 ]

I'm having this problem as well. I'm listening to a large Mongo cluster using a change stream; it takes about 20 seconds to fetch 10,000 events, but about 100 million new documents are added every day, so this is not acceptable.

Comment by Dmitry Agranat [ 26/Mar/20 ]

Hi cvinllen@gmail.com,

Glad to hear the reply was helpful.

Regards,
Dima

Comment by vinllen chen [ 26/Mar/20 ]

Hi Dima,

Thanks for your reply, it is helpful. You're right, I should have compared a change stream listening for the newest events against tailing the oplog, rather than scanning the whole oplog collection.
And your future plans for improving performance sound great; I'm looking forward to that.

Comment by Dmitry Agranat [ 24/Mar/20 ]

Hi cvinllen@gmail.com,

Thank you for raising this ticket. Below, I've attempted to respond to your comments; please let me know if you have any additional questions.

Summary

It is true that $changeStream's raw read rate is (unavoidably) slower than a simple query on the oplog, that the $changeStream's oplog $match and transformation stages contribute most to this latency, and that CPU consumption is higher when running these stages as compared to a basic query. However, in our tests of oplog-tailing versus $changeStream under typical operating conditions and write workloads, performance of each has been similar up to the point where resource contention becomes an issue.

From the description of your experiment, it sounds as though you were not tailing the oplog, but instead started a change stream at the beginning of the oplog and compared it against a straightforward collection scan. The most important point to make is that this is not the use-case for which change streams are designed. Change streams are not intended to produce the same raw read rate as a basic query for a one-off scan through the entire contents of a large oplog; they are intended to operate towards the end of the oplog, scanning the entries added within the past few seconds, minutes, or hours. While it is sometimes unavoidable to scan through the oplog, we actively attempt to avoid doing so more than is strictly necessary; for instance, we implemented the Change Streams High Water Mark project specifically to avoid having to do this when resuming a stream.
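The intended use-case described here — starting near the present and resuming from a saved token instead of rescanning the oplog — can be sketched as a command document (the token value below is an opaque placeholder, not a real token):

```python
# Resume token saved from a previously received change event (placeholder).
saved_resume_token = {"_data": "<opaque-token-from-last-event>"}

# Reopening the stream with resumeAfter lets the server seek close to where
# the stream left off rather than scanning the oplog from the beginning.
resume_cmd = {
    "aggregate": 1,
    "pipeline": [{"$changeStream": {"allChangesForCluster": True,
                                    "resumeAfter": saved_resume_token}}],
    "cursor": {},
}
```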

We will continue to work on improving $changeStream, but the fact that there is a difference in performance between these use-cases is expected. To better reflect the issue you raised, we've changed the title of this ticket to "Improve $changeStream performance relative to oplog queries."

Regarding your comments and questions

As I understand it, for a replica set, a change stream is split into two parts. The first is the oplog $match/$cursor stage that can be seen in the aggregate explain output. The second part is the transformation, which performs several steps:

This is broadly accurate, and for the purposes of this conversation these are certainly the heaviest stages in a simple change stream pipeline.

While monitoring the change stream fetch, CPU usage runs at about 60%. I think the gap is somewhat larger than I expected. Does this basically match your previous test results? So I believe the main reason for the performance gap is the transformation step. Please let me know if I am wrong.

Based on our internal tests, change stream CPU consumption is indeed notably higher than a simple oplog query, both because of the complex $match filter that change streams must apply to the oplog, and especially because of the DocumentSourceChangeStreamTransform stage. The latter is responsible for most of the additional execution latency.
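One way to see this server-side rewrite for yourself is to explain the change-stream aggregation and inspect the expanded stages. A sketch of the command shape only (the collection name is assumed, and actually running it requires a live replica set):

```python
# Explain the change-stream aggregation to see the internal stages the
# server rewrites {"$changeStream": {}} into, including the transform stage.
explain_cmd = {
    "explain": {
        "aggregate": "coll",            # assumed collection name
        "pipeline": [{"$changeStream": {}}],
        "cursor": {},
    },
    "verbosity": "queryPlanner",
}
```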

From my point of view, there is only one thread doing the oplog fetching and transformation, so increasing the number of threads could improve performance. But it would be a tradeoff, because it may hurt MongoDB server performance if too many threads are used for the transformation.

Parallel query/aggregation execution is indeed on our longer-term roadmap, but in the meantime, we would probably not consider making individual stages like DocumentSourceChangeStreamTransform internally multithreaded. Given that the $match and transformation stages are often very similar across change streams, we have also discussed the possibility of sharing the execution machinery between them where possible; but this is more about collapsing the work of multiple separate streams into a single source rather than multithreading each individual stream.

I hope the above information is useful, and many thanks again for bringing this issue to our attention!

Best regards,
Dima

Generated at Thu Feb 08 05:12:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.