[SERVER-46979] Improve changeStream performance relative to oplog queries Created: 19/Mar/20 Updated: 27/Oct/23 Resolved: 26/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | vinllen chen | Assignee: | Backlog - Triage Team |
| Resolution: | Community Answered | Votes: | 0 |
| Labels: | change-streams-improvements |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Server Triage |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Description |
|
I ran a test comparing reads from the oplog against a change stream, and it looks like change stream performance is not as good as reading the oplog directly. Here is my experiment: the fetcher program runs on hostB while MongoDB runs on hostA; the ping latency between the two hosts is about 0.2~0.3 ms. There were no CPU/memory/IO/network bottlenecks in my experiment.
While monitoring the change stream fetching, CPU usage was around 60%. I think the gap is a bit too big. Does this roughly match your previous test results?
So the main reason for the performance gap is the transforming step; please let me know if I am wrong. In my view, there is only one thread doing the oplog fetching and transforming, so increasing the number of threads could improve performance. But that is a tradeoff, since using too many threads for the transform could hurt overall MongoDB server performance. |
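The two read paths being compared can be sketched with pymongo. The host name, namespace, and start timestamp below are illustrative assumptions, not details from the original test setup, and the helper functions are hypothetical, not part of any driver API:

```python
# Hedged sketch of the two read paths compared above, assuming pymongo
# and a replica set reachable at hostA:27017.

def oplog_query(start_ts):
    """Filter for a raw tailable query on local.oplog.rs from a timestamp."""
    return {"ts": {"$gt": start_ts}}

def watch_kwargs(start_ts):
    """Equivalent change-stream option: start from the same oplog time."""
    return {"start_at_operation_time": start_ts}

if __name__ == "__main__":
    from pymongo import MongoClient
    from pymongo.cursor import CursorType
    from bson.timestamp import Timestamp

    client = MongoClient("mongodb://hostA:27017")
    ts = Timestamp(0, 1)  # effectively the beginning of the oplog

    # Path 1: raw oplog read -- fast, but yields raw oplog entries.
    oplog = client.local["oplog.rs"]
    raw = oplog.find(oplog_query(ts), cursor_type=CursorType.TAILABLE_AWAIT)

    # Path 2: change stream -- the server adds a $match filter and a
    # transform stage, which is where the extra CPU time is spent.
    stream = client.watch(**watch_kwargs(ts))
```

Both paths ultimately read the same oplog entries; the gap measured in the experiment comes from the server-side filtering and transformation that only the change stream path performs.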
| Comments |
| Comment by 高 凡 [ 11/Mar/21 ] |
|
I'm having this problem as well. I'm listening to a large Mongo cluster using a change stream; it takes about 20 s to fetch 10,000 events, but roughly 100 million new documents are added every day, so this is not acceptable. |
| Comment by Dmitry Agranat [ 26/Mar/20 ] |
|
Glad to hear the reply was helpful. Regards, |
| Comment by vinllen chen [ 26/Mar/20 ] |
|
Hi Dima, thanks for your reply; it was helpful. You're right, I should compare a change stream listening for the newest events against tailing the oplog, rather than against scanning the whole oplog collection. |
| Comment by Dmitry Agranat [ 24/Mar/20 ] |
|
Thank you for raising this ticket. Below, I've attempted to respond to your comments; please let me know if you have any additional questions.

Summary

It is true that $changeStream's raw read rate is (unavoidably) slower than a simple query on the oplog, that the $changeStream's oplog $match and transformation stages contribute most to this latency, and that CPU consumption is higher when running these stages as compared to a basic query. However, in our tests of oplog-tailing versus $changeStream under typical operating conditions and write workloads, performance of each has been similar up to the point where resource contention becomes an issue.

From the description of your experiment, it sounds as though you were not tailing the oplog, but instead started a change stream at the beginning of the oplog and compared it against a straightforward collection scan. The most important point to make is that this is not the use-case for which change streams are designed. Change streams are not intended to produce the same raw read rate as a basic query for a one-off scan through the entire contents of a large oplog; they are intended to operate towards the end of the oplog, scanning the entries added within the past few seconds, minutes, or hours. While it is sometimes unavoidable to scan through the oplog, we actively attempt to avoid doing so more than is strictly necessary; for instance, we implemented the Change Streams High Water Mark project specifically to avoid having to do this when resuming a stream.

We will continue to work on improving $changeStream, but the fact that there is a difference in performance between these use-cases is expected. To better reflect the issue you raised, we've changed the title of this ticket to "Improve $changeStream performance relative to oplog queries."

Regarding your comments and questions
This is broadly accurate, and for the purposes of this conversation these are certainly the heaviest stages in a simple change stream pipeline.
Based on our internal tests, change stream CPU consumption is indeed notably higher than that of a simple oplog query, both because of the complex $match filter that change streams must apply to the oplog and, especially, because of the DocumentSourceChangeStreamTransform stage; the latter is responsible for most of the additional execution latency.
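As a rough illustration of what that transform stage does, here is a simplified, hypothetical Python version that reshapes a raw oplog insert entry into a change-stream-style event. The real server-side stage handles many more operation types and fields; this sketch covers inserts only:

```python
# Hypothetical, heavily simplified analogue of the server-side
# DocumentSourceChangeStreamTransform stage, for insert ("i") entries.

def transform_insert(oplog_entry):
    """Reshape a raw oplog insert entry into a change-stream-style event."""
    ns_db, ns_coll = oplog_entry["ns"].split(".", 1)
    doc = oplog_entry["o"]  # the inserted document
    return {
        "_id": {"_data": oplog_entry["ts"]},   # resume token (simplified)
        "operationType": "insert",
        "ns": {"db": ns_db, "coll": ns_coll},
        "documentKey": {"_id": doc["_id"]},
        "fullDocument": doc,
    }
```

Even in this toy form, the stage has to parse the namespace, copy the document, and build a resume token for every entry, which hints at why it dominates the extra per-event cost.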
Parallel query/aggregation execution is indeed on our longer-term roadmap, but in the meantime, we would probably not consider making individual stages like DocumentSourceChangeStreamTransform internally multithreaded. Given that the $match and transformation stages are often very similar across change streams, we have also discussed the possibility of sharing the execution machinery between them where possible; but this is more about collapsing the work of multiple separate streams into a single source rather than multithreading each individual stream.

I hope the above information is useful, and many thanks again for bringing this issue to our attention! Best regards, |
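As background to the resume discussion in the summary above: the usual client-side pattern is to persist each event's resume token and pass it back via resume_after, so the server can seek directly to that oplog position instead of rescanning. A minimal sketch with pymongo, where the namespace and token storage are assumptions:

```python
# Minimal sketch of resume-token handling, assuming pymongo and a
# collection "test.items"; durable token storage is left abstract.

def resume_options(token=None):
    """Build watch() keyword arguments; resume only if a token was saved."""
    return {"resume_after": token} if token is not None else {}

if __name__ == "__main__":
    from pymongo import MongoClient

    coll = MongoClient()["test"]["items"]
    saved_token = None  # in a real consumer, load this from durable storage
    with coll.watch(**resume_options(saved_token)) as stream:
        for event in stream:
            saved_token = event["_id"]  # the resume token; persist it
            print(event["operationType"], event.get("documentKey"))
```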