I ran a test comparing reading the oplog directly against reading a change stream, and the change stream appears to perform noticeably worse than reading the oplog directly. Here is my experiment:
The fetcher program runs on hostB, while MongoDB runs on hostA; the ping latency between hostA and hostB is about 0.2 to 0.3 ms. There is no CPU, memory, I/O, or network bottleneck in my experiment.
The source MongoDB holds 5 million oplog entries, 5.5 GB in total:
- Change stream: 180 seconds, nearly 30,000 qps
- Oplog: 60 seconds, 80,000+ qps
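To make the size of the gap concrete, the figures above can be turned into throughput and per-entry cost with a few lines of arithmetic. This is only a sketch: it assumes "5.5G" means 5.5 GiB (5.5 * 1024**3 bytes), which the measurements do not state, and the names are illustrative.

```python
# Figures from the experiment; 5.5 GiB is an ASSUMPTION (the post only says "5.5G").
OPLOG_ENTRIES = 5_000_000
TOTAL_BYTES = 5.5 * 1024**3


def summarize(label: str, seconds: float) -> dict:
    """Derive throughput and per-entry cost from entry count and elapsed time."""
    return {
        "label": label,
        "ops_per_sec": OPLOG_ENTRIES / seconds,
        "us_per_entry": seconds / OPLOG_ENTRIES * 1e6,
        "mib_per_sec": TOTAL_BYTES / seconds / 1024**2,
    }


change_stream = summarize("change stream", 180)
oplog = summarize("oplog", 60)

for row in (change_stream, oplog):
    print(f"{row['label']:>13}: {row['ops_per_sec']:>8,.0f} ops/s, "
          f"{row['us_per_entry']:.0f} us/entry, {row['mib_per_sec']:.1f} MiB/s")

# Extra time each event spends on the change stream path versus a raw oplog read.
overhead_us = change_stream["us_per_entry"] - oplog["us_per_entry"]
slowdown = oplog["ops_per_sec"] / change_stream["ops_per_sec"]
print(f"slowdown: {slowdown:.1f}x, overhead: {overhead_us:.0f} us/entry")
```

This works out to roughly 27,800 ops/s (36 µs per entry) for the change stream versus roughly 83,300 ops/s (12 µs per entry) for the oplog, so the change stream path costs about 24 µs more per event, a 3x slowdown.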
While monitoring the change stream fetch, I saw CPU usage of about 60%. The gap seems larger than I expected. Does this roughly match your previous test results?
As far as I know, for a replica set the change stream pipeline is split into two parts. The first is the $cursor stage that scans the oplog with a $match, which can be seen in the output of the aggregate explain command. The second part is the transformation, which performs the following steps:
- unmarshal the oplog entry from BSON
- allocate new memory and transform the parsed oplog entry into a change stream event
- marshal the change stream event back into BSON
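The three transformation steps listed above can be sketched per event as follows. This is a simplified, hypothetical model rather than the server's actual code: it uses JSON as a stand-in for BSON (the Python standard library has no BSON codec), handles only insert/update/delete entries, assumes older `$set`-style update entries, and omits fields a real change event carries, such as the `_id` resume token.

```python
import json

# HYPOTHETICAL, simplified mapping from oplog "op" codes to change event operationType.
OP_TYPES = {"i": "insert", "u": "update", "d": "delete"}


def oplog_to_event(raw: bytes) -> bytes:
    """Sketch of unmarshal -> build new event -> marshal, with JSON standing in for BSON."""
    # Step 1: unmarshal the raw oplog entry.
    entry = json.loads(raw)

    # Step 2: allocate a new document and reshape the oplog fields into it.
    db, coll = entry["ns"].split(".", 1)  # database names cannot contain ".", collections can
    event = {
        "operationType": OP_TYPES[entry["op"]],
        "clusterTime": entry["ts"],
        "ns": {"db": db, "coll": coll},
    }
    if entry["op"] == "i":
        event["documentKey"] = {"_id": entry["o"]["_id"]}
        event["fullDocument"] = entry["o"]
    elif entry["op"] == "u":
        event["documentKey"] = entry["o2"]
        event["updateDescription"] = {"updatedFields": entry["o"].get("$set", {})}
    else:  # "d": the oplog stores the deleted document's key in "o"
        event["documentKey"] = entry["o"]

    # Step 3: marshal the new event back into bytes.
    return json.dumps(event).encode()


raw = json.dumps({"op": "i", "ns": "test.users", "ts": 1,
                  "o": {"_id": 7, "name": "a"}}).encode()
print(oplog_to_event(raw))
```

A raw oplog reader can hand entries on without rebuilding them, whereas every change stream event pays for a full decode, a new allocation, and a re-encode, which is consistent with the per-event overhead suspected here.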
So I believe the main reason for the performance gap is the transformation step. Please let me know if I am wrong.
In my view, a single thread does both the oplog fetching and the transformation, so adding threads could improve performance. It is a tradeoff, though: using too many threads for the transformation may hurt the performance of the MongoDB server itself.
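One way to picture the multi-threaded transformation suggested above is a bounded worker pool that transforms a batch of entries in parallel but emits results in the original order, since change stream events must follow oplog order. This is an illustrative sketch of the structure only, not how the server is implemented; `transform_in_order`, `workers`, and `batch_size` are hypothetical names. In CPython the GIL keeps a CPU-bound transform from actually running in parallel on threads, so a real speedup would need `ProcessPoolExecutor` (with a picklable transform) or a language without that limitation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def transform_in_order(raw_entries: Iterable[bytes],
                       transform: Callable[[bytes], bytes],
                       workers: int = 4,
                       batch_size: int = 1024) -> Iterator[bytes]:
    """Transform entries with up to `workers` threads, yielding results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = []
        for raw in raw_entries:
            batch.append(raw)
            if len(batch) == batch_size:
                # Executor.map yields results in submission order, so oplog order is kept;
                # at most one batch is in flight, which bounds memory use.
                yield from pool.map(transform, batch)
                batch = []
        if batch:  # flush the final partial batch
            yield from pool.map(transform, batch)


# Usage with a trivial stand-in transform (reversing the bytes).
entries = (str(i).encode() for i in range(10))
print(list(transform_in_order(entries, lambda b: b[::-1], workers=2, batch_size=4)))
```

`workers` is the knob behind the tradeoff described above: more workers raise transformation throughput but take more CPU away from whatever else runs on the same host.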