[SERVER-32946] Multiple change streams cause severe performance drop Created: 28/Jan/18 Updated: 27/Oct/23 Resolved: 30/Jan/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 3.6.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gabor Gyebnar | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Participants: | |
| Description |
Opening multiple (10+) change stream cursors causes massive delays (up to several minutes) between database writes and notification arrival. A single change stream (or 2-3 of them) does not produce the same issue.

In a synthetic test, I wrote 100 small documents per second into a database and listened to the changes using change streams. I opened 50 change streams and ran the test for 100 seconds. The average delay between a DB write and the arrival of the corresponding change event was 7.1 seconds; the largest delay was 205 seconds (not a typo: over three minutes).

MongoDB version: 3.6.2. I used a Node.js client; CPU and memory usage were minimal.

I tested with two setups; both had the same effect. |
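As a rough illustration of the scenario described above (a hypothetical reconstruction, not the reporter's actual test code, which is linked from the comments below), assuming the Node.js driver 3.x API against a local one-node replica set:

```javascript
// Hypothetical reconstruction of the reported scenario, not the original code.
// Assumes Node.js driver 3.x and a one-node replica set at localhost:27017
// (change streams require a replica set).
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017', {
    useNewUrlParser: true, // poolSize deliberately left at its small default
  });
  const coll = client.db('test').collection('events');
  const delays = [];

  // Open 50 change streams; each records the write-to-notification delay.
  for (let i = 0; i < 50; i++) {
    coll.watch().on('change', (event) => {
      delays.push(Date.now() - event.fullDocument.sentAt);
    });
  }

  // Write 100 small documents per second for 100 seconds.
  const writer = setInterval(() => {
    for (let i = 0; i < 100; i++) {
      coll.insertOne({ sentAt: Date.now() }).catch(console.error);
    }
  }, 1000);

  setTimeout(async () => {
    clearInterval(writer);
    const avg = delays.reduce((a, b) => a + b, 0) / delays.length;
    console.log(`avg delay ${avg.toFixed(0)} ms, max ${Math.max(...delays)} ms`);
    await client.close();
  }, 100 * 1000);
}

main().catch(console.error);
```

The key property of this workload is that each individual stream receives events at a low rate, so at any moment most of the 50 streams are idle, waiting for their next event.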
| Comments |
| Comment by Shane Harvey [ 12/Jul/23 ] |
Note that there is an application-side workaround which allows many change streams to share a single connection. I've described the PyMongo workaround in this comment on SERVER-42885. Similar patterns should be possible in other drivers as well. |
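Shane's PyMongo workaround is in the linked SERVER-42885 comment; purely as an illustration of the same idea in the Node.js driver (an assumed adaptation, not Shane's code, and requiring driver >= 3.6 for ChangeStream.tryNext()), many streams can be polled from one sequential loop so that at most one getmore, and therefore one pooled connection, is in flight at a time:

```javascript
// Assumed adaptation of the workaround idea for the Node.js driver (>= 3.6,
// which provides ChangeStream.tryNext()); this is not Shane's PyMongo code.
// Draining all streams from one sequential loop means at most one getmore is
// outstanding, so one pooled connection suffices for any number of streams.
const { MongoClient } = require('mongodb');

async function pollStreams(collections, onChange) {
  // A low maxAwaitTimeMS keeps an idle stream from holding the shared
  // connection for long when it has no pending events.
  const streams = collections.map((coll) =>
    coll.watch([], { maxAwaitTimeMS: 10 })
  );

  for (;;) {
    for (const stream of streams) {
      // tryNext() resolves with the next event, or null if none is available.
      let event;
      while ((event = await stream.tryNext()) !== null) {
        onChange(event);
      }
    }
  }
}
```

A production version would add error handling and a short sleep between empty passes to avoid busy-polling.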
| Comment by Yatinkumar Patel [ 09/Aug/21 ] |
Hello, we are using v4.2.8 with a singleton MongoDB connection, and we are seeing the same severe performance drop with multiple change streams: all of our change stream requests scan all collections. Do you know what the reasons could be? |
| Comment by Bruce Lucas (Inactive) [ 31/Jan/18 ] |
Hi Gabor,
The client issues a getmore operation on the change stream cursor to obtain each notification, so each change stream that is waiting for its next notification will have an outstanding getmore operation in progress until the next notification becomes available. Each connection can only service one operation at a time, so the number of connections needed will be as large as the number of change streams waiting for their next notification. I opened a ticket for the server-side bottleneck mentioned in my earlier comment below.
Bruce |
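A small experiment makes this connection accounting visible (a sketch under the same assumptions as above: Node.js driver 3.x and a local one-node replica set; serverStatus and its connections section are real server features, the rest is illustrative):

```javascript
// Sketch illustrating the connection accounting described above: each change
// stream waiting in next() keeps a getmore outstanding on its own connection.
const { MongoClient } = require('mongodb');

async function showConnectionGrowth(numStreams) {
  const client = await MongoClient.connect('mongodb://localhost:27017', {
    poolSize: numStreams + 5, // headroom so every stream gets a connection
  });
  const db = client.db('test');

  const before = (await db.admin().serverStatus()).connections.current;

  for (let i = 0; i < numStreams; i++) {
    // Each pending next() holds a getmore open until an event arrives.
    db.collection('events').watch().next().catch(() => {});
  }

  // Give the driver a moment to dispatch all the getmores, then resample.
  await new Promise((resolve) => setTimeout(resolve, 2000));
  const after = (await db.admin().serverStatus()).connections.current;
  console.log(`connections: ${before} -> ${after}`); // grows by ~numStreams
  await client.close();
}
```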
| Comment by Gabor Gyebnar [ 30/Jan/18 ] |
Thank you, Bruce! To be honest, I don't quite understand why each change stream opens a new connection, since the documentation is rather tight-lipped about when connections are opened. My expectation was that I could open tens of thousands of cursors and get Firebase-like real-time notifications only on the appropriate channels. I probably misunderstood the use case; I guess I'll need to listen to all changes in a collection and filter them manually instead. |
| Comment by Bruce Lucas (Inactive) [ 30/Jan/18 ] |
Hi Gabor,
Thanks for the data and for the repro code. I was able to reproduce the high notification latencies at low operation rates that you saw, and traced the issue to the poolSize setting. The default poolSize in the Node.js driver is small, which limits the number of concurrent connections. Each change stream ties up a connection with a getmore operation for as long as it waits for its next event; since your test delivers events to each change stream at a very low rate, all connections can be tied up for some time, and meanwhile change streams that in fact have results available at the server cannot get a connection in the client to see those results, producing large notification latencies. To avoid this, ensure that the poolSize is at least as large as the number of change streams, for example as sketched below.
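A minimal sketch of such a configuration (an illustration, not Bruce's original snippet): note that poolSize is the driver 2.x/3.x option name, renamed maxPoolSize in driver 4.x and later.

```javascript
// Size the connection pool to cover every change stream.
const { MongoClient } = require('mongodb');

MongoClient.connect('mongodb://localhost:27017', {
  poolSize: 100, // >= the number of concurrently open change streams
}).then((client) => {
  // open up to ~100 change streams against this client
});
```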
With this change I got good latencies in a 60-second run.
In the process of investigating this I did observe a bottleneck in the server at much higher operation rates that I want to look into further (and may open a ticket for), but that bottleneck was not in play in your test at low operation rates, so I'll close this ticket.
Bruce |
| Comment by Gabor Gyebnar [ 30/Jan/18 ] |
Hi Bruce,
Test setup #2 was a single mongod process, initiated as a one-node replica set (change streams require one). This is how I created it:
$ docker run --restart always -d --name mongo -p 27017:27017 mongo --replSet="rs0"
and then I ran rs.initiate() in the mongo console.
Here's a copy of the /data/db directory after the test was run:
I uploaded the test code with instructions to GitHub: |
| Comment by Bruce Lucas (Inactive) [ 29/Jan/18 ] |
Hi Gabor,
Can you clarify your test setup #2 - is this a single standalone mongod process, not a replica set? Can you please upload the mongod log file and archived content of $dbpath/diagnostic.data for one of the tests? If setup #2 is indeed not a replica set, this will be simpler to analyze. Also, please tell us the exact command used and the timeline (including timezone) for the test that you upload.
Thanks, |
| Comment by Bernard Gorman [ 29/Jan/18 ] |
schwerin, I do see an increase in latency as additional $changeStreams are opened, but not until we reach considerably higher levels of write throughput than in the case described here. For instance, for a 90-second test with 15 $changeStreams open on a collection and 4 threads running a mixed ~7K ops/s write workload, average and max latency remain negligible in my tests. By the time we reach 105 parallel $changeStreams with the same workload, I'm seeing latencies more in line with the figures given above. |
| Comment by Andy Schwerin [ 29/Jan/18 ] |
bernard.gorman, is this consistent with the data you're seeing? |
| Comment by Gabor Gyebnar [ 28/Jan/18 ] |
Sorry for the code formatting issues. I can't edit my submission now. I'll try again: |