[SERVER-81260] Streams: Perf numbers for various workloads Created: 20/Sep/23 Updated: 19/Oct/23 Resolved: 19/Oct/23
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Aadesh Patel (Inactive) | Assignee: | Aadesh Patel (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | big, init-337-m3 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Atlas Streams |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sprint 32, Sprint 33 |
| Participants: | |
| Linked BF Score: | 135 |
| Description |
|
https://mongodb.slack.com/archives/C04AH2TF7E1/p1695164357496619

Conversation above ^ with @kenny.gorman re: throughput numbers for streams:

Was chatting with Sandeep, so tomorrow I'm planning on getting a bunch of numbers together with various setups. We have a few genny workloads set up for streams right now, but those use the in-memory source/sink operators, so they're not very reflective of production setups. The plan is to run those same workloads against a Kafka source in different regions, with streams running in us-east-1, so that we have throughput numbers for Kafka-source pipelines, and then send over a bunch of throughput numbers to you for each workload and source-operator setup. How's that generally sound?
* in-memory source operator

Will run that setup ^ for every workload we have in genny, along with the avg document size (in bytes) that we're using for those workloads. @kenny.gorman each workload will be a different type of stream pipeline and document size.

@aadesh we will want to change pipelines later and get these numbers a few times, so please do try to make the whole process repeatable.

I think it makes sense; it'll be good to have some baseline idea of the impact of the bandwidth-delay product with cross-region traffic.

I would make sure you really document what your Kafka setup is as well. Kafka tends to rely heavily on the Linux page cache, so there could be a difference between a Kafka that has buffered properly versus one that hasn't because it's cold.

We also have some customers where reading change streams could potentially be cross-region, or merging to a cluster cross-region.

Maybe I missed it, but we need source and sink variations, like Kafka to Kafka and Kafka to Mongo. To a lesser degree we need change stream source to Mongo. Yeah, exact Kafka config is important. Repeatable is critical. Maybe something anyone can run, not just engineering (thinking field), but maybe I am being too optimistic.

This is awesome guys. Can't wait to see the results.

Btw, sources/sinks will introduce variability in results for N different reasons (source is in a different region, unique kind of Kafka deployment). It would require too much effort to try to cover all the different scenarios. I hope we can just test with only a couple of different scenarios and use those as ballpark numbers.

A couple of different scenarios is what I meant, yes, not all.

The main ones are intra-region from Kafka to MongoDB, and intra-region MongoDB to MongoDB. I am not sure (@joe) if we have lots of Kafka-to-Kafka use cases just yet.

Joe Niemiec: Some rough telemetry, based on data I have for combinations over 30 customers (a customer may do more than one pattern):
* CS 2 Kafka - 7
So really Kafka to Collection and CS to Collection are the top dogs perf-wise.

That's super helpful. Re: repeatability, it might take a bit more time to get to a system where we can easily repeat different setups for different stream pipelines; we need to make various changes to existing perf tooling infra (DSI specifically) to distinguish mongod vs. mstreams, but I'm looking into that more today so that we can get to a place where it's easy to do all this. |
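Per Joe's telemetry above, Kafka-to-Collection and change-stream-to-Collection are the dominant patterns. For reference, a minimal mongosh sketch of what those two stream-processor shapes could look like; the connection, topic, db, and collection names here are hypothetical placeholders, not the actual genny workload definitions:

    // Kafka source -> Atlas collection ("Kafka to Collection"); all names are illustrative.
    sp.createStreamProcessor("kafkaToCollection", [
      { $source: { connectionName: "kafkaConn", topic: "perfTopic" } },
      { $merge: { into: { connectionName: "clusterConn", db: "perf", coll: "out" } } }
    ]);

    // Change stream source -> Atlas collection ("CS to Collection").
    sp.createStreamProcessor("csToCollection", [
      { $source: { connectionName: "clusterConn", db: "app", coll: "orders" } },
      { $merge: { into: { connectionName: "clusterConn", db: "perf", coll: "ordersOut" } } }
    ]);

In the cross-region variants discussed above, only the region behind the source or sink connection changes; the pipeline definition itself stays the same, which should help keep runs comparable.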
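For a baseline intuition on the bandwidth-delay product point above, a rough worked example (both numbers are assumed for illustration, not measured from these setups):

    BDP = bandwidth x RTT
        = 1 Gbit/s x 0.070 s        (assumed ~70 ms cross-region RTT)
        = 70 Mbit ~= 8.75 MB of data in flight at full utilization

    effective throughput <= window / RTT
    e.g. with a 1 MB effective receive/fetch window:
        1 MB / 0.070 s ~= 14 MB/s, regardless of link bandwidth

This is why the cross-region runs need the exact Kafka and TCP buffer settings recorded along with the results: an undersized window, not the link, can be what caps throughput.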
| Comments |
| Comment by Githook User [ 11/Oct/23 ] |
|
Author: Aadesh <aadesh.patel@mongodb.com> (username: Aadeshp)
Message: |
| Comment by Githook User [ 09/Oct/23 ] |
|
Author: Aadesh <aadesh.patel@mongodb.com> (username: Aadeshp)
Message: |
| Comment by Githook User [ 06/Oct/23 ] |
|
Author: Aadesh <aadesh.patel@mongodb.com> (username: Aadeshp)
Message: |
| Comment by Githook User [ 04/Oct/23 ] |
|
Author: Aadesh <aadesh.patel@mongodb.com> (username: Aadeshp)
Message: |
| Comment by Githook User [ 28/Sep/23 ] |
|
Author: Aadesh Patel <aadesh.patel@mongodb.com> (username: Aadeshp)
Message: |