[SERVER-40734] Performance test verify BPCPS Created: 19/Apr/19  Updated: 14/Jun/19  Resolved: 10/Jun/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Benjamin Caimano (Inactive) Assignee: Benjamin Caimano (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive bpcps-benchmark.zip     PNG File steady-plot-2.png     PNG File step-down-plot-with-primary.png     PNG File step-down-plot.png    
Issue Links:
Backports
Backport Requested:
v4.2
Sprint: Service Arch 2019-05-06, Service Arch 2019-05-20, Service Arch 2019-06-03, Service Arch 2019-06-17
Participants:

 Description   

There is no goal of producing commits here, but a repeatable benchmark script should probably end up attached to this ticket.



 Comments   
Comment by Benjamin Caimano (Inactive) [ 12/Jun/19 ]

Some explanation is probably due, since I hear these graphs might see wider consumption. The mongods and mongoses involved had a 1s sleep introduced at the start of each client thread; this artificially simulates a predictably slow connection. The C driver program in bpcps-benchmark.zip starts 100 threads, each with a single mongoc client connection. Each thread sends out a find with a 1s JavaScript sleep in its $where.
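For orientation, here is a minimal sketch of what such a driver could look like. This is not the attached program: the URI, the test.bench namespace, the op count, and the exact $where expression are illustrative, and the server-side 1s connection sleep described above is a server modification that a client cannot reproduce. Each thread loops over finds so that per-operation latencies can be bucketed over time, which is an assumption based on the plot description below.

/* Sketch of a bpcps-style benchmark driver: 100 threads, each with its own
 * mongoc client, repeatedly issuing a find whose $where sleeps for ~1s of
 * server-side JavaScript. Connection details are illustrative. */
#include <mongoc/mongoc.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 100
#define OPS_PER_THREAD 30 /* illustrative; long enough to reach steady state */

static int64_t test_start; /* monotonic clock value when the test began */

static void *
worker (void *arg)
{
   (void) arg;

   /* Each thread owns its own client; mongoc_client_t is not thread-safe. */
   mongoc_client_t *client =
      mongoc_client_new ("mongodb://localhost:27017/?appname=bpcps-bench");
   mongoc_collection_t *coll =
      mongoc_client_get_collection (client, "test", "bench");

   for (int i = 0; i < OPS_PER_THREAD; i++) {
      /* A find whose $where burns ~1s of server-side JavaScript. */
      bson_t *filter =
         BCON_NEW ("$where", BCON_UTF8 ("sleep(1000); return true;"));

      int64_t start = bson_get_monotonic_time ();
      mongoc_cursor_t *cursor =
         mongoc_collection_find_with_opts (coll, filter, NULL, NULL);

      const bson_t *doc;
      while (mongoc_cursor_next (cursor, &doc)) {
         /* Drain the cursor; the documents themselves don't matter. */
      }

      bson_error_t error;
      if (mongoc_cursor_error (cursor, &error)) {
         fprintf (stderr, "find failed: %s\n", error.message);
      }

      int64_t end = bson_get_monotonic_time ();

      /* Emit "start offset, latency" in ms, matching the plot axes. */
      printf ("%" PRId64 ",%" PRId64 "\n",
              (start - test_start) / 1000, (end - start) / 1000);

      mongoc_cursor_destroy (cursor);
      bson_destroy (filter);
   }

   mongoc_collection_destroy (coll);
   mongoc_client_destroy (client);
   return NULL;
}

int
main (void)
{
   mongoc_init ();
   test_start = bson_get_monotonic_time ();

   pthread_t threads[NUM_THREADS];
   for (int i = 0; i < NUM_THREADS; i++) {
      pthread_create (&threads[i], NULL, worker, NULL);
   }
   for (int i = 0; i < NUM_THREADS; i++) {
      pthread_join (threads[i], NULL);
   }

   mongoc_cleanup ();
   return 0;
}

Bucketing the printed (start offset, latency) pairs by start offset and taking min/median/max per bucket is what produces the curves described below.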

This means that the initial connection + find through the mongos takes at least 3s (1s to connect to the mongos, 1s for the mongos to connect to a mongod, and 1s for the $where sleep), and reconnecting to a mongod plus the find takes at least 2s. When no new connections need to be made, an operation takes at least 1s, which is the steady state seen in steady-plot-2.png. Each step you see in the graphs is one less connection needing to be made somewhere along the CRUD-associated path. Both axes of the charts are in ms. The X axis is the time from when the test began to the start of a find operation from the client. The Y axis is the latency of that same operation. Blue is max, green is median, yellow is min. Honestly, I suspect that just dots would have painted the picture about as well.

The clear signal these graphs show is due to two conditions:

  • The operations need to take long enough that more connections are needed. If operations complete quickly, you won't need many active connections.
  • The time to establish a connection needs to be substantial relative to the time to perform the operation. If this isn't true, then variance in the operation itself can make it hard to see the cost of connections.
Comment by Benjamin Caimano (Inactive) [ 10/Jun/19 ]

I've got some pretty encouraging graphs here. With a high connection cost, I was able to make the latency from new connections substantially less severe for CRUD operations. I've attached my (unimpressive) find-driver and scipy script in the zip file.

Comment by Benjamin Caimano (Inactive) [ 30/May/19 ]

I was able to prove this out on the existing in-flight work; not closing until I can do so on approved code.
