[SERVER-37085] Once new $out sys-perf workloads are finalized, compare performance across branches where possible Created: 12/Sep/18  Updated: 27/Nov/18  Resolved: 27/Nov/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Performance
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Charlie Swanson Assignee: Charlie Swanson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 100000_docs_out_graph.png     PNG File 1000_docs_out_graph.png     PNG File 1_doc_out_graph.png    
Issue Links:
Duplicate
is duplicated by SERVER-37543 Gather baseline $out performance data... Closed
Related
related to SERVER-37670 [4.0] Switch $out writes to use write... Closed
Sprint: Query 2018-10-22, Query 2018-12-03
Participants:

 Description   

While adding benchmarks for the existing $out mode, we noted some places where performance is measurably different between branches. This ticket will track the work to diagnose why the performance differs. Any fixable regressions, or improvements in more recent branches that are easily backportable, should be tracked in separate tickets.



 Comments   
Comment by Charlie Swanson [ 27/Nov/18 ]

Ok I've done some comparisons! Some caveats:

  • Results are based on a single run. They may not be reproducible, and one or more test runs may have been run on a slower host. Take the results with a grain of salt.
  • I have not attempted to profile anything to figure out why the performance may be different.
  • I do not know the variability expected in these workloads from run to run. On master we have observed pretty good stability, but we don't have any such data on the older branches.
  • The shard lite configuration does not usually run on the 3.6 branch, so it had to be backported for this comparison.

Results are below. In summary:

  • For some reason, the workload with a single $out document got slower on 4.0 and then faster again on master across most deployment environments. I don't know why the overhead would be smaller on 3.6 and master, but after speaking with asya, we don't particularly care about this and don't plan to investigate.
  • The workload with 10000 documents per $out got faster from 3.6 to 4.0 to master, with two exceptions: 1) on a standalone it got slower on 4.0 and then rebounded on master, and 2) on shard lite (2 shards, input collection sharded) it got slightly slower on master. I have no great theories for this, except that we did add some known overhead in sharded environments on master. Here again, I discussed this with asya and we decided it wasn't worth investigating.

Results are in documents per second, so higher is better (faster). "out_replaceCollection_X_to_temp_testing" means we ran an $out stage which processed X documents and inserted them all into the unsharded "temp_testing" collection with mode "replaceCollection" (or the only mode on 3.6 and 4.0).
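To make the metric and the naming convention concrete, here is a minimal sketch. It is not taken from the sys-perf harness: the helper names (parse_test_name, docs_per_second, percent_change) are hypothetical, and the numbers in the usage lines are made up for illustration, since this ticket does not include the raw figures.

```python
import re

# Hypothetical pattern for the test names described above:
# out_replaceCollection_<X>_to_<target collection>
_TEST_NAME = re.compile(r"^out_replaceCollection_(\d+)_to_(\w+)$")


def parse_test_name(name):
    """Return (documents_processed, target_collection) for a test name."""
    match = _TEST_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized test name: {name!r}")
    return int(match.group(1)), match.group(2)


def docs_per_second(num_docs, elapsed_seconds):
    """Throughput in documents per second; higher is better."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_docs / elapsed_seconds


def percent_change(baseline, candidate):
    """Relative change of candidate vs. baseline throughput, in percent.

    Positive means the candidate branch is faster than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Usage with made-up timings (NOT measurements from this ticket).
num_docs, target = parse_test_name("out_replaceCollection_1000_to_temp_testing")
older_branch = docs_per_second(num_docs, 0.5)  # hypothetical: 0.5 s elapsed
newer_branch = docs_per_second(num_docs, 0.4)  # hypothetical: 0.4 s elapsed
change = percent_change(older_branch, newer_branch)
```

Because each comparison here rests on a single run, a percentage like this is only a rough signal; repeated runs would be needed to tell a real change from noise.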



Based on these results we do not plan on investigating further at this time, so I'm resolving this ticket!

cc schwerin and pasette who might be interested in these highlights.
