|
Ok I've done some comparisons! Some caveats:
- Results are based on a single run. They may not be reproducible, and one or more test runs may have landed on a slower host. Take the results with a grain of salt.
- I have not attempted to profile anything to figure out why the performance may be different.
- I do not know the variability expected in these workloads from run to run. On master we have observed pretty good stability, but we don't have any such data on the older branches.
- The shard lite configuration does not usually run on the 3.6 branch, so it had to be backported for this comparison.
Results are below. In summary:
- For some reason the workload with a single $out document got slower on 4.0 and then faster again on master across most deployment environments. I don't know why the overhead would have been smaller on 3.6 and master, but after speaking with asya we don't particularly care about this and don't plan to investigate.
- The workload with 10000 documents per $out got faster from 3.6 to 4.0 to master, with two exceptions: 1) on a standalone it got slower on 4.0 and then rebounded on master, and 2) on shard lite (2 shards, input collection sharded) it got slightly slower on master. I have no great theories for this, except that we did add some known overhead in sharded environments on master. Again, I discussed this with asya and we decided it wasn't worth investigating.
Results are in documents per second, so higher is better (faster). "out_replaceCollection_X_to_temp_testing" means we ran an $out stage which processed X documents and inserted them all into the unsharded "temp_testing" collection with mode "replaceCollection" (or the only mode on 3.6 and 4.0).
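
For anyone comparing branches by hand, here is a minimal sketch of how a documents-per-second figure and a branch-to-branch change could be computed. The function names are hypothetical and are not part of the performance harness, and the numbers in the usage lines are placeholders, not measured results.

```python
def docs_per_second(num_docs: int, elapsed_seconds: float) -> float:
    """Throughput of one $out run: documents written per second of wall time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_docs / elapsed_seconds


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate throughput.

    Positive means the candidate branch is faster, since higher
    documents-per-second is better.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


# Placeholder inputs only (NOT real measurements from this ticket).
older = docs_per_second(10000, 2.0)
newer = docs_per_second(10000, 1.6)
print(f"change: {percent_change(older, newer):+.1f}%")
```

Given the single-run caveat above, a difference computed this way is only a rough indication unless it is repeated over several runs.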

Based on these results we do not plan on investigating further at this time, so I'm resolving this ticket!
cc schwerin and pasette who might be interested in these highlights.
|