[SERVER-47855] Change default value for minSnapshotHistoryWindowInSeconds to 5 minutes Created: 30/Apr/20 Updated: 12/Jan/24 Resolved: 19/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc0, 5.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Monica Ng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Backport Requested: | v5.0 |
| Sprint: | Repl 2020-05-18, Storage - Ra 2021-03-08, Storage - Ra 2021-03-22, Storage - Ra 2021-04-05, Storage - Ra 2021-04-19, Storage - Ra 2021-05-17, Storage - Ra 2021-05-31 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 15 |
| Story Points: | 0 |
| Description |
|
Determine and recommend a good default for minSnapshotHistoryWindowInSeconds. We should also use this issue to better understand the behavioral impact on the system of increasing this window, and to consider a potential maximum window. |
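For reference, the parameter can be inspected and changed at runtime with getParameter/setParameter (it can also be set at startup, e.g. via --setParameter). A minimal pymongo sketch follows, assuming a locally reachable mongod; the connection string and the 600-second value are illustrative, not recommendations from this ticket:

```python
from pymongo import MongoClient

# Connect to a mongod/replica set primary (connection string is a placeholder).
client = MongoClient("mongodb://localhost:27017")

# Read the current snapshot history window. Per this ticket the default becomes
# 300 seconds (5 minutes) in 5.0; it was 5 seconds previously.
current = client.admin.command(
    {"getParameter": 1, "minSnapshotHistoryWindowInSeconds": 1}
)
print(current["minSnapshotHistoryWindowInSeconds"])

# Widen the window at runtime, e.g. ahead of long-running snapshot reads.
client.admin.command(
    {"setParameter": 1, "minSnapshotHistoryWindowInSeconds": 600}
)
```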
| Comments |
| Comment by Githook User [ 20/May/21 ] |
|
Author: {'name': 'Monica Ng', 'email': 'monica.ng@mongodb.com', 'username': 'mm-ng'}
Message: (cherry picked from commit 6a1c644c1166906797f7728e16962cd90d5e14a7) |
| Comment by Jordi Serra Torrens [ 19/May/21 ] |
|
alexander.gorrod thanks for the heads up. A 5 minute window seems enough for us to close |
| Comment by Githook User [ 19/May/21 ] |
|
Author: {'name': 'Monica Ng', 'email': 'monica.ng@mongodb.com', 'username': 'mm-ng'}
Message: |
| Comment by Monica Ng [ 18/May/21 ] |
|
PR to change the default value to 5 minutes: https://mongodbcr.appspot.com/769320002/
MongoDB Patch: https://spruce.mongodb.com/version/60a31123e3c331512efdea92/tasks
Sys Perf Patch: https://spruce.mongodb.com/version/60a311819ccd4e7800aa3cdd/tasks |
| Comment by A. Jesse Jiryu Davis [ 17/May/21 ] |
|
I'm curious what the justification is for a 5-minute default. If there's little perf impact from increasing the window from 5 minutes to 30, why not make it 30? The advantage of a longer window is supporting longer-running snapshot reads, e.g. in big analytics jobs. |
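To make the trade-off concrete: a snapshot read only succeeds while the server still retains history for the chosen snapshot, so a longer window gives long-running analytics jobs more time before their snapshot ages out. A minimal sketch using pymongo snapshot sessions (available in pymongo 3.12+); the database, collection, and aggregation logic are illustrative:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# A snapshot session pins all reads in it to a single point-in-time snapshot.
# The whole scan must finish while the server still retains history for that
# snapshot, which is what minSnapshotHistoryWindowInSeconds bounds from below.
with client.start_session(snapshot=True) as session:
    coll = client.analytics.events  # illustrative database/collection
    total = 0
    for doc in coll.find({}, session=session):
        total += doc.get("amount", 0)
    print(total)
```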
| Comment by Alexander Gorrod [ 17/May/21 ] |
|
We have buy-in across the product and Atlas organizations that a 5 minute window is the right value. This ticket can proceed to code review now. kaloian.manassiev and jordi.serra-torrens please let us know if you need anything more to unblock |
| Comment by Alexander Gorrod [ 13/May/21 ] |
tl;dr

The sys-perf workloads show very little throughput or latency cost when extending the window of history from 5 seconds to 30 minutes. The results from this analysis don't materially influence what the best window of history for MongoDB users is, especially with the currently considered default windows of between 5 and 10 minutes.

Detailed Analysis

Supporting evidence from analyzing the performance regressions in our sys-perf automated performance testing workloads follows:

There are very few throughput or latency differences between 60 and 1800 second windows of history.

There are two variants of a Genny workload which creates a collection and starts 100 threads, each thread operating on a single document in the collection - either alternately inserting/removing the document or updating it. There is no obvious bottleneck or significant history storage requirement from those workloads. I suspect there is contention on database resources across the 100 threads, and durable history introduces a (relatively small) latency cost which is exacerbated by the contentious workload.

When running the Linkbench benchmark, a particular metric (add node) shows a ~20% performance regression for any window greater than 60 seconds. This cost can be explained by the additional work done in WiredTiger to store version information - the benchmark appears to be I/O bound, and adding additional I/O reduces throughput. It's worth noting that this metric is one of many tracked by Linkbench - not all metrics experience regressions.

There is a particular benchmark that repeatedly does tiny updates to a very large document. That benchmark experiences a 20% throughput regression when extending the window of history. We may in the future do work in WiredTiger to mitigate that cost (it's related to writing content back to disk, and the cost tradeoff between the CPU overhead of reading/writing delta-encoded updates vs the I/O overhead of reading/writing full values).

Wrapping up

Only the Linkbench regression appears to be intrinsic to the storage of history (since history adds I/O to an I/O-bound workload). It should be possible to close the performance gap for the other measured workloads with a standard analysis and optimization process, if the access patterns tested are relevant to end users. |
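For readers unfamiliar with the Genny workload described above, the contention pattern is roughly the following. This is a minimal Python sketch, not the actual Genny workload definition; the thread count matches the description, but the collection name, document shape, and iteration count are illustrative:

```python
import threading
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client.test.contention  # illustrative collection
coll.drop()

def worker(doc_id: int, iterations: int = 1000) -> None:
    # Each thread repeatedly updates its own single document, so per-operation
    # latency is dominated by contention on shared database resources rather
    # than by data volume.
    coll.insert_one({"_id": doc_id, "counter": 0})
    for _ in range(iterations):
        coll.update_one({"_id": doc_id}, {"$inc": {"counter": 1}})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```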
| Comment by Alexander Gorrod [ 07/May/21 ] |
|
It's time for an update here - I have been digging into the performance regressions that have been captured. There are broadly three regressions captured by our performance testing. In short those are:

1. Simple workloads that insert/remove or repeatedly update a small number of documents, and are sensitive to the latency of individual operations, can experience an increase in operation latency. That performance penalty seems to be further exacerbated when enabling higher levels of durability (write concerns of 2 and 3) - though it's not clear how/why that would be tied to durable history.
2. Repeatedly updating one or a small number of large documents has a throughput regression of up to 20% with our recommended setting.
3. Several of our performance tests report longer inter-test quiesce periods with extended history windows. I have not dug deeply into that behavior - our performance suite isn't designed to measure inter-test quiesce periods as a metric, so the comparison is unlikely to be fair. |
| Comment by Alexander Gorrod [ 30/Apr/21 ] |
|
We have been analysing the performance regressions experienced when configuring different default windows of history on our automated tests. I will add a summary of that analysis to this ticket early next week. We are in the process of choosing a default time; the results of the performance analysis will likely guide us to choose something between 5 and 10 minutes. |
| Comment by A. Jesse Jiryu Davis [ 22/Jan/21 ] |
|
| Comment by Daniel Pasette (Inactive) [ 22/Jan/21 ] |
|
I think jesse's point here about setting a constant window size is sound, though I'm still not quite clear what the user experience will feel like. These questions may already be answered, but I didn't find them stated in the product description or initiative plan.
|
| Comment by Brian Lane [ 21/Sep/20 ] |
|
As discussed - going to park this in our backlog while we work on PM-1844 to see what improvements we may be able to get there first. |
| Comment by Tess Avitabile (Inactive) [ 20/Aug/20 ] |
|
Great, thanks, brian.lane! |
| Comment by Brian Lane [ 19/Aug/20 ] |
|
Hi tess.avitabile, Alex did ping me in the write-up. I will assign this issue to myself and will be chatting to evin.roesle about this. Thanks! |
| Comment by Tess Avitabile (Inactive) [ 19/Aug/20 ] |
|
alexander.gorrod, we discussed that we'd request brian.lane to lead the investigation on how to create user guidelines for setting the amount of history to store, as well as what the default should be. Does that still sound okay? Should I assign this ticket to Brian to track that work? |
| Comment by Alexander Gorrod [ 21/May/20 ] |
|
Thanks for further experimenting lingzhi.deng, and for the write up. I would like to follow up on this conversation in detail, but am busy right now with coordinating changes for the 4.4 release.

My feeling here is that the performance changes you are seeing due to increasing the default time window are in line with what I would have expected. Requiring the storage engine to store 60 seconds of data will mean that it needs to write version information to data files (since it won't all fit in cache for the whole time that version information is relevant). On top of that, for update workloads, the storage engine will likely be saving multiple different versions of documents to data files as well.

The goal of the durable history work in the last release was to make that cost reasonable when compared to the earlier cache overflow mechanism. I believe your results show that has been successful - most benchmark results are showing less than a 20% regression when requiring the additional history to be kept. We will hopefully be able to reduce that overhead as we spend time tuning the durable history mechanism after the 4.4 release is finalized, but there will be a cost, with a lower bound that can be calculated in terms of the additional disk space, I/O, and CPU associated with keeping a longer window of history. |
| Comment by Alexander Gorrod [ 06/May/20 ] |
|
lingzhi.deng Thanks for putting the numbers into a digestible form. I took a look, and the numbers aren't surprising to me at first glance. They seem to vary between 80-90% of the prior performance when configuring a 60 second window, with an outlier at 60% and one at 100%. I think it would require more digging into the particular regression and the particular workload before determining exactly what is expected behavior. It's also probably worth waiting for some of the performance tickets that are currently in flight in WiredTiger before making a call; there is still some low-hanging fruit in terms of getting better performance. |
| Comment by A. Jesse Jiryu Davis [ 05/May/20 ] |
|
Let's try hard to make the window a configurable constant instead of dynamically adjusted. In performance-sensitive applications it's better to be slow than unpredictable. (Fast and predictable is best, of course.) If a MongoDB deployment is running close to 100% capacity and a snapshot read causes its window to dynamically grow and its performance to decrease, that could cause an outage. I think customers would prefer to control the window size. |