[SERVER-25025] Improve startup time when there are tens of thousands of collections/indexes on WiredTiger Created: 13/Jul/16 Updated: 12/Nov/20 Resolved: 27/Sep/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, WiredTiger |
| Affects Version/s: | 3.2.6, 3.2.7, 3.2.8 |
| Fix Version/s: | 3.6.15, 4.2.1, 4.3.1, 4.0.14 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Chad Kreimendahl | Assignee: | Louis Williams |
| Resolution: | Done | Votes: | 10 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||
| Backport Requested: |
v4.2, v4.0, v3.6
|
||||||||||||||||||||||||||||||||||||
| Sprint: | Execution Team 2019-09-23, Execution Team 2019-10-07 | ||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 15 | ||||||||||||||||||||||||||||||||||||
| Description |
|
mongod opens every collection and index upon startup to clear plan cache and ensure settings are correct for the current version. When there are tens or hundreds of thousands of collections and index (1 file per collection/index) this can take minutes to complete, even on very fast storage. Consider an option to do this in parallel to improve startup time. Original Description |
| Comments |
| Comment by Githook User [ 22/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Kay Kim', 'username': 'kay-kim', 'email': 'kay.kim@10gen.com'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 22/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Kay Kim', 'username': 'kay-kim', 'email': 'kay.kim@10gen.com'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Louis Williams [ 15/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
mongo@phish.org we do not plan on backporting this to 3.4 due to the risk of introducing instability into that version. At this point in the 3.4 development cycle, only bug fixes are considered for backporting. Since this is only a performance improvement, we will not backport it further. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}Message: This makes two major improvements: (1) WiredTiger "metadata:create" cursors are very expensive; only use (2) No longer open a cursor for every WiredTiger collection at startup. (cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c) | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chad Kreimendahl [ 11/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
would love to see this make its way into 3.4 as well. looks very possible. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 10/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}Message: This makes two major improvements: (1) WiredTiger "metadata:create" cursors are very expensive; only use (2) No longer open a cursor for every WiredTiger collection at startup. (cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c) | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 09/Oct/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}Message: This makes two major improvements: (1) WiredTiger "metadata:create" cursors are very expensive; only use (2) No longer open a cursor for every WiredTiger collection at startup. (cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c) | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Louis Williams [ 27/Sep/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
This ticket addressed deficiencies in our startup process. The most noticeable improvement is that we will no longer open a cursor on every collection during startup. Storage engine startup time is still a function of the number of collections, however. This process is mostly CPU-bound, so I openedĀ | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 26/Sep/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}Message: This makes two major improvements: (1) WiredTiger "metadata:create" cursors are very expensive; only use (2) No longer open a cursor for every WiredTiger collection at startup. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 22/Feb/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
In a case where there were 80,000 tables we saw it taking about 30 seconds in this phase, in 3.2, 3.4, and 3.6:
At this point the files have already been opened (and that was relatively quick), and we are fetching a list of tables from WT. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 17/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
That increase in files would explain the startup time difference. As you know, we've made a bunch of changes to improve performance for users with a large number of collections/indexes (e.g., Dan | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chad Kreimendahl [ 17/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
I won't likely be able to downgrade past 3.2.5. There are 213,000+ indexes... so quite a few more than collections. Given that each of those indexes gets its own file in WT, I'm curious how much that could come into play. When we were in 3.0, we had just under half these numbers of each. The load time then was < 30 seconds. Does the size of the files impact anything? One thing I notice is that the file loading process is very serial. Mongo uses only 1 core during the entire 3 minute load. During that time, it's not even remotely approaching any meaningful I/O usage. I'm curious how much of it could be parallelized / threaded? If it were a configuration option, that might make sense, because people without high-end [nvme / ssd] drives might not want to run this phase in high parallelism. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 17/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks Chad.
Since there are about 14,307 collections, and at least that many indexes, opening up each of those tables is the issue. I gave you the wrong log verbosity level to see these. It should have been logLevel 2, but that is the issue. We'll check if that time can be brought down at all. You said it is a regression from 3.0, but I don't think this code path has changed. What times are you seeing on 3.0 for the same data on WiredTiger? | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chad Kreimendahl [ 17/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Upload complete. Looks like a 2 minute unexplained gap remains and a solid minute of clearing caches. If there's any additional restarts with other verbosity you need me to do to help catch the goings on between these two lines, let me know:
| ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 14/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
sallgeud, I've created a secure upload portal for you to use. Thanks again, | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chad Kreimendahl [ 14/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Sure. Do you have a secure place to upload those? | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 14/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi sallgeud, Thanks for the additional information about your dataset. So we can continue to investigate, would you please provide the startup logs at log level 1 for an affected and unaffected version of MongoDB? Thank you, | ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chad Kreimendahl [ 13/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
. WiredTiger. We've been using it for 6+ months in production and since its release in Development.
| ||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 13/Jul/16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi sallgeud, Thank you for reporting this behavior. So we can better understand what is going on here, please provide additional information:
Thanks again, |