[SERVER-25025] Improve startup time when there are tens of thousands of collections/indexes on WiredTiger Created: 13/Jul/16  Updated: 12/Nov/20  Resolved: 27/Sep/19

Status: Closed
Project: Core Server
Component/s: Performance, WiredTiger
Affects Version/s: 3.2.6, 3.2.7, 3.2.8
Fix Version/s: 3.6.15, 4.2.1, 4.3.1, 4.0.14

Type: Improvement Priority: Major - P3
Reporter: Chad Kreimendahl Assignee: Louis Williams
Resolution: Done Votes: 10
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-31155 High memory & resources usage + very ... Closed
is duplicated by SERVER-40130 Improve multi-threading Closed
Related
related to SERVER-17078 show databases taking extraordinarily... Closed
related to SERVER-17675 Support file per database for WiredTiger Closed
is related to SERVER-52833 Capped collections can contain too ma... Closed
is related to SERVER-43664 Speedup WiredTiger storage engine sta... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2, v4.0, v3.6
Sprint: Execution Team 2019-09-23, Execution Team 2019-10-07
Participants:
Case:
Linked BF Score: 15

 Description   

mongod opens every collection and index at startup to clear the plan cache and ensure settings are correct for the current version. When there are tens or hundreds of thousands of collections and indexes (one file per collection/index), this can take minutes to complete, even on very fast storage. Consider an option to do this work in parallel to improve startup time.
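
A minimal sketch of what such a parallel option might look like, assuming the per-table startup work can be expressed as one independent task per ident; openTable() and loadIdents() below are hypothetical placeholders, not MongoDB functions:

// Hypothetical sketch only: fan the per-table startup work out over a few
// worker threads instead of walking the idents serially.
#include <future>
#include <string>
#include <vector>

void openTable(const std::string& ident);   // placeholder for the per-table check
std::vector<std::string> loadIdents();      // placeholder for the catalog scan

void openAllTablesInParallel(unsigned nWorkers) {
    std::vector<std::string> idents = loadIdents();
    std::vector<std::future<void>> tasks;
    for (unsigned w = 0; w < nWorkers; ++w) {
        tasks.push_back(std::async(std::launch::async, [&idents, nWorkers, w] {
            // Each worker takes every nWorkers-th ident (simple static partition).
            for (size_t i = w; i < idents.size(); i += nWorkers)
                openTable(idents[i]);
        }));
    }
    for (auto& t : tasks)
        t.get();  // join and propagate any exception from a worker
}

A static partition keeps the sketch short; a real option would also need to bound the worker count by the available cores and by what the underlying storage can sustain.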

Original Description
When starting the mongod service in a replica set with ~50GB of data on extremely high-performance disks, it takes approximately 2 minutes before the database becomes available. This seems like a large regression from the 2.8 and 3.0 days.



 Comments   
Comment by Githook User [ 22/Oct/19 ]

Author:

{'name': 'Kay Kim', 'username': 'kay-kim', 'email': 'kay.kim@10gen.com'}

Message: DOCS-13082: 4.0.13 release - update changelog for SERVER-25025
Branch: v4.0
https://github.com/mongodb/docs/commit/b687b3d521af723e2a1301de8790fda2877a3c4f

Comment by Githook User [ 22/Oct/19 ]

Author:

{'name': 'Kay Kim', 'username': 'kay-kim', 'email': 'kay.kim@10gen.com'}

Message: DOCS-13082: 4.0.13 release - update changelog for SERVER-25025
Branch: master
https://github.com/mongodb/docs/commit/0761eb5a5a6dd3055fa9c65a39ad85c0fa69ea34

Comment by Louis Williams [ 15/Oct/19 ]

mongo@phish.org, we do not plan to backport this to 3.4 due to the risk of introducing instability into that version. At this point in the 3.4 development cycle, only bug fixes are considered for backporting, and since this is only a performance improvement, we will not backport it further.

Comment by Githook User [ 15/Oct/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-25025 Make startup faster with many WT collections

This makes two major improvements:

(1) WiredTiger "metadata:create" cursors are very expensive; only use
these when absolutely necessary, and use standard "metadata:" cursors
everywhere else. In 3.6, this lowers startup calls to "metadata:create"
from 3 to 0 per table.

(2) No longer open a cursor for every WiredTiger collection at startup.
Instead, lazily open cursors to initialize RecordIDs for inserts.

(cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c)
(cherry picked from commit 186d70b2c3043d0b1867c170ba7a6bc7eb25556f)
(cherry picked from commit 056a6188574e852528f71ce7ce27c5f5dc912aa9)
Branch: v3.6
https://github.com/mongodb/mongo/commit/d6e3ae804ce3e2aee2b91cce8de01f83cb3274ee

Comment by Chad Kreimendahl [ 11/Oct/19 ]

Would love to see this make its way into 3.4 as well. Looks very possible.

Comment by Githook User [ 10/Oct/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-25025 Make startup faster with many WT collections

This makes two major improvements:

(1) WiredTiger "metadata:create" cursors are very expensive; only use
these when absolutely necessary, and use standard "metadata:" cursors
everywhere else. This lowers startup calls to "metadata:create" from 3
to 1 per table.

(2) No longer open a cursor for every WiredTiger collection at startup.
Instead, lazily open cursors to initialize RecordIDs for inserts.

(cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c)
(cherry picked from commit 186d70b2c3043d0b1867c170ba7a6bc7eb25556f)
Branch: v4.0
https://github.com/mongodb/mongo/commit/056a6188574e852528f71ce7ce27c5f5dc912aa9

Comment by Githook User [ 09/Oct/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-25025 Make startup faster with many WT collections

This makes two major improvements:

(1) WiredTiger "metadata:create" cursors are very expensive; only use
these when absolutely necessary, and use standard "metadata:" cursors
everywhere else. This lowers startup calls to "metadata:create" from 3
to 1 per table.

(2) No longer open a cursor for every WiredTiger collection at startup.
Instead, lazily open cursors to initialize RecordIDs for inserts.

(cherry picked from commit ba9670e4c6d3aa62db20501317457af99049570c)
Branch: v4.2
https://github.com/mongodb/mongo/commit/186d70b2c3043d0b1867c170ba7a6bc7eb25556f

Comment by Louis Williams [ 27/Sep/19 ]

This ticket addressed deficiencies in our startup process. The most noticeable improvement is that we will no longer open a cursor on every collection during startup.

Storage engine startup time is still a function of the number of collections, however. This process is mostly CPU-bound, so I opened SERVER-43664 to consider parallelizing it for further speedups.

Comment by Githook User [ 26/Sep/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-25025 Make startup faster with many WT collections

This makes two major improvements:

(1) WiredTiger "metadata:create" cursors are very expensive; only use
these when absolutely necessary, and use standard "metadata:" cursors
everywhere else. This lowers startup calls to "metadata:create" from 3
to 1 per table.

(2) No longer open a cursor for every WiredTiger collection at startup.
Instead, lazily open cursors to initialize RecordIDs for inserts.
Branch: master
https://github.com/mongodb/mongo/commit/ba9670e4c6d3aa62db20501317457af99049570c
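
For improvement (1), a minimal sketch using the standalone WiredTiger C API (callable from C++). The only difference between the two open_cursor() calls is the URI: "metadata:create" reconstructs the full creation configuration for every entry, which is what made it so much more expensive than the plain "metadata:" cursor. The /tmp/wt-demo home directory is just an example and must already exist:

#include <wiredtiger.h>

int main() {
    WT_CONNECTION *conn;
    WT_SESSION *session;
    WT_CURSOR *plain, *create;

    if (wiredtiger_open("/tmp/wt-demo", nullptr, "create", &conn) != 0)
        return 1;
    conn->open_session(conn, nullptr, nullptr, &session);

    // Cheap: returns raw metadata values; enough for listing idents.
    session->open_cursor(session, "metadata:", nullptr, nullptr, &plain);
    // Expensive: collapses the full "create" configuration for each entry.
    session->open_cursor(session, "metadata:create", nullptr, nullptr, &create);

    plain->close(plain);
    create->close(create);
    return conn->close(conn, nullptr);
}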

Comment by Bruce Lucas (Inactive) [ 22/Feb/18 ]

In a case where there were 80,000 tables, we saw about 30 seconds spent in this phase on 3.2, 3.4, and 3.6:

Thread 1 (Thread 0x7fbb7ef9fcc0 (LWP 9548)):
#0  0x0000000001a31120 in __config_next ()
#1  0x0000000001a317e6 in __config_getraw.isra.0 ()
#2  0x0000000001a31a97 in __wt_config_get ()
#3  0x0000000001a337ac in __wt_config_collapse ()
#4  0x0000000001a56ad1 in __schema_create_collapse.isra.0 ()
#5  0x0000000001a57031 in __curmetadata_setkv ()
#6  0x0000000001a57c33 in __curmetadata_next ()
#7  0x00000000010afe96 in mongo::WiredTigerKVEngine::getAllIdents(mongo::OperationContext*) const ()
#8  0x00000000010146e0 in mongo::KVStorageEngine::KVStorageEngine(mongo::KVEngine*, mongo::KVStorageEngineOptions const&) ()
#9  0x00000000010ab0f8 in mongo::(anonymous namespace)::WiredTigerFactory::create(mongo::StorageGlobalParams const&, mongo::StorageEngineLockFile const&) const ()
#10 0x0000000000fd3a4e in mongo::ServiceContextMongoD::initializeGlobalStorageEngine() ()
#11 0x00000000009c3c30 in mongo::(anonymous namespace)::_initAndListen(int) ()
#12 0x00000000009797ad in main ()

At this point the files have already been opened (and that was relatively quick), and we are fetching a list of tables from WT.
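
A rough sketch of the pattern in that phase: walk the "metadata:" cursor and collect the "table:" keys (the idents). This assumes an already-open WT_SESSION, omits error handling, and is an illustration rather than the actual getAllIdents() implementation:

#include <wiredtiger.h>
#include <cstring>
#include <string>
#include <vector>

std::vector<std::string> listTableIdents(WT_SESSION *session) {
    std::vector<std::string> idents;
    WT_CURSOR *cursor;
    if (session->open_cursor(session, "metadata:", nullptr, nullptr, &cursor) != 0)
        return idents;

    const char *key;
    while (cursor->next(cursor) == 0) {           // __curmetadata_next in the stack above
        cursor->get_key(cursor, &key);
        if (std::strncmp(key, "table:", 6) == 0)  // keep only table entries
            idents.emplace_back(key + 6);         // strip the "table:" prefix
    }
    cursor->close(cursor);
    return idents;
}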

Comment by Daniel Pasette (Inactive) [ 17/Jul/16 ]

That increase in files would explain the startup time difference. As you know, we've made a bunch of changes to improve performance for users with a large number of collections/indexes (e.g., SERVER-17078), but obviously there is room for improvement. I'm going to morph this ticket into an improvement request, though it's unlikely to make the v3.4 cutoff as active development for that release is coming to a close shortly. Thanks again for your report and collaboration.

Dan

Comment by Chad Kreimendahl [ 17/Jul/16 ]

I won't likely be able to downgrade past 3.2.5. There are 213,000+ indexes... so quite a few more than collections. Given that each of those indexes gets its own file in WT, I'm curious how much that could come into play. When we were in 3.0, we had just under half these numbers of each. The load time then was < 30 seconds. Does the size of the files impact anything?

One thing I notice is that the file loading process is very serial. Mongo uses only 1 core during the entire 3 minute load. During that time, it's not even remotely approaching any meaningful I/O usage. I'm curious how much of it could be parallelized / threaded?

If it were a configuration option, that might make sense, because people without high-end [nvme / ssd] drives might not want to run this phase in high parallelism.

Comment by Daniel Pasette (Inactive) [ 17/Jul/16 ]

Thanks Chad.

21:51:12.729 - Start
21:51:15.905 - 3 seconds starting up WiredTiger
21:51:32.566 - 17 seconds fetching and placing oplog markers 
==> 21:53:22.795 - 110 seconds to get to "flushing directory /mongodb/data/wt"; this time is spent doing a pre-start check of every wiredtiger table and index.
21:54:16.297 - 54 seconds checking every collection and index at startup
21:54:17.095 - < 1 second starting up repl workers and transitioning to secondary and accepting first connection
 
Total time: 3 minutes 5 seconds

Since there are about 14,307 collections, and at least that many indexes, opening each of those tables is the issue. I gave you the wrong log verbosity level to see these lines; it should have been logLevel 2. Either way, that is the issue.

We'll check if that time can be brought down at all. You said it is a regression from 3.0, but I don't think this code path has changed. What times are you seeing on 3.0 for the same data on WiredTiger?

Comment by Chad Kreimendahl [ 17/Jul/16 ]

Upload complete. It looks like a roughly 2-minute unexplained gap remains, plus a solid minute of clearing caches. If there are any additional restarts at other verbosity levels you need me to do to help catch what's going on between these two lines, let me know:

2016-07-16T21:51:32.566-0500 I STORAGE  [initandlisten] Placing a marker at optime Jul 16 17:14:26:18
2016-07-16T21:53:22.795-0500 D STORAGE  [initandlisten] flushing directory /database/data

Comment by Kelsey Schubert [ 14/Jul/16 ]

sallgeud, I've created a secure upload portal for you to use.

Thanks again,
Thomas

Comment by Chad Kreimendahl [ 14/Jul/16 ]

Sure. Do you have a secure place to upload those?

Comment by Kelsey Schubert [ 14/Jul/16 ]

Hi sallgeud,

Thanks for the additional information about your dataset. So we can continue to investigate, would you please provide the startup logs at log level 1 for an affected and unaffected version of MongoDB?

Thank you,
Thomas

Comment by Chad Kreimendahl [ 13/Jul/16 ]

• WiredTiger. We've been using it for 6+ months in production, and in development since its release.
• Approximately 90 databases, 14,255 collections, and 211,606 indexes.

systemLog:
   destination: file
   logAppend: true
   verbosity: 0
   path: /basePath/log/mongodb3.2.log
processManagement:
   fork: false
   pidFilePath: /basePath/mongod.pid
net:
   bindIp: [5.5.5.11,127.0.0.1]
   port: 27017
   maxIncomingConnections: 51200
   wireObjectCheck: true
   ipv6: false
   unixDomainSocket:
      enabled: false
   http:
      enabled: false
      JSONPEnabled: false
      RESTInterfaceEnabled: false
storage:
   dbPath: /basePath/data/wt
   indexBuildRetry: true
   repairPath: /basePath/data/wt/_repair
   journal:
      enabled: true
      commitIntervalMs: 100
   directoryPerDB: true
   engine: wiredTiger
   wiredTiger:
      engineConfig:
         cacheSizeGB: 32
         statisticsLogDelaySecs: 0
         journalCompressor: snappy
         directoryForIndexes: true
      collectionConfig:
         blockCompressor: snappy
      indexConfig:
         prefixCompression: true
setParameter:
   internalQueryExecMaxBlockingSortBytes: 67108864
operationProfiling:
   slowOpThresholdMs: 100
   mode: off
replication:
   oplogSizeMB: 20480
   replSetName: SomeREPLName01
   enableMajorityReadConcern: false

Comment by Kelsey Schubert [ 13/Jul/16 ]

Hi sallgeud,

Thank you for reporting this behavior. So we can better understand what is going on here, please provide additional information:

  • Please clarify which storage engine you were using when you observed this regression (WiredTiger, MMAPv1). Was the same storage engine in use before MongoDB was upgraded?
  • Please provide the approximate number of databases and collections present on your server.

Thanks again,
Thomas
