[SERVER-43664] Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging() Created: 26/Sep/19  Updated: 29/Oct/23  Resolved: 03/Sep/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2, 4.2.11

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File after.png     PNG File before.png    
Issue Links:
Backports
Depends
Problem/Incident
Related
related to SERVER-25025 Improve startup time when there are t... Closed
is related to SERVER-54074 [v4.0] Log start and end of changes t... Closed
is related to SERVER-55479 Invariant that the first table checke... Closed
is related to WT-5394 fast way to fetch logging config for ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4, v4.2
Sprint: Execution Team 2019-12-30, Execution Team 2020-09-07
Linked BF Score: 43

 Description   

SERVER-25025 targets inefficiencies in startup initialization in MongoDB's use of WiredTiger.

Even with these optimizations, given a large number of collections, we could still benefit from parallelizing the startup procedure. Startup is currently single-threaded and mostly CPU-bound, spending almost all of its per-collection CPU time in setTableLogging(). That function uses an expensive "metadata:create" cursor to determine whether or not a table is logged.
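
As a rough illustration of the per-table check described above, here is a minimal Python sketch. It assumes that a table's creation config is a string containing either "log=(enabled=true)" or "log=(enabled=false)". The FAKE_METADATA dict, its config strings, and the function names are hypothetical stand-ins, not the server's code or the WiredTiger API; in the real server each lookup goes through a "metadata:create" cursor, which is the expensive part.

```python
# Hypothetical stand-in for the WiredTiger metadata: table URI -> creation config.
# The config strings are made up for illustration.
FAKE_METADATA = {
    "table:collection-0": "key_format=q,log=(enabled=true),value_format=u",
    "table:collection-1": "key_format=q,log=(enabled=false),value_format=u",
}


def logging_setting(enabled):
    """Return the config fragment expected for the desired logging state."""
    return "log=(enabled=true)" if enabled else "log=(enabled=false)"


def tables_needing_update(metadata, want_logging):
    """List the tables whose stored config does not contain the desired setting.

    One metadata lookup per table: this is the work that dominates startup
    when there are many tables.
    """
    setting = logging_setting(want_logging)
    return [uri for uri, config in metadata.items() if setting not in config]


if __name__ == "__main__":
    print(tables_needing_update(FAKE_METADATA, want_logging=True))
```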



 Comments   
Comment by Githook User [ 27/Oct/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-43664 Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging()

(cherry picked from commit 8e64b07e3bb363347ee2c11a56aba873365ed74a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/3a55bbd37a050841bc2791b13489dd66c9bc7c67

Comment by Githook User [ 12/Oct/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: Revert "SERVER-43664 Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging()"

This reverts commit 97fdfbefdc6841d0b07d0bf54a28c86c70ca5e19.
Branch: v4.2
https://github.com/mongodb/mongo/commit/5454f3bf6391624e42efbc2538536cb0e8bdaab2

Comment by Githook User [ 03/Oct/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-43664 Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging()

(cherry picked from commit 8e64b07e3bb363347ee2c11a56aba873365ed74a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/97fdfbefdc6841d0b07d0bf54a28c86c70ca5e19

Comment by Githook User [ 03/Oct/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-43664 Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging()

(cherry picked from commit 8e64b07e3bb363347ee2c11a56aba873365ed74a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/66999d4a0c312052f2987f7f283155d468a995f5

Comment by Alexander Gorrod [ 07/Sep/20 ]

Thanks gregory.wlodarek - that is a great improvement!

Comment by Githook User [ 03/Sep/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-43664 Fix file name lookup in wt_table_checks*.js for Windows
Branch: master
https://github.com/mongodb/mongo/commit/4f430e6e495c139a2fa4511d18b35dbf6ccaca80

Comment by Githook User [ 03/Sep/20 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-43664 Speedup WiredTiger storage engine startup for many tables by optimizing WiredTigerUtil::setTableLogging()
Branch: master
https://github.com/mongodb/mongo/commit/8e64b07e3bb363347ee2c11a56aba873365ed74a

Comment by Gregory Wlodarek [ 02/Sep/20 ]

The change for this is now in the commit queue, so I thought I'd follow up with a summary of the results.

I ran some basic performance tests measuring the time spent starting up and shutting down mongod, averaged over ten executions.

On master:

# Tables    Time Spent
1000        1.3013s
2000        1.5038s
5000        1.7056s
10000       2.1347s
20000       3.2862s

On v4.4:

# Tables    Time Spent
1000        1.8004s
2000        1.9329s
5000        3.4488s
10000       5.0004s
20000       8.7877s

Difference:

# Tables    Speedup
1000        1.38x
2000        1.28x
5000        2.02x
10000       2.34x
20000       2.67x
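
For reference, the speedup column is the v4.4 time divided by the master time for the same table count. The short Python sketch below recomputes it from the timings listed above; the variable and function names are only illustrative. The recomputed ratios agree with the listed values to within 0.01.

```python
# Timings copied from the two tables above (seconds, averaged over ten runs).
TABLE_COUNTS = [1000, 2000, 5000, 10000, 20000]
MASTER_SECONDS = [1.3013, 1.5038, 1.7056, 2.1347, 3.2862]
V44_SECONDS = [1.8004, 1.9329, 3.4488, 5.0004, 8.7877]


def speedups(before, after):
    """Return before/after ratios, i.e. how many times faster 'after' is."""
    return [b / a for b, a in zip(before, after)]


if __name__ == "__main__":
    for count, ratio in zip(TABLE_COUNTS, speedups(V44_SECONDS, MASTER_SECONDS)):
        print(f"{count:>6} tables: {ratio:.4f}x")
```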

Comment by Alexander Gorrod [ 05/Jan/20 ]

milkie that sounds like a good idea to me. If we make a change here I think we should also check that opening an old database with a new version gets a useful error message, and neither tries to proceed without updating the logging settings nor (worse) corrupts the database. I remember that it was a bit of a journey to get to this solution with upgrade/downgrade considerations (I think SERVER-37483 is the most recent relevant ticket to that).

Comment by Eric Milkie [ 30/Dec/19 ]

After examining this behavior, I think we should work on removing the blanket check for table logging instead of trying to speed it up. I believe we only need to go through all the tables when making an actual change to logging, and we can simply assume the logging is set correctly otherwise. The logging setting changes when switching between replica set and standalone, and when starting a replica set member in standalone maintenance mode.
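
The idea above could be sketched roughly as follows. This is an illustration only, not the change that was eventually committed. Mode and needs_full_table_check are hypothetical names, and treating an unknown previous mode as requiring a full check is an added assumption.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical startup modes that determine the table logging setting."""
    REPLICA_SET = "replica_set"
    STANDALONE = "standalone"


def needs_full_table_check(previous: Optional[Mode], current: Mode) -> bool:
    """Decide whether every table's logging setting must be revisited.

    Assumption: if the previous mode is unknown (None), be conservative and
    check everything. Otherwise only a mode switch requires the full pass.
    """
    return previous is None or previous != current


if __name__ == "__main__":
    # Same mode as last time: assume logging is already set correctly.
    print(needs_full_table_check(Mode.REPLICA_SET, Mode.REPLICA_SET))
    # Replica set member restarted as a standalone: revisit all tables.
    print(needs_full_table_check(Mode.REPLICA_SET, Mode.STANDALONE))
```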
