[SERVER-18321] Speed up background index build with WiredTiger LSM Created: 05/May/15 Updated: 11/Jul/15 Resolved: 11/May/15
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.5, 3.1.3 |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Alexander Gorrod | Assignee: | Alexander Gorrod |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Completed: | |
| Participants: | |
| Description |
|
I noticed that the repl13.js test is very slow with WiredTiger and LSM. Digging into the performance, I saw that the slow part of the test was building an index on the secondary. It appears that each entry is inserted in its own WiredTiger auto-commit transaction, and each auto-commit transaction does a write and an fsync to the WiredTiger WAL (journal). The index being built has 100,000 entries, so that is 100,000 fsync calls. A representative call stack is:
Ideally, an index build like this in WiredTiger would turn on bulk loading, wrap groups of inserts into a single transaction commit, or at the least use non-synced transactional operations. The obvious place to make that change is in mongo::MultiIndexBlock::insertAllDocumentsInCollection, but that is outside the WiredTiger storage engine implementation layer, so I'm hesitant to jump into the change without some discussion. There is already a call to WiredTigerRecoveryUnit::setRollbackWritesDisabled in IndexAccessMethod::commitBulk, but it isn't obvious how to use that information to help from the WiredTiger storage engine side. |
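For illustration, here is a minimal sketch of the "wrap groups of inserts into one commit" alternative at the raw WiredTiger C API level. The table URI, key/value formats, batch size, and the `"sync=off"` commit setting are assumptions for this example (the exact sync configuration knob has varied across WiredTiger versions), not something taken from the ticket:

```c
#include <stdio.h>
#include <wiredtiger.h>

#define NUM_ENTRIES 100000
#define BATCH_SIZE 1000

/*
 * Hypothetical: load 100,000 index entries, committing in batches of
 * 1,000 so the journal is synced ~100 times instead of 100,000.
 * Assumes a table created with key_format=S,value_format=S.
 */
int
load_index_batched(WT_SESSION *session)
{
    WT_CURSOR *cursor;
    char key[32], value[32];
    int i, ret;

    if ((ret = session->open_cursor(
        session, "table:index", NULL, NULL, &cursor)) != 0)
        return (ret);

    for (i = 0; i < NUM_ENTRIES; i++) {
        if (i % BATCH_SIZE == 0 &&
            (ret = session->begin_transaction(session, NULL)) != 0)
            return (ret);

        snprintf(key, sizeof(key), "key-%08d", i);
        snprintf(value, sizeof(value), "value-%08d", i);
        cursor->set_key(cursor, key);
        cursor->set_value(cursor, value);
        if ((ret = cursor->insert(cursor)) != 0) {
            (void)session->rollback_transaction(session, NULL);
            return (ret);
        }

        /*
         * "sync=off" skips the per-commit journal fsync entirely,
         * relying on later checkpoints/log flushes for durability.
         */
        if (i % BATCH_SIZE == BATCH_SIZE - 1 &&
            (ret = session->commit_transaction(session, "sync=off")) != 0)
            return (ret);
    }
    return (cursor->close(cursor));
}
```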
| Comments |
| Comment by Githook User [ 03/Jul/15 ] |
|
Author: Alex Gorrod <alexander.gorrod@mongodb.com>
Message: (cherry picked from commit 4d37a27896872dc5d280f5e85666e1d8431ec33b) |
| Comment by Githook User [ 08/May/15 ] |
|
Author: Alex Gorrod <alexander.gorrod@mongodb.com>
Message: Merge pull request #1948 from wiredtiger/lsm-bulk-load

Add support for bulk load in LSM trees. |
| Comment by Githook User [ 08/May/15 ] |
|
Author: Alex Gorrod <alexg@wiredtiger.com>
Message: Add support for bulk load in LSM trees. This allows us to load into a single btree, using btree bulk load. It's possible that we could avoid some of the switch logic when closing a
Refs |
| Comment by Alexander Gorrod [ 06/May/15 ] |
|
We've realized that this performance problem occurs because MongoDB uses a bulk cursor for the load, but in LSM the bulk flag is ignored at cursor open, so we end up with auto-commit transaction semantics. I'll think about whether it makes sense to alter LSM behavior for bulk cursors, or to make changes in the MongoDB layer for LSM indexes. |
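For reference, this is roughly what the bulk path looks like at the WiredTiger C API level (a sketch; the `lsm:index` URI and the key/value strings are placeholders). A btree bulk cursor requires an empty object and keys inserted in sorted order, which matches an index build; per the comment above, LSM previously ignored the `"bulk"` configuration at cursor open and fell back to ordinary auto-commit inserts:

```c
/*
 * Sketch: open a bulk cursor for an initial sorted load. Before the
 * lsm-bulk-load change referenced above, an LSM tree would ignore the
 * "bulk" flag here and each insert ran as its own auto-commit
 * transaction.
 */
WT_CURSOR *cursor;
int ret;

if ((ret = session->open_cursor(
    session, "lsm:index", NULL, "bulk", &cursor)) != 0)
    return (ret);

/*
 * Bulk cursors require the object to be empty and keys to arrive in
 * sorted order; inserts bypass per-operation transactional overhead.
 */
cursor->set_key(cursor, "key-00000001");
cursor->set_value(cursor, "value-00000001");
if ((ret = cursor->insert(cursor)) != 0)
    return (ret);
return (cursor->close(cursor));
```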