[SERVER-18926] Full text search extremely slow and uses a lot of memory under WiredTiger Created: 11/Jun/15  Updated: 08/Oct/16  Resolved: 17/Jun/15

Status: Closed
Project: Core Server
Component/s: Text Search, WiredTiger
Affects Version/s: 3.0.1, 3.0.4
Fix Version/s: 3.0.5

Type: Bug Priority: Critical - P2
Reporter: Bruce Lucas (Inactive) Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File stacks.png    
Issue Links:
Depends
Related
related to SERVER-26534 Text search uses excessive memory Backlog
related to SERVER-19489 Assertion failure and segfault in Wor... Closed
is related to SERVER-18961 Avoid iterating the entire working se... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Quint Iteration 5
Participants:

 Description   
Issue Status as of Jul 14, 2015

ISSUE SUMMARY
Long-running queries in MongoDB periodically yield their locks. As part of the yield-preparation procedure, intermediate results buffered in memory by the storage engine may need to be processed in order to ensure that there are no references into the storage layer that may become invalid when locks are relinquished.

A bug in this procedure can cause long-running $text and geoNear (i.e. $near or $nearSphere) queries, which buffer intermediate query results, to execute slowly. In particular, if such a query yields y times and buffers d documents, the overall time spent in yield-preparation was O(yd). With the fix, the time complexity is reduced to O(y).

This issue only appears when queries are performed against a mongod instance running with the WiredTiger storage engine. Instances running the MMAPv1 storage engine are not affected.
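The gap between the two costs can be illustrated with a toy model. This is only a sketch, not the server's code: countYieldWork and its variables are hypothetical names, the working set is reduced to a counter, and the model assumes that buffered results have already been copied out of the storage layer while a single entry per yield still references it.

```javascript
// Toy model only: hypothetical names, not MongoDB server code.
// Returns how many working-set entries yield preparation touches in total
// for a query that buffers d documents and yields y times.
function countYieldWork(d, y, scanWholeWorkingSet) {
    var perYield = Math.floor(d / y);   // documents buffered between two yields
    var buffered = 0;                   // entries already copied out of the storage layer
    var inFlight = 1;                   // assumption: one entry per yield still points into storage
    var touched = 0;

    for (var k = 0; k < y; k++) {
        buffered += perYield;
        if (scanWholeWorkingSet) {
            // Pre-fix behaviour in this model: walk every entry to find the few that need work.
            for (var i = 0; i < buffered + inFlight; i++) {
                touched++;
            }
        } else {
            // Post-fix behaviour in this model: visit only the entries tracked as needing work.
            touched += inFlight;
        }
    }
    return touched;
}

var before = countYieldWork(1000, 10, true);    // grows with y * d
var after = countYieldWork(1000, 10, false);    // grows with y only
```

In this model the first variant revisits the whole buffer at every yield, so its total grows with the product of y and d, while the second touches a constant number of entries per yield.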

USER IMPACT
On MongoDB systems using the WiredTiger storage engine, queries using $text or geoNear (i.e. $near or $nearSphere) may perform poorly. The performance impact is most severe for "large" $text or geoNear queries, i.e. queries that return a lot of results or require examining a large number of index keys or documents.

AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.4 running with the WiredTiger storage engine.

FIX VERSION
The fix is included in the 3.0.5 production release.

Original description

Created db with 5M docs full-text indexed as follows:

// Word pool for generating documents; each entry carries its own leading space.
var words = [
    " when", " in", " the", " course", " of", " human", " events", " it", " becomes", " necessary",
    " four", " score", " and", " seven", " years", " ago",
    " ask", " not", " what", " your", " country", " can", " do", " for", " you",
    " that's", " one", " small", " step", " for", " a", " man",
]

// Build a random sentence of up to 20 words drawn from the pool.
function sentence() {
    var l = Math.random() * 20
    var s = " "
    for (var i=0; i<l; i++) {
        var w = Math.floor(Math.random() * words.length)
        s = s + words[w]
    }
    return s
}

// Reset the collection and create the text index on x.
function init() {
    db.c.drop()
    db.c.createIndex({x: "text"})
}

// Insert 5M documents in unordered bulk batches of 10,000.
function create() {
    var count = 5000000
    var every = 10000
    for (var i=0; i<count; ) {
        var bulk = db.c.initializeUnorderedBulkOp();
        for (var j=0; j<every; j++, i++)
            bulk.insert({x:sentence()})
        bulk.execute();
        print(i)
    }
}

Then do a full-text search:

db.c.find({$text: {$search: "necessary"}}).itcount()

Finishes in about 10 seconds under mmapv1; under WT it ran for several minutes without finishing before I killed it. It also uses more memory (possibly a lot more) under WT than under mmapv1.

It seems to be spending all its time in forceFetchAllLocs (itself, not callees), called from yield. This path is only exercised if the storage engine has document-level locking, which explains why WT behaves differently than mmapv1.
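For a sense of how many documents the repro query has to buffer, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions: estimateMatchFraction is a hypothetical helper, the sentence length is treated as uniform on 1..20 words (which is what the loop in sentence() effectively produces), and "necessary" is assumed to be matched by exactly one of the 32 entries in words.

```javascript
// Hypothetical helper, not part of the original script.
// A sentence draws k words uniformly from a pool of listLength entries,
// with k uniform on 1..maxWords. For a term that occurs once in the pool,
// P(sentence misses the term) = ((listLength - 1) / listLength)^k.
function estimateMatchFraction(listLength, maxWords) {
    var missOnce = (listLength - 1) / listLength;
    var missTotal = 0;
    for (var k = 1; k <= maxWords; k++) {
        missTotal += Math.pow(missOnce, k);
    }
    // Average the miss probability over all sentence lengths, then invert.
    return 1 - missTotal / maxWords;
}

var fraction = estimateMatchFraction(32, 20);           // 32 pool entries, up to 20 words
var expectedMatches = Math.round(fraction * 5000000);   // out of the 5M inserted documents
```

Under these assumptions a little over a quarter of the collection matches, so the query falls squarely into the "large" category described in the issue summary.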



 Comments   
Comment by Dave Withers [ 26/Jan/16 ]

will do, thanks.

Comment by J Rassi [ 26/Jan/16 ]

dwithers@spireon.com: could you please file a new ticket describing the issue you're encountering? Thanks.

Comment by Dave Withers [ 26/Jan/16 ]

did this issue regress back in mongo 3.0.8?

Comment by Githook User [ 17/Jun/15 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-18926 avoid iterating the entire working set when preparing for a WiredTiger snapshot change

Improves performance for query plans with a blocking stage when using the WiredTiger storage engine.
In particular, this should benefit full text search and geoNear queries.
Branch: v3.0
https://github.com/mongodb/mongo/commit/f3ca2d0ba8fa13959f5dc6b36805aa137c25089e
