[SERVER-27288] DocumentSourceSkip and DocumentSourceMatch getNext() should release GetNextResult reference prior to subsequent underlying source getNext() call Created: 05/Dec/16  Updated: 05/Apr/17  Resolved: 19/Dec/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.0
Fix Version/s: 3.4.2, 3.5.2

Type: Bug Priority: Major - P3
Reporter: James Wahlin Assignee: James Wahlin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-27444 Add unit tests confirming $unwind fol... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

The following will cause mongod to abort when this issue is present:

  1. Add invariant(false); at the start of the clonedStorage() method in pipeline/document.h

    DocumentStorage& clonedStorage() {
        invariant(false);
        reset(storagePtr()->clone());
        return const_cast<DocumentStorage&>(*storagePtr());
    }
    

  2. Run the following via mongo shell
    For DocumentSourceSkip:

    db.test.insert({x: [1,2,3]})
    db.test.aggregate([{$unwind: {path: "$x"}}, {$skip: 3}]);
    

    For DocumentSourceMatch:

    db.test.insert({x: [1,2,3]})
    db.test.aggregate([{$unwind: {path: "$x"}},{$match: {x: 2}}]);
    

Sprint: Query 2017-01-23
Participants:
Linked BF Score: 0

 Description   

Maintaining the reference can lead to unnecessary cloning of data when the underlying source is destructive, as is the case for DocumentSourceUnwind. This leads to a 15% performance degradation in the mongoperf Unwind test.



 Comments   
Comment by Githook User [ 09/Jan/17 ]

Author:

{u'username': u'jameswahlin', u'name': u'James Wahlin', u'email': u'james.wahlin@10gen.com'}

Message: SERVER-27288 Release Document ref in DocumentSourceSkip and DocumentSourceMatch getNext()

(cherry picked from commit a916d0398ac10489e9b382c6ad6f066999310254)
Branch: v3.4
https://github.com/mongodb/mongo/commit/5933eb4b7e35c06894f9245be5511f18ea48699d

Comment by Githook User [ 19/Dec/16 ]

Author:

{u'username': u'jameswahlin', u'name': u'James Wahlin', u'email': u'james.wahlin@10gen.com'}

Message: SERVER-27288 Release Document ref in DocumentSourceSkip and DocumentSourceMatch getNext()
Branch: master
https://github.com/mongodb/mongo/commit/a916d0398ac10489e9b382c6ad6f066999310254

Comment by James Wahlin [ 16/Dec/16 ]

SERVER-27444 entered to write unit tests that would flag the regression reported by this ticket.

Comment by James Wahlin [ 14/Dec/16 ]

The caveat posted above turns out not to be a real concern. The releaseDocument() call is guaranteed to use the move constructor for the Document returned.

Comment by James Wahlin [ 14/Dec/16 ]

I ran through all the document sources and Skip and Match were the only two I found to be obviously affected. There are other sources that loop on getNext() with a reference outside of the loop, but they call releaseDocument() prior to the next loop iteration, releasing the document from the GetNextResult. In many cases we are holding on to the document for a valid reason post-release (DocumentSourceSort is a good example of this), in which case the cloning behavior is correct.

There is a big caveat to my statement above however. The GetNextResult::releaseDocument() method returns a std::move() of the Document object held by the class. While this should perform a move there are no guarantees that it will, which would mean a copy and reference count bump. We may want to modify GetNextResult to hold a std::unique_ptr<Document> so that we can be sure the move is taking place. charlie.swanson - let's discuss further. If we want to pursue we can enter a new ticket to address.

Comment by Charlie Swanson [ 14/Dec/16 ]

Great catch! It seems like this idiom of

for (; nextInput.isAdvanced(); nextInput = pSource->getNext()) { ... }

is perhaps misguided. We use a similar strategy in many other stages, I think in $group, maybe $limit or $sort? Can you give a quick audit for things of this sort?

Comment by James Wahlin [ 14/Dec/16 ]

Looks like DocumentSourceMatch is exposed to the same issue.

DocumentSource::GetNextResult DocumentSourceMatch::getNext() {
    pExpCtx->checkForInterrupt();
 
    // The user facing error should have been generated earlier.
    massert(17309, "Should never call getNext on a $match stage with $text clause", !_isTextQuery);
 
    auto nextInput = pSource->getNext();
    for (; nextInput.isAdvanced(); nextInput = pSource->getNext()) {
        // MatchExpression only takes BSON documents, so we have to make one. As an optimization,
        // only serialize the fields we need to do the match.
        BSONObj toMatch = _dependencies.needWholeDocument
            ? nextInput.getDocument().toBson()
            : getObjectForMatch(nextInput.getDocument(), _dependencies.fields);
 
        if (_expression->matchesBSON(toMatch)) {
            return nextInput;
        }
    }
 
    return nextInput;
}

To reproduce:

  1. Add invariant(false); at the start of the clonedStorage() method in pipeline/document.h

    DocumentStorage& clonedStorage() {
        invariant(false);
        reset(storagePtr()->clone());
        return const_cast<DocumentStorage&>(*storagePtr());
    }
    

  2. Run the following via mongo shell

    db.test.insert({x: [1,2,3]})
    db.test.aggregate([{$unwind: {path: "$x"}},{$match: {x: 2}}]);
    

Comment by James Wahlin [ 05/Dec/16 ]

A fix for this issue should include addition of a unit test to catch any future regression.

Generated at Thu Feb 08 04:14:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.