[SERVER-22793] Unbounded memory usage by long-running query using projection Created: 22/Feb/16 Updated: 17/Nov/16 Resolved: 24/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.0.9, 3.2.3, 3.3.2 |
| Fix Version/s: | 3.0.10, 3.2.4, 3.3.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | WTplaybook, code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Completed: | |||||
| Sprint: | Query 11 (03/14/16) | ||||
| Participants: | |||||
| Description |
Memory behavior as follows:
|
| Comments |
| Comment by Githook User [ 26/Feb/16 ] | ||||||||||||||||
|
Author: David Storch (dstorch, david.storch@10gen.com). Message: This ensures that the set of WorkingSetIDs does not grow without | ||||||||||||||||
| Comment by Githook User [ 25/Feb/16 ] | ||||||||||||||||
|
Author: David Storch (dstorch, david.storch@10gen.com). Message: This ensures that the set of WorkingSetIDs does not grow without bound (cherry picked from commit aaa5074a59327cdd2d0a462bb27c98f1c1c3ec6a) | ||||||||||||||||
| Comment by Githook User [ 24/Feb/16 ] | ||||||||||||||||
|
Author: David Storch (dstorch, david.storch@10gen.com). Message: This ensures that the set of WorkingSetIDs does not grow without bound | ||||||||||||||||
| Comment by David Storch [ 24/Feb/16 ] | ||||||||||||||||
|
After further investigation, it appears that this issue also affects servers using the MMAP storage engine on the master and 3.2 branches. Servers running 3.2.x or 3.3.x versions of WiredTiger are not affected. On the 3.0 branch, both MMAP and WiredTiger are affected. However, only queries which use covered execution plans are affected on 3.0.x WiredTiger. Given this new information, the plan is to use our normal backport workflow. Specifically,
| ||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 23/Feb/16 ] | ||||||||||||||||
|
Yes, that fixes it, thanks. | ||||||||||||||||
| Comment by David Storch [ 23/Feb/16 ] | ||||||||||||||||
|
Thanks bruce.lucas. After some more code reading I think I found the likely cause of the problem. Would you be able to try applying the following patch to the v3.0 branch and see if it fixes the problem?
If I'm right, the problem isn't a WorkingSetMember leak. Instead, it's some logic that buffers WorkingSetMember ids. In preparation for a yield (when the storage engine supports document-granularity concurrency), we traverse this set of ids, do a bit of necessary prep work, and then clear the set. We skip this work for covered plans since it is not necessary for correctness, but it is necessary for keeping our memory footprint in check! | ||||||||||||||||
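The mechanism described in this comment can be illustrated with a small toy model. This is only a sketch, not the server code: `ToyWorkingSet`, `YieldPolicy`, `remember`, `prepareForYield` and `simulateCoveredScan` are hypothetical names invented for illustration, ids are assumed to be unique and never reused, and the `AlwaysClear` branch is just one plausible shape for a fix (the actual patch is not reproduced in this ticket).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for the server's WorkingSetID type.
using WorkingSetID = std::size_t;

// Two behaviours to compare. Both names are assumptions made for this sketch.
enum class YieldPolicy {
    SkipEntirelyWhenCovered,  // models the reported bug: covered plans never clear the buffer
    AlwaysClear               // one plausible fix: skip the prep work, but still clear
};

// Toy model of the id buffer described above; not the real WorkingSet class.
class ToyWorkingSet {
public:
    // Record an id that may need prep work before the next yield.
    void remember(WorkingSetID id) { _pendingIds.insert(id); }

    // Called before each yield.
    void prepareForYield(bool planIsCovered, YieldPolicy policy) {
        if (planIsCovered && policy == YieldPolicy::SkipEntirelyWhenCovered) {
            return;  // prep is unnecessary for correctness, but the buffer keeps growing
        }
        if (!planIsCovered) {
            // Placeholder for the real per-member prep work.
            _prepped += _pendingIds.size();
        }
        _pendingIds.clear();  // this is what keeps memory bounded
    }

    std::size_t pending() const { return _pendingIds.size(); }

private:
    std::unordered_set<WorkingSetID> _pendingIds;
    std::size_t _prepped = 0;
};

// Simulates a long-running covered scan that yields every `yieldEvery` results
// and returns how many ids are still buffered at the end.
std::size_t simulateCoveredScan(std::size_t totalResults,
                                std::size_t yieldEvery,
                                YieldPolicy policy) {
    assert(yieldEvery > 0);
    ToyWorkingSet ws;
    for (std::size_t i = 0; i < totalResults; ++i) {
        ws.remember(i);
        if ((i + 1) % yieldEvery == 0) {
            ws.prepareForYield(/*planIsCovered=*/true, policy);
        }
    }
    return ws.pending();
}
```

In this model, `simulateCoveredScan(10000, 128, YieldPolicy::SkipEntirelyWhenCovered)` leaves all 10000 ids buffered, so the buffer grows with the length of the scan. With `YieldPolicy::AlwaysClear` only the 16 ids recorded since the last yield remain, so the buffer is bounded by the yield interval.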
| Comment by Bruce Lucas (Inactive) [ 23/Feb/16 ] | ||||||||||||||||
|
It is using a $snapshot query. With that information I was able to reproduce the problem with the following shell queries:
Problem does not reproduce with the following:
So it seems that both index scan and projection must be present. | ||||||||||||||||
| Comment by David Storch [ 22/Feb/16 ] | ||||||||||||||||
|
bruce.lucas, it is possible that the vector holding the WorkingSetMembers is growing without bound if there is some code path which leaks a WorkingSetMember. I did a little bit of code inspection on HEAD of the 3.0 branch to see if I could spot such a leak, but didn't find one. What is the actual operation that mongoexport is running against the server? | ||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 22/Feb/16 ] | ||||||||||||||||
|
Does not reproduce without the project stage, e.g. mongoexport without -f. | ||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 22/Feb/16 ] | ||||||||||||||||
|
Could not reproduce this on 3.2.3. | ||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 22/Feb/16 ] | ||||||||||||||||
|
A customer reported an OOM error because of this, with the following stack trace:
The vector being added to in IndexScan::work, called from ProjectionScan::work, seems a likely culprit. |