[SERVER-36346] Large memory consumption per pending getmore
Created: 30/Jul/18  Updated: 29/Oct/23  Resolved: 26/Oct/21

| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0 |
| Type: | Improvement |
| Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) |
| Assignee: | Mohammad Dashti (Inactive) |
| Resolution: | Fixed |
| Votes: | 4 |
| Labels: | query-director-triage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2020-03-09, Query 2020-03-23, Query 2020-04-06, Query 2020-04-20, Query 2020-05-04, Query 2020-05-18, Query 2020-06-01, QE 2021-09-20, QE 2021-10-04, QE 2021-10-18, QE 2021-11-01 |
| Case: | (copied to CRM) |
| Linked BF Score: | 135 |
| Description |

For getMores we allocate a 32MB buffer for the response up front, and it remains allocated for as long as the getMore is pending. (That is the 16MB maximum BSON document size, plus some amount of response message overhead, rounded up to 32MB by the buffer allocator.) This adds up to substantial memory overhead when there are a large number of pending getMores, for example with many oplog tails or change streams.

Can this be improved? For example, maybe we don't need to allocate for the worst case up front. Even the simple expedient of teaching the buffer allocator that it doesn't always need to stick to powers of two would cut this overhead in half.
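To make the rounding concrete, here is a minimal sketch of a power-of-two growth policy like the one described above. The function name, the 64-byte minimum, and the overhead constant are illustrative assumptions, not MongoDB's actual buffer allocator:

```cpp
#include <cstddef>
#include <iostream>

// Illustrative power-of-two growth: round the requested size up to the
// next power of two, starting from an assumed 64-byte minimum block.
std::size_t roundUpToPowerOfTwo(std::size_t requested) {
    std::size_t size = 64;
    while (size < requested)
        size *= 2;
    return size;
}

int main() {
    const std::size_t kMaxBSONSize = 16 * 1024 * 1024;  // 16MB document limit
    const std::size_t kReplyOverhead = 16 * 1024;       // assumed message overhead
    // 16MB of documents plus any overhead crosses the 16MB boundary, so the
    // allocator hands back a 32MB block that then sits idle for as long as
    // the getMore is pending.
    std::cout << roundUpToPowerOfTwo(kMaxBSONSize + kReplyOverhead) << "\n";
    // prints 33554432 (32MB)
}
```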
| Comments |
| Comment by Mohammad Dashti (Inactive) [ 05/Nov/21 ] |

800 * 32MB = 25.6GB (out of the reported 27GB) would have been the buffers allocated for pending getMores on tailable cursors. Those buffers were retained for long stretches of time while waiting for a change to happen so it could be reported back to the user. With the change in this ticket (and all related tickets), we now allocate at most a 16MB buffer, and hold it only for a very limited time: from the moment there is at least one record to return until it is returned to the user, usually sub-second. In other words, for N pending getMore requests on tailable cursors we used to allocate N * 32MB for the whole period until a change occurred on that collection; a framework like Meteor keeps those N cursors open constantly, so N * 32MB stayed reserved. With this change, almost nothing is reserved.
| Comment by Mohammad Dashti (Inactive) [ 26/Oct/21 ] |

Author: Mohammad Dashti <mdashti@gmail.com> (username: mdashti)
Message:
| Comment by Bruce Lucas (Inactive) [ 18/May/20 ] |

Tip: it's much easier to make comparisons if the two runs are plotted on the same charts. If they are separated by a substantial gap in time, you can use the timeline "condensed" option to remove the gap so each run occupies about half of the width. In the charts, left is new, without prealloc; right is old, with prealloc.

Generally speaking, this is a very light load with very little concurrency. A heavier load with more concurrency may or may not look different.
| Comment by Mathias Stearn [ 17/Apr/20 ] |

In addition to tailable cursors, we should also consider whether there is any batchSize restriction in the getMore request. It is also wasteful to allocate the giant buffer when the client is asking for only a few documents at a time.

Additionally, we should probably add a special case to https://github.com/mongodb/mongo/blob/05eb5a55e9a7f50db04c0a7a12e6cd06248bbab5/src/mongo/bson/util/builder.h#L404-L416 so that whenever we would allocate 16MB, we allocate 16MB + ~64KB instead. The 0.5% overallocation will frequently avoid the 99.5% overallocation we do today, which seems like a good tradeoff.

Also, while we are in there, it is probably worth downsizing the allocation size by sizeof(SharedBuffer::Holder) (as long as it is still > minSize) to counteract the extra space requested at https://github.com/mongodb/mongo/blob/master/src/mongo/util/shared_buffer.h#L70, which is probably bumping us up to a higher TCMalloc size class.
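A minimal sketch of both suggestions, assuming a simple power-of-two growth loop; kReplySlop, kHolderSize, and nextAllocationSize are illustrative stand-ins, not the actual builder.h code:

```cpp
#include <cstddef>

constexpr std::size_t kMaxBSONSize = 16 * 1024 * 1024;  // 16MB
constexpr std::size_t kReplySlop = 64 * 1024;           // ~64KB reply overhead (assumed)
constexpr std::size_t kHolderSize = 16;                 // stand-in for sizeof(SharedBuffer::Holder)
constexpr std::size_t kMinSize = 64;                    // assumed minimum allocation

std::size_t nextAllocationSize(std::size_t requested) {
    // Special case: a request of 16MB-and-change gets exactly 16MB + 64KB.
    // The 0.5% overallocation avoids rounding all the way up to 32MB.
    if (requested >= kMaxBSONSize && requested <= kMaxBSONSize + kReplySlop)
        return kMaxBSONSize + kReplySlop;

    // Default policy: round up to the next power of two.
    std::size_t size = kMinSize;
    while (size < requested)
        size *= 2;

    // Shave off the SharedBuffer header so the total malloc request
    // (header + data) lands on the intended size class instead of
    // spilling into the next TCMalloc bucket.
    if (size - kHolderSize >= requested && size - kHolderSize > kMinSize)
        size -= kHolderSize;
    return size;
}
```

The tradeoff is the one named above: a small fixed slop on the 16MB path in exchange for never doubling to 32MB for a maximum-size reply.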
| Comment by Charlie Swanson [ 13/Aug/18 ] |

david.storch I'd actually like to nominate this ticket for quick wins; after investigating a little more, it looks like this should be pretty doable.

The getMore command overrides Command::reserveBytesForReply() to specify that we should reserve ~16MB. At first glance it looks like we could move this method to the parsed Invocation of the command, which has access to a GetMoreRequest. I think there's only one call site of reserveBytesForReply(), in service_entry_point_common.cpp. If we have the GetMoreRequest, we could make the number of bytes to preallocate conditional on whether the cursor is tailable. If the cursor is tailable we usually wouldn't expect to need 16MB, since we'll only be getting the newest data.

Does that sound crazy? It seems like it would certainly be a win for a change stream, which I'd expect to usually require fewer than 16MB. For other tailable cursors it's possible that there are a substantial number of full 16MB batches before the tailable part, but I'm not sure what better heuristic we could use, or how noticeable a regression like that would be. Moving this back to 'Needs Triage' to discuss.
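For illustration, a simplified sketch of what a tailable-aware reservation could look like. GetMoreRequest's shape, the constants, and the free-function form are hypothetical; the real hook is the getMore command's override of Command::reserveBytesForReply():

```cpp
#include <cstddef>

// Hypothetical stand-in for the parsed request available on the Invocation.
struct GetMoreRequest {
    bool tailable = false;
};

constexpr std::size_t kWorstCaseReply = 16 * 1024 * 1024;  // 16MB max BSON reply
constexpr std::size_t kTailableReply = 64 * 1024;          // illustrative small reserve

std::size_t reserveBytesForReply(const GetMoreRequest& request) {
    // Tailable cursors (oplog tails, change streams) usually return only the
    // newest documents, so skip the worst-case preallocation and let the
    // buffer grow on demand if a batch turns out to be large.
    return request.tailable ? kTailableReply : kWorstCaseReply;
}
```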