[CDRIVER-4262] Memory consumption possibly too high with cursor Created: 13/Jan/22 Updated: 27/Oct/23 Resolved: 27/Jan/22 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Unknown |
| Reporter: | Kai Takac | Assignee: | Ezra Chung |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
Summary
I was investigating memory consumption in my own application and noticed that the Mongo C driver consumes more memory than I thought was necessary. I set up a fake DB with documents that are roughly 2 MB each in uncompressed BSON representation. I then created an application that uses a cursor to iterate over the documents with a batch size of two. I assumed that at any given moment the memory consumption would therefore be around ~4 MB. I ran the application with Valgrind's Massif and noticed that peak memory consumption was 8 MB. My guess is that at some point the batch is copied instead of being moved. I've figured out that the driver copies the server reply in the function mongoc_cluster_run_opmsg from the statically allocated reply_local variable to the reply variable. See this part here: https://github.com/mongodb/mongo-c-driver/blob/ded9ae5e9f2897a283305175aae8e1bbf4021c36/src/libmongoc/src/mongoc/mongoc-cluster.c#L3538. I'm quite unfamiliar with the C driver, so I was not sure if it is possible to "move" the memory to the final reply. I've attached the program and the output file of Valgrind.
Environment
Linux, Valgrind with driver version 1.20.1
How to Reproduce
|
| Comments |
| Comment by Ezra Chung [ 27/Jan/22 ] |
|
kai.takac@gmail.com Thank you once again for the detailed report. As you have described, the ~4 MB payload (two ~2 MB documents in a batch of size 2) is being copied during server message response handling, momentarily increasing the total memory usage to ~8 MB. This momentary copy is fundamentally unavoidable given the C driver's current implementation of server message response parsing: the entire message is read from the stream into a memory buffer, the memory buffer is reinterpreted as read-only RPC data, and the message payload is extracted from the RPC data view as a BSON document. The extraction of the message payload into a modifiable and owning bson_t object from the read-only RPC data view (whose corresponding memory is owned by the local memory buffer) is the reason behind the temporary duplication of the ~4 MB payload. Eliminating this copy would necessitate a non-trivial refactor of how server message responses are currently parsed by the C driver. As this behavior is working as designed, this bug ticket will be closed. |
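For illustration only, the effect described above can be reproduced without the driver. The following is a minimal sketch in plain C, not the driver's actual code: `owned_doc`, `tracked_alloc`, `extract_by_copy`, and `extract_by_move` are hypothetical names, and `owned_doc` merely stands in for an owning document such as bson_t. It counts live heap bytes to show why copying a payload out of a receive buffer briefly doubles memory use, while handing the buffer over would not.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an owning, modifiable document (like bson_t). */
typedef struct {
    uint8_t *data;
    size_t len;
} owned_doc;

/* Simple counters for live and peak heap usage in this sketch. */
static size_t live_bytes = 0;
static size_t peak_bytes = 0;

static void reset_counters(void) { live_bytes = 0; peak_bytes = 0; }

static void *tracked_alloc(size_t n) {
    void *p = malloc(n);
    if (p) {
        live_bytes += n;
        if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    }
    return p;
}

static void tracked_free(void *p, size_t n) {
    if (p) {
        free(p);
        live_bytes -= n;
    }
}

/* Copy path: buffer -> read-only view -> owning copy -> free buffer.
 * Both allocations are alive at the memcpy, so peak is 2 * payload_len. */
static int extract_by_copy(owned_doc *out, size_t payload_len) {
    uint8_t *buffer = tracked_alloc(payload_len);
    if (!buffer) return 0;
    memset(buffer, 0xAB, payload_len);      /* pretend: read from stream */

    const uint8_t *view = buffer;           /* non-owning, read-only view */

    out->data = tracked_alloc(payload_len); /* second allocation */
    if (!out->data) {
        tracked_free(buffer, payload_len);
        return 0;
    }
    memcpy(out->data, view, payload_len);
    out->len = payload_len;

    tracked_free(buffer, payload_len);      /* buffer released after copy */
    return 1;
}

/* Move path (hypothetical): ownership of the buffer is transferred,
 * so only one allocation ever exists. */
static int extract_by_move(owned_doc *out, size_t payload_len) {
    uint8_t *buffer = tracked_alloc(payload_len);
    if (!buffer) return 0;
    memset(buffer, 0xAB, payload_len);      /* pretend: read from stream */

    out->data = buffer;                     /* hand over ownership */
    out->len = payload_len;
    return 1;
}

static void owned_doc_destroy(owned_doc *doc) {
    tracked_free(doc->data, doc->len);
    doc->data = NULL;
    doc->len = 0;
}
```

Calling `extract_by_copy` with a 4 MB payload leaves `peak_bytes` at 8 MB, matching the Massif observation, while `extract_by_move` peaks at 4 MB. The move path is simplified: it assumes the payload occupies the whole buffer, whereas a real server message wraps the payload in headers, which is part of why handing the buffer over is not a drop-in change.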
| Comment by Kevin Albertson [ 13/Jan/22 ] |
|
Hello kai.takac@gmail.com, thank you for the detailed report and visualization. We will take a look at this soon. |