Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: BSON
Labels:
- external-user

Quarter:
- FY27Q1
Investigation Story Points:
0
Confidence Status:
None

Documentation Changes:
Not Needed

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

We are been running into memory issues while pulling large Mongo result sets into memory, and after lots of memory profiling the issue seems to be related to ObjectId, specifically how memory inefficient Buffer/ArrayBuffer is when storing lots of small Buffers.

We were expecting an ObjectId to consume ~12bytes of memory (+ some overhead), but in reality this consumes 128 bytes per ObjectId (96 bytes for just the Buffer).

I opened an issue with the NodeJS Performance team but this appears to be working as designed: https://github.com/nodejs/performance/issues/173

Storing a string in Node/V8 is much more memory efficient since it's a primitive. A 24 character hex string only consumes 40 bytes of memory, AND it's much faster to serialize/deserialize.

Memory Usage

With this change, retained memory usage is reduced by ~45% (ObjectId size decreases from 128 bytes to 72 bytes) ** and young memory allocation is reduced by ~57% during deserialization. Currently, the hex string (40 bytes) already needs to be allocated in memory as ESON, then deserialized into BSON (which allocates 96 bytes for the buffer), then the string needs to be garbage collected. Since V8 uses string interning pool the string does not need to be reallocated if we persist it in the ObjectId.

By reducing the amount of memory that needs to be allocated we are further improving performance since garbage collection is quite expensive.

Performance Tests

I made the change to persist ObjectId as string in memory, and we only need to use Buffer to generate a new pk. Tested on M1 MacBook Pro

Operation	ObjectId (ops/sec)	ObjectId STRING (ops/sec)	Relative Improvement (%)
new ObjectId()	10,662,177	5,842,941	-45.18%
new ObjectId(string)	5,316,351	11,913,614	118.00%
deserialize	3,630,156	11,401,144	214.06%
serialize	11,498,443	82,022,643	613.54%
toHexString	11,314,960	81,086,841	616.57%
equals	39,653,912	44,413,312	11.99%

new ObjectId() does appear to be slower at first glance because it still uses Buffer to generate an ObjectId then it has to convert the buffer to hex string, but once you consider most paths will have to serialize that ObjectId after creating a new PK then it still comes out marginally faster.

Operation	ObjectId (ops/sec)	ObjectId STRING (ops/sec)	Relative Improvement (%)
new ObjectId() + serialize	5,785,814	5,900,631	2%

Example

Lets say we have a MongoDB collection with some user documents like this that points to another document.

{
  _id: ObjectId,
  name: string,
  addressId: ObjectId
}

Let's say we want to grab a bunch of the documents. 1 million in this example. Lets project to just the ObjectIds so other types don't skew our numbers

const docs = await collection.find({}, { limit: 1000000, projection: { _id: 1, addressId: 1 } }).toArray() // lets grab 1 million documents

BEFORE: 471 MB used

AFTER: 262MB used

Use Case

The above example was a little contrived, but we do have use cases where we will pull in large numbers of documents into memory to build Directed Acyclic Graphs, where each document represents an edge in the graph (this is particularly needed since MongoDB $graphLookup is quite limited).

This could also occur frequently on a server where many users are requesting small batches of documents.

User Impact

I believe this issue is greatly effecting all users of MongoDB in JS environments (NodeJS/browser) regardless of use case, just to different extents. By making this change I believe we would improve performance and reduce memory usage for all users.

Acceptance Criteria

Implementation Requirements

ObjectId class should only persist hex string representation of ObjectId
ObjectId class API should not have any breaking changes
All currently ObjectId functionality should behave as it does today
Performance for most use cases must not degrade
cacheHexString property to be deprecated
Optional: Add cacheBuffer option to cache buffer

Testing Requirements

unit tests
perf tests

Documentation Requirements

DOCSP ticket, API docs, etc

Follow Up Requirements

additional tickets to file, required releases, etc

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

image-2024-07-01-11-01-49-666.png
142 kB
Jul 01 2024 03:01:50 PM UTC

Assignee:: Unassigned
Reporter:: Sean Reece
Reviewers:: Neal Beeken
Votes:: 0 Vote for this issue
Watchers:: 4 Start watching this issue

Created:: Jul 01 2024 04:02:16 PM UTC
Updated:: Oct 17 2025 06:20:51 PM UTC

Details

Description