- Type: Epic
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: Performance
There are both known and potential opportunities to improve performance across the driver. This Epic covers the investigation of such opportunities and the implementation of feasible optimizations.
Summary of Performance Optimizations
Following profiling, benchmarking, and code analysis, several improvements were explored, prototyped, validated, and implemented.
BSON codec optimizations:
- BsonArrayCodec and BsonDocumentCodec now use a BsonTypeCodecMap to avoid costly CodecRegistry#get lookups.
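To illustrate why a per-type map helps, here is a minimal sketch of the general idea: resolve one codec per BSON type up front, then serve later lookups from an array. All names (`SketchBsonType`, `SketchCodec`, `SketchTypeCodecMap`) are hypothetical stand-ins and do not reproduce the driver's `BsonTypeCodecMap` API.

```java
import java.util.function.Function;

// Hypothetical stand-ins; these are NOT the driver's BsonType, Codec, or BsonTypeCodecMap.
enum SketchBsonType { DOUBLE, STRING, DOCUMENT, ARRAY, INT32, INT64 }

interface SketchCodec {
    String name();
}

final class SketchTypeCodecMap {
    // One slot per type, indexed by ordinal, so a lookup is a single array read.
    private final SketchCodec[] codecs = new SketchCodec[SketchBsonType.values().length];

    // The resolver (assumed to be the expensive registry lookup) runs once per type, here.
    SketchTypeCodecMap(Function<SketchBsonType, SketchCodec> resolver) {
        for (SketchBsonType type : SketchBsonType.values()) {
            codecs[type.ordinal()] = resolver.apply(type);
        }
    }

    // Hot-path lookup: no hashing and no registry traversal.
    SketchCodec get(SketchBsonType type) {
        return codecs[type.ordinal()];
    }
}
```

In this sketch the registry cost is paid once when the map is built, rather than once per decoded or encoded value.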
ObjectId optimization:
- In-memory representation of ObjectId changed from (int, int, short, int) to (int, long) for more efficient sorting and serialization.
- The optimization above enabled BsonObjectId#compareTo to be refactored to avoid allocating extra arrays; it now directly compares fields using Integer.compareUnsigned.
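A minimal sketch of an allocation-free comparison over an (int, long) layout follows. The class `SketchObjectId` is hypothetical, and two details are assumptions rather than facts from this ticket: that the trailing 8 bytes are packed big-endian into the `long`, and that they are compared with `Long.compareUnsigned`.

```java
import java.nio.ByteBuffer;

// Hypothetical class for illustration; not the driver's ObjectId.
final class SketchObjectId implements Comparable<SketchObjectId> {
    private final int timestamp; // first 4 bytes, big-endian
    private final long nonce;    // remaining 8 bytes, big-endian (assumed packing)

    SketchObjectId(byte[] bytes) {
        if (bytes.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        this.timestamp = buffer.getInt();
        this.nonce = buffer.getLong();
    }

    @Override
    public int compareTo(SketchObjectId other) {
        // Unsigned comparison of big-endian fields matches byte-wise ordering,
        // so no temporary byte arrays are needed.
        int cmp = Integer.compareUnsigned(timestamp, other.timestamp);
        return cmp != 0 ? cmp : Long.compareUnsigned(nonce, other.nonce);
    }

    byte[] toByteArray() {
        // Two bulk writes instead of twelve single-byte writes.
        return ByteBuffer.allocate(12).putInt(timestamp).putLong(nonce).array();
    }
}
```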
Write path improvements:
- Replaced byte-level writes with ByteBuffer.putInt/putLong/putDouble where applicable. These methods are often intrinsified into efficient machine instructions and can store several bytes in a single instruction, eliminating the overhead of native method calls.
- ByteBufferBsonOutput now caches the current buffer in a field instead of making repeated bufferList.get(index) calls, which reduces bounds checks.
- BasicOutputBuffer#toByteArray() was optimized to avoid double-copying.
- Reduced the constant factor of buffer lookups in writeInt32 from 4 to 1 by preferring a single putInt call when capacity allows.
- Replaced java.util.Stack with ArrayDeque in BsonWriter, removing unnecessary synchronization overhead.
- Introduced caching for array index field names ("0", "1", ...), avoiding repetitive allocations.
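Two of the write path ideas above (caching the current buffer in a field, and a single putInt when capacity allows) can be sketched as follows. `SketchChunkedOutput` and its fixed chunk size are hypothetical; this is not the implementation of ByteBufferBsonOutput.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunked output for illustration only.
final class SketchChunkedOutput {
    private final int chunkSize;
    private final List<ByteBuffer> buffers = new ArrayList<>();
    private ByteBuffer current; // cached so the hot path avoids buffers.get(index)

    SketchChunkedOutput(int chunkSize) {
        this.chunkSize = chunkSize;
        nextBuffer();
    }

    private void nextBuffer() {
        // BSON integers are little-endian.
        current = ByteBuffer.allocate(chunkSize).order(ByteOrder.LITTLE_ENDIAN);
        buffers.add(current);
    }

    void writeByte(int value) {
        if (!current.hasRemaining()) {
            nextBuffer();
        }
        current.put((byte) value);
    }

    void writeInt32(int value) {
        if (current.remaining() >= Integer.BYTES) {
            current.putInt(value); // fast path: one capacity check, one bulk write
        } else {
            // Slow path: the value straddles a chunk boundary, so write byte by byte.
            writeByte(value);
            writeByte(value >> 8);
            writeByte(value >> 16);
            writeByte(value >> 24);
        }
    }

    byte[] toByteArray() {
        int size = 0;
        for (ByteBuffer b : buffers) {
            size += b.position();
        }
        byte[] out = new byte[size];
        int offset = 0;
        for (ByteBuffer b : buffers) {
            int length = b.position();
            System.arraycopy(b.array(), 0, out, offset, length);
            offset += length;
        }
        return out;
    }
}
```

With a chunk size of 6, the first writeInt32 takes the fast path and the second falls back to the byte-wise path because only 2 bytes remain in the current chunk.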
Read path improvements:
- Reduced allocations during String decoding by reusing buffers and avoiding unnecessary copies.
- Null terminators are now scanned for in bulk using SWAR (SIMD within a register) instead of byte by byte.
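For readers unfamiliar with SWAR, the bulk null-terminator scan can be sketched with the classic "has zero byte" bit trick applied to 8 bytes at a time. The helper name `indexOfNull` and the class `SketchSwar` are hypothetical, and the driver's actual scanning code may differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class SketchSwar {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    /** Returns the index of the first 0x00 byte at or after 'from', or -1 if there is none. */
    static int indexOfNull(byte[] bytes, int from) {
        // Little-endian so that the byte at the lowest index sits in the lowest bits.
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int i = from;
        for (; i + Long.BYTES <= bytes.length; i += Long.BYTES) {
            long word = buffer.getLong(i);
            // Non-zero iff the word contains a zero byte; the lowest set bit marks the first one.
            long mask = (word - ONES) & ~word & HIGHS;
            if (mask != 0) {
                return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            }
        }
        // Tail: fewer than 8 bytes remain, so fall back to a byte-wise scan.
        for (; i < bytes.length; i++) {
            if (bytes[i] == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Borrow propagation can set spurious bits above the first zero byte, which is why the sketch only trusts the lowest set bit of the mask.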
The following optimizations were identified but not completed within the scope of this epic:
- BsonDocumentCodec _id field re-ordering could be more efficient
- Introduce valueOf methods in BsonInt32 and BsonInt64 (and use them in corresponding Codec) that are roughly equivalent to the ones in Integer and Long (which have caches for small values).
- BsonDocumentCodec shouldn't make a copy of its elements when decoding
- ByteBufferBsonOutput is great for minimizing heap use, but all the buffer management that it has to do in an inner loop carries a performance cost. We can consider using a simpler implementation of OutputBuffer that trades off memory use for speed. For example, we could just cache 48MB buffers instead of power-of-two buffers.
- Lots of calls to CodecRegistry#get in DocumentCodec#writeValue and BsonDocumentCodec#writeValue. The implementation of this method in ProvidersCodecRegistry is not built for use in inner loops. Some caching within the Codec implementation could be useful here.
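As a rough illustration of the valueOf idea listed above, a small-value cache could look like the sketch below. `SketchInt32` is a hypothetical class, not BsonInt32, and the cached range of -128 to 127 is an assumption borrowed from the default range of Integer.valueOf.

```java
// Hypothetical value class for illustration; not the driver's BsonInt32.
final class SketchInt32 {
    private static final int LOW = -128;  // assumed range, mirroring Integer's default cache
    private static final int HIGH = 127;
    private static final SketchInt32[] CACHE = new SketchInt32[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new SketchInt32(LOW + i);
        }
    }

    private final int value;

    private SketchInt32(int value) {
        this.value = value;
    }

    // Returns a shared instance for small values and allocates only outside the cached range.
    static SketchInt32 valueOf(int value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[value - LOW];
        }
        return new SketchInt32(value);
    }

    int getValue() {
        return value;
    }
}
```

Sharing instances like this is only safe because the sketched class is immutable.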