Type: Epic
Resolution: Fixed
Priority: Unknown
Fix Version/s: 5.5.0
Affects Version/s: None
Component/s: Performance
Labels:
None

Epic Name:
Performance Improvements Phase 1
Assigned Teams:

Java Drivers
Documentation Changes:
Needed
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?

We should highlight the performance improvements introduced in version 5.5.0 of the driver (percentage-based metrics).

2. Would you like the user to see examples of the syntax and/or executable code and its output?

We can link to the driver and spec repository folders containing our benchmark suites for clarity, which were used to evaluate performance against server version 8.

3. Which versions of the driver/connector does this apply to?

These improvements apply to the Java and Kotlin sync drivers. The async driver should also benefit, as the changes were made in the shared driver core, although we don’t have specific metrics for async improvements.

Epic Status:
Done

Quarter:
- FY25Q2
- FY25Q4
- FY26Q1
Scope Cost Estimate:
4
Cost to Date:
4.1
Final Cost Estimate:
4
Cost Threshold %:
100
Cost Diff from Original %:
0
Confidence Status:
🔵 Done
Latest Project Update:
None
Detailed Project Statuses:

Engineer(s): Viacheslav Babanin (Nathan Xu, Ross Lawley)

2025-05-02

Last two weeks?

Merged a follow-up PR for SWAR optimization.

Made a final performance run to assess the overall performance impact.

Standard transport settings

Deep BSON Decoding | 19.44% | 5.4 z-score

Deep BSON Encoding | 102% | 22.8 z-score

Find many and empty the cursor | 25.08% | 13.72 z-score

Find one by ID | 2.7% | 3.16 z-score

Flat BSON Decoding | 31.2% | 9.38 z-score

Flat BSON Encoding | 199.5% | 12.34 z-score

Full BSON Decoding | 16.5% | 7.23 z-score

Full BSON Encoding | 147.3% | 10.39 z-score

LDJSON multi-file export | 6% | 1.12 z-score

LDJSON multi-file import | 21.8% | 8.21 z-score

Large doc Client BulkWrite insert | 91.3% | 24.44 z-score

Large doc Collection BulkWrite insert | 96.5% | 8.79 z-score

Large doc bulk insert | 93.3% | 8.11 z-score

Large doc insertOne | 82.4% | 7.28 z-score

Run command | 3.9% | 5.18 z-score

Small doc Client BulkWrite insert | 49.5% | 17.99 z-score

Small doc Collection BulkWrite insert | 47.8% | 6.44 z-score

Small doc bulk insert | 39.3% | 5.72 z-score

Small doc insertOne | 2.9% | 1.91 z-score

Netty transport settings

Find many and empty the cursor | 40.3% | 14.81 z-score

Find one by ID | 4.4% | 4.12 z-score

LDJSON multi-file export | 4.3% | 1.19 z-score

LDJSON multi-file import | 16.9% | 3.77 z-score

Large doc Client BulkWrite insert | 54.8% | 14.49 z-score

Large doc Collection BulkWrite insert | 104.9% | 38.72 z-score

Large doc bulk insert | 74.6% | 65.55 z-score

Large doc insertOne | 66.6% | 58.65 z-score

Run command | 9.3% | 8.53 z-score

Small doc Client BulkWrite insert | 36.1% | 15.41 z-score

Small doc Collection BulkWrite insert | 39.3% | 37.38 z-score

Small doc bulk insert | 35.1% | 41.51 z-score

2025-04-24

Last two weeks?

Got all reviewed PRs merged into main.

Created and merged PR for BsonWriter optimization.

Focus over the next two weeks?

Create a follow-up PR for SWAR optimization.

Make a final performance run to assess the overall performance impact.

2025-04-11

Last two weeks?

Finalized read path optimizations and created a PR for review.

Created PRs for small improvements discovered during performance investigation.

Analyzed performance across all open and already merged PRs. Approximate gains observed:

Find many and empty the cursor: ~18% improvement

Small doc bulk insert: ~29.4% improvement

Large doc bulk insert: ~82.4% improvement

Large doc insertOne: ~73.1% improvement

Small doc collection bulkWrite insert: ~36.1% improvement

Large doc collection bulkWrite insert: ~74% improvement

Small doc client bulkWrite insert: ~40.1% improvement

Large doc client bulkWrite insert: ~81.5% improvement

Deep BSON encoding: ~66.3% improvement

LDJSON multi-file import: ~26.7% improvement

Deep BSON decoding: ~16.6% improvement

Flat BSON decoding: ~23.3% improvement

Full BSON decoding: ~16.9% improvement

Flat BSON encoding: ~207.8% improvement

Full BSON encoding: ~98.2% improvement

Focus over the next two weeks?

Get all currently reviewed PRs merged into main.

Make a final performance run to assess the overall performance impact.

2025-03-28

Last two weeks?

Finalized writeString optimizations and created a PR for review.

Focus over the next two weeks?

Finalize read path optimizations and create a PR for review.

Create PRs for small improvements discovered during performance investigation.

2025-03-14

Last two weeks?

Refined BSON byte buffer numeric optimization changes and created a PR for review.

Merged codec performance improvements into the main branch.

Experimented with BSON read path optimizations, achieving a 25-30% improvement.

Created a PR for adding a Netty benchmark suite to measure the performance impact of recent changes.

Reviewed external performance PRs for BsonOutput improvements.

Focus over the next two weeks?

Finalize writeString optimizations and create a PR for review.

Finalize read path optimizations and create a PR for review.

Introduce comprehensive test coverage to ensure there are no regressions after optimizations.

Impediments encountered in the last two weeks?

While running performance tests, discovered a bug leading to a resource leak in Netty transport settings (Ticket: JAVA-5812).

Waiting times to run a comprehensive performance analysis on the Evergreen perf analyzer.

2025-02-26

Last two weeks?

Optimized BSON encoding and decoding by making codec lookups more efficient in both BsonArrayCodec and BsonDocumentCodec.

Added JMH into the repository, enabling local benchmarking to assess the relative performance impact of small components beyond spec benchmark tests.

Set up performance test environment on Evergreen spawn host to enable Linux perf CPU profiler.

Profiled the encoding/write path to identify hotspots for optimization.

Experimented with byte buffer numeric optimizations. Initial experiments led to an approximate 20-30% improvement in document insert rate.

Experimented with byte buffer string optimizations, with experiments showing 30-90% insert rate improvement, and Deep BSON encoding achieving up to 200% improvement.

Focus over the next two weeks?

Refine BSON byte buffer numeric optimization changes and create PR for review.

Get codec performance improvements through review and merge into the main branch.

Continue performance optimizations of writeString and determine next optimization steps.

Start profiling BSON read path to look for potential improvements.

Review external performance PRs for BsonOutput improvements.

Consider adding a Netty benchmark suite to measure the performance impact of recent changes.

Impediments encountered in the last two weeks?

Discovered a discrepancy between the client bulk write and the older bulk write behaviors while improving BsonDocumentCodec; this needs further clarification.
Estimated Weeks:
4

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None
Goal Tier(s):
None

There are both known and potential opportunities to improve performance across the driver. This Epic covers the investigation of such opportunities and the implementation of feasible optimizations.

Summary of Performance Optimizations
Following profiling, benchmarking, and code analysis, several improvements were explored, prototyped, validated, and implemented.

BSON codec optimizations:

BsonArrayCodec and BsonDocumentCodec now use a BsonTypeCodecMap to avoid costly CodecRegistry#get lookups.
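
The idea of resolving codecs once per type can be sketched as follows. This is a simplified, self-contained illustration and not the driver's BsonTypeCodecMap: the names Kind, SlowRegistry, and TypeIndexedCache are hypothetical, and the "encoders" are plain functions. It assumes the set of type tags is small and fixed, so each encoder can be looked up once and then served from an array.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical type tags standing in for BSON types.
enum Kind { INT32, STRING, BOOLEAN }

// Hypothetical registry whose lookup is comparatively expensive.
final class SlowRegistry {
    private final Map<Kind, Function<Object, String>> encoders = new HashMap<>();
    int lookups = 0; // counts how often the registry is consulted

    SlowRegistry() {
        encoders.put(Kind.INT32, v -> "i:" + v);
        encoders.put(Kind.STRING, v -> "s:" + v);
        encoders.put(Kind.BOOLEAN, v -> "b:" + v);
    }

    Function<Object, String> get(Kind kind) {
        lookups++;
        return encoders.get(kind);
    }
}

// Resolves every encoder once, then answers requests from an array indexed by ordinal.
final class TypeIndexedCache {
    private final Function<Object, String>[] byOrdinal;

    @SuppressWarnings("unchecked")
    TypeIndexedCache(SlowRegistry registry) {
        Kind[] kinds = Kind.values();
        byOrdinal = (Function<Object, String>[]) new Function[kinds.length];
        for (Kind kind : kinds) {
            byOrdinal[kind.ordinal()] = registry.get(kind);
        }
    }

    Function<Object, String> get(Kind kind) {
        return byOrdinal[kind.ordinal()];
    }
}
```

After construction, encoding any number of values through TypeIndexedCache#get does not touch the registry again, which is the property that matters in an inner encode or decode loop.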

ObjectId optimization:

In-memory representation of ObjectId changed from (int, int, short, int) to (int, long) for more efficient sorting and serialization.
The optimization above enabled BsonObjectId#compareTo to be refactored to avoid allocating extra arrays; it now directly compares fields using Integer.compareUnsigned.
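
A minimal sketch of an allocation-free comparison over an (int, long) layout is shown below. CompactId is a hypothetical class, not the driver's ObjectId, and the split into a 4-byte timestamp and an 8-byte remainder, as well as the use of Long.compareUnsigned for the second field, are assumptions made for illustration. Unsigned comparison of the two fields gives the same order as comparing the 12 bytes one by one as unsigned values.

```java
import java.nio.ByteBuffer;

// Hypothetical 12-byte identifier stored as a 4-byte timestamp plus an 8-byte remainder.
final class CompactId implements Comparable<CompactId> {
    private final int timestamp; // first 4 bytes, big-endian
    private final long rest;     // remaining 8 bytes, big-endian

    CompactId(byte[] twelveBytes) {
        if (twelveBytes.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(twelveBytes); // big-endian by default
        this.timestamp = buffer.getInt();
        this.rest = buffer.getLong();
    }

    byte[] toByteArray() {
        return ByteBuffer.allocate(12).putInt(timestamp).putLong(rest).array();
    }

    @Override
    public int compareTo(CompactId other) {
        // Compare fields directly; no temporary byte arrays are created.
        int cmp = Integer.compareUnsigned(timestamp, other.timestamp);
        return cmp != 0 ? cmp : Long.compareUnsigned(rest, other.rest);
    }
}
```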

Write path improvements:

Replaced byte-level writes with ByteBuffer.putInt/putLong/putDouble where applicable. These methods are often intrinsified into efficient machine instructions and can store several bytes with a single instruction, which eliminates the overhead of native method calls.
ByteBufferBsonOutput now caches the current buffer in a field instead of calling bufferList.get(index) repeatedly, which reduces bounds checks.
BasicOutputBuffer#toByteArray() was optimized to avoid double-copying.
Reduced the number of buffer lookups per writeInt32 call from 4 to 1 by preferring a single putInt call when capacity allows.
Replaced java.util.Stack with ArrayDeque in BsonWriter, removing unnecessary synchronization overhead.
Introduced caching for array index field names ("0", "1", ...), avoiding repetitive allocations.
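
The single-call fast path for 32-bit writes can be sketched like this. Int32Writes is a hypothetical helper operating on one java.nio.ByteBuffer, not the driver's ByteBufferBsonOutput. It assumes the buffer is set to little-endian order (BSON stores integers little-endian) and does not model a value that straddles two buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Int32Writes {
    private Int32Writes() {
    }

    // Baseline: four separate single-byte writes, least significant byte first.
    static void writeByteByByte(ByteBuffer buffer, int value) {
        buffer.put((byte) value);
        buffer.put((byte) (value >> 8));
        buffer.put((byte) (value >> 16));
        buffer.put((byte) (value >> 24));
    }

    // Fast path: one putInt when the buffer is little-endian and has room for four bytes.
    // Otherwise fall back to the byte-by-byte path, which works for any byte order.
    // A multi-buffer output would also use the slow path to continue into the next
    // buffer; this single-buffer sketch leaves that out.
    static void writeInt32(ByteBuffer buffer, int value) {
        if (buffer.order() == ByteOrder.LITTLE_ENDIAN && buffer.remaining() >= Integer.BYTES) {
            buffer.putInt(value);
        } else {
            writeByteByByte(buffer, value);
        }
    }
}
```

Both paths produce the same bytes; the fast path only changes how many calls are needed to produce them.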

Read path improvements:

Reduced allocations during String decoding by reusing buffers and avoiding unnecessary copies.
Scanned for null terminators in bulk using SWAR (SIMD within a register) instead of byte by byte.
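
A bulk null-terminator scan can be sketched as below. NullScan is a hypothetical helper, not the driver's implementation. It reads eight bytes at a time as a little-endian long and applies the well-known zero-byte test `(word - 0x01..01) & ~word & 0x80..80`, then falls back to a byte loop for the final bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class NullScan {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    private NullScan() {
    }

    // Returns the absolute index of the first 0x00 byte at or after 'from', or -1 if none.
    static int indexOfNull(ByteBuffer source, int from) {
        // Work on a little-endian view so the byte at the lowest index is the
        // least significant byte of each word; the caller's buffer is untouched.
        ByteBuffer buffer = source.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int limit = buffer.limit();
        int i = from;
        // Process eight bytes per iteration.
        while (i + Long.BYTES <= limit) {
            long word = buffer.getLong(i);
            // Non-zero exactly when 'word' contains a zero byte; the lowest set
            // bit belongs to the first zero byte.
            long mask = (word - ONES) & ~word & HIGH_BITS;
            if (mask != 0) {
                return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            }
            i += Long.BYTES;
        }
        // Tail: fewer than eight bytes left.
        for (; i < limit; i++) {
            if (buffer.get(i) == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Bits set above the first zero byte can be false positives caused by borrow propagation, which is why only the lowest set bit is used.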

The following optimizations were identified but not completed within the scope of this epic:

BsonDocumentCodec _id field re-ordering could be more efficient.
Introduce valueOf methods in BsonInt32 and BsonInt64 (and use them in the corresponding Codec) that are roughly equivalent to the ones in Integer and Long (which have caches for small values).
BsonDocumentCodec shouldn't make a copy of its elements when decoding.
ByteBufferBsonOutput is great for minimizing heap use, but all the buffer management that it has to do in an inner loop has a performance cost. We can consider using a simpler implementation of OutputBuffer that trades off memory use for speed. For example, we could just cache 48MB buffers instead of power-of-two buffers.
Lots of calls to CodecRegistry#get in DocumentCodec#writeValue and BsonDocumentCodec#writeValue. The implementation of this method in ProvidersCodecRegistry is not built for use in inner loops. Some caching within the Codec implementation could be useful here.
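
The valueOf idea from the list above can be illustrated with a small-value cache in the style of Integer.valueOf. BoxedInt32 is a hypothetical immutable wrapper, not the driver's BsonInt32, and the cached range of -128 to 127 is an assumption borrowed from Integer's default cache.

```java
// Hypothetical immutable wrapper with a cache for small values.
final class BoxedInt32 {
    private static final int LOW = -128;
    private static final int HIGH = 127;
    private static final BoxedInt32[] CACHE = new BoxedInt32[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new BoxedInt32(LOW + i);
        }
    }

    private final int value;

    private BoxedInt32(int value) {
        this.value = value;
    }

    // Returns a shared instance for small values and a fresh one otherwise.
    static BoxedInt32 valueOf(int value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[value - LOW];
        }
        return new BoxedInt32(value);
    }

    int intValue() {
        return value;
    }
}
```

Sharing instances is only safe because the wrapper is immutable; a decoder that produces many small integers would then allocate nothing for them.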

Assignee: Slav Babanin
Reporter: Jeffrey Yemin
Goal DRI(s): None
Votes: 0
Watchers: 5

Created: Jan 31 2024 09:58:57 PM UTC
Updated: May 24 2025 04:21:56 AM UTC
Resolved: May 05 2025 07:15:11 AM UTC
Calendar Time: 1 year, 11 weeks, 6 days
Target start: 03/Feb/25
Target end: 28/Feb/25
Start date: 10/Feb/24
End date: 02/May/25
Confidence Status Last Update: 05/May/25 7:23 AM
Goal Completion Date: None
