Summary
Mongot currently uses mongodb-driver-sync:4.11.5 and does a lot of manipulation of RawBsonDocuments when materializing its result sets. We rely on the fact that serializing a RawBsonDocument is a simple, efficient array copy. When upgrading mongodb-driver-sync to 5.5.3, we see several performance tests showing >100% regressions in E2E latency.
Version 5.5.1 without patch vs Baseline 4.11.5:
Version 5.5.1 with patch vs Baseline 4.11.5: https://performance-analyzer.server-tig.prod.corp.mongodb.com/perf-analyzer-viz/?comparison_id=698d00816295f61738b0357c&selected_tab=scatter-plots&percent_filter=0%7C%7C100&z_filter=0%7C%7C10&filter_type=Default
Disclaimer: The following Root Cause Analysis was identified by AI, but we verified its proposed workaround resolved the observed regressions.
It seems that https://github.com/mongodb/mongo-java-driver/pull/1632 changed BsonDocumentCodec to look up codecs by BsonType rather than by `getClass()`. This is problematic because BsonDocument and RawBsonDocument share a BsonType but require different handling. In the new implementation, a RawBsonDocument is serialized by iterating over its values, deserializing each one, and re-serializing it, instead of copying its bytes. This results in several perf tests showing >100% increases in E2E latency.
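To make the suspected mechanism concrete, here is a minimal, driver-independent sketch of the difference between the two lookup strategies. Every name in it (`TypeTag`, `Doc`, `RawDoc`, `DispatchSketch`) is a hypothetical stand-in invented for illustration, not the org.bson API, and the strings stand in for codecs. It only models the dispatch behavior described above; it is not a reproduction of the regression.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a BSON type tag (not org.bson.BsonType).
enum TypeTag { DOCUMENT, STRING }

// Hypothetical stand-in for a regular document value.
class Doc {
    TypeTag tag() { return TypeTag.DOCUMENT; }
}

// Hypothetical subclass that shares the DOCUMENT tag but has a cheaper encoding path.
class RawDoc extends Doc { }

class DispatchSketch {
    // Keyed by runtime class: the subclass can have its own entry.
    static final Map<Class<?>, String> BY_CLASS = new HashMap<>();
    // Keyed by type tag: one entry serves every class that reports that tag.
    static final Map<TypeTag, String> BY_TAG = new EnumMap<>(TypeTag.class);

    static {
        BY_CLASS.put(Doc.class, "field-by-field");
        BY_CLASS.put(RawDoc.class, "byte-copy");
        BY_TAG.put(TypeTag.DOCUMENT, "field-by-field");
    }

    // Class-based lookup keeps the subclass distinction.
    static String encoderByClass(Doc d) { return BY_CLASS.get(d.getClass()); }

    // Tag-based lookup collapses Doc and RawDoc onto the same encoder.
    static String encoderByTag(Doc d) { return BY_TAG.get(d.tag()); }

    public static void main(String[] args) {
        Doc raw = new RawDoc();
        System.out.println(encoderByClass(raw)); // prints "byte-copy"
        System.out.println(encoderByTag(raw));   // prints "field-by-field"
    }
}
```

In this model the tag-keyed map has no way to give `RawDoc` its own entry, which mirrors why the workaround further down registers a single codec for the document type and branches on `instanceof` inside `encode()`.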
How to Reproduce
Create a BsonDocument that contains a BsonArray of 100,000 small RawBsonDocuments and serialize the object to a byte[].
This is the fix suggested by Cursor. Please let us know if we can do something better.
/**
 * Creates a {@link BsonDocumentCodec} backed by a {@link CodecRegistry} that provides {@link
 * RawBsonDocumentAwareBsonDocumentCodec} for {@code BsonDocument.class}. Because the codec's
 * internal {@code BsonTypeCodecMap} resolves the codec for {@code BsonType.DOCUMENT} from the
 * registry (keyed by {@code BsonDocument.class}), this ensures our override is used for ALL
 * nested document values, including {@link RawBsonDocument} instances inside arrays and
 * sub-documents.
 */
private static BsonDocumentCodec createOptimizedCodec() {
  // A provider that returns our RawBsonDocument-aware codec for BsonDocument.class, delegating
  // everything else to the standard BsonValueCodecProvider.
  CodecProvider optimizedProvider =
      new CodecProvider() {
        private final BsonValueCodecProvider delegate = new BsonValueCodecProvider();

        @Override
        @SuppressWarnings("unchecked")
        public <T> Codec<T> get(Class<T> clazz, CodecRegistry registry) {
          if (clazz == BsonDocument.class) {
            return (Codec<T>) new RawBsonDocumentAwareBsonDocumentCodec(registry);
          }
          return delegate.get(clazz, registry);
        }
      };
  CodecRegistry registry = CodecRegistries.fromProviders(optimizedProvider);
  return new RawBsonDocumentAwareBsonDocumentCodec(registry);
}
/**
 * A {@link BsonDocumentCodec} subclass that restores the BSON 4.x behavior of encoding {@link
 * RawBsonDocument} values by piping their raw bytes directly, rather than decoding and
 * re-encoding each field.
 *
 * <p>In driver 5.x, {@link BsonDocumentCodec} resolves child codecs via a {@code
 * BsonTypeCodecMap} keyed by {@code BsonType}, so a {@link RawBsonDocument} (which has
 * {@code BsonType.DOCUMENT}) gets the generic {@link BsonDocumentCodec} codec. That codec calls
 * {@code RawBsonDocument.entrySet()}, which lazily decodes the raw bytes, then re-encodes every
 * field: an O(fields) decode+encode instead of an O(bytes) memcpy.
 *
 * <p>This class overrides {@code encode()} to detect {@link RawBsonDocument} and use pipe-based
 * byte copying. Because it is registered in the codec registry as the codec for {@code
 * BsonDocument.class}, the internal {@code BsonTypeCodecMap} maps {@code BsonType.DOCUMENT} to
 * this codec, so the optimization applies recursively to all nested document values.
 */
static final class RawBsonDocumentAwareBsonDocumentCodec extends BsonDocumentCodec {
  private static final RawBsonDocumentCodec RAW_CODEC = new RawBsonDocumentCodec();

  RawBsonDocumentAwareBsonDocumentCodec(CodecRegistry registry) {
    super(registry);
  }

  @Override
  public void encode(BsonWriter writer, BsonDocument document, EncoderContext encoderContext) {
    if (document instanceof RawBsonDocument rawDoc) {
      // Delegate to RawBsonDocumentCodec, which pipes the raw bytes directly.
      RAW_CODEC.encode(writer, rawDoc, encoderContext);
    } else {
      super.encode(writer, document, encoderContext);
    }
  }
}