- Type: Bug
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: 4.11.3, 5.1.3
- Component/s: Kotlin
- Java Drivers
Summary
Since mongodb-driver-kotlin-sync 4.11.3 and 5.1.3, a ByteArray field in a Kotlin data class is encoded by DataClassCodec as a BSON Array of Int32 elements (one element per byte) instead of BSON Binary. This causes an ~8-10x document size expansion, silently breaking code that stores binary content in a data class.
A document holding 2 MB of binary data now encodes to ~20 MB, exceeding MongoDB's 16 MB document limit and throwing BsonMaximumSizeExceededException at runtime with no compile-time warning.
Root cause
PR #1457 (JAVA-5122) introduced an isArray() check at the top of DataClassCodec.getCodec().
When the check is true, ArrayCodec.create() is called unconditionally, bypassing the codec registry entirely.
Before this change (up to 4.11.2 and 5.1.2), a ByteArray field fell through to the codec registry, which returned ByteArrayCodec, encoding the value as compact BSON Binary. From 4.11.3 and 5.1.3 onward, the isArray() check intercepts ByteArray first and routes it through ArrayCodec, which iterates over the bytes and writes each one as a separate BSON Int32.
JAVA-5122 was filed for Array<String>, that is, object arrays that previously threw during codec construction. The fix correctly addresses that case but is too broad: it also captures primitive arrays like ByteArray, which already had a correct and compact encoding via ByteArrayCodec.
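The overlap described above can be seen with plain JVM reflection: both ByteArray and Array<String> report isArray as true, so a bare isArray() check cannot tell them apart. The sketch below is only an illustration and is not the driver's code; shouldUseArrayCodec is a hypothetical helper showing one possible guard, under the assumption that ByteArray is the only array type that should keep going through the codec registry.

```kotlin
// Hypothetical guard, NOT the driver's actual implementation:
// treat every array type as an "array codec" candidate except ByteArray,
// which would keep falling through to the codec registry.
fun shouldUseArrayCodec(clazz: Class<*>): Boolean =
    clazz.isArray && clazz != ByteArray::class.java

fun main() {
    val byteArrayClass = ByteArray::class.java          // byte[] on the JVM
    val stringArrayClass = Array<String>::class.java    // String[] on the JVM

    // Both are arrays, so a bare isArray check matches both.
    println("ByteArray isArray: ${byteArrayClass.isArray}")
    println("Array<String> isArray: ${stringArrayClass.isArray}")

    // Only the component type distinguishes them.
    println("ByteArray component is primitive: ${byteArrayClass.componentType.isPrimitive}")
    println("Array<String> component is primitive: ${stringArrayClass.componentType.isPrimitive}")

    // The hypothetical guard keeps ByteArray out of the array path.
    println("guard(ByteArray): ${shouldUseArrayCodec(byteArrayClass)}")
    println("guard(Array<String>): ${shouldUseArrayCodec(stringArrayClass)}")
}
```

Whether other primitive arrays (IntArray, LongArray, and so on) should also be excluded depends on which codecs the registry provides for them, which this sketch does not attempt to decide.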
Expected behavior
A ByteArray field in a Kotlin data class should be encoded as BSON Binary, consistent with the behavior of ByteArrayCodec and with all prior versions of mongodb-driver-kotlin-sync.
Actual behavior
Since 4.11.3 and 5.1.3, the field is encoded as a BSON Array of Int32 elements, one per byte.
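The size difference between the two encodings can be estimated from the layouts in the BSON specification. A Binary value is a 4-byte length, a 1-byte subtype, and the raw bytes. An array is an embedded document whose keys are the decimal indices "0", "1", and so on, so each Int32 element costs a type byte, the index digits plus a NUL terminator, and 4 bytes of value. The function names below are hypothetical and the sketch only counts the value bytes, not the enclosing document or the field name.

```kotlin
// Estimated size in bytes of a BSON Binary value holding n bytes:
// int32 length prefix + 1 subtype byte + payload.
fun binaryValueSize(n: Int): Long = 4L + 1L + n

// Estimated size in bytes of a BSON array of n Int32 elements:
// int32 length prefix + elements + trailing 0x00.
// Each element = 1 type byte + index digits + NUL + 4-byte int32.
fun int32ArrayValueSize(n: Int): Long {
    var total = 4L + 1L
    for (i in 0 until n) {
        total += 1 + i.toString().length + 1 + 4
    }
    return total
}

fun main() {
    for (n in listOf(1_000, 100_000, 2_000_000)) {
        val binary = binaryValueSize(n)
        val array = int32ArrayValueSize(n)
        println("n=$n binary=$binary array=$array ratio=${array.toDouble() / binary}")
    }
}
```

The per-element cost grows with the number of index digits, so the expansion factor is not constant: by this estimate a 2,000,000-byte array comes out somewhat above the ~20 MB quoted in the summary, because indices with seven digits cost 13 bytes per element.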
Reproducer
No MongoDB instance is required to run this test.
import com.mongodb.kotlin.client.MongoClient
import org.assertj.core.api.Assertions.assertThat
import org.bson.BsonBinaryWriter
import org.bson.codecs.EncoderContext
import org.bson.io.BasicOutputBuffer
import org.junit.jupiter.api.Test

data class Attachment(val name: String, val content: ByteArray)

class ByteArrayDataClassEncodingTest {

    @Test
    fun `storing an attachment should not exceed MongoDB 16MB limit for reasonable content sizes`() {
        MongoClient.create("mongodb://localhost:27017").use { client ->
            // Resolve the codec the driver would use for the data class.
            val codec = client.getDatabase("test")
                .getCollection<Attachment>("test")
                .codecRegistry
                .get(Attachment::class.java)

            val twoMegabytes = ByteArray(2_000_000)
            val buffer = BasicOutputBuffer()

            // Encode into an in-memory buffer; nothing is sent to a server.
            BsonBinaryWriter(buffer).use { writer ->
                codec.encode(writer, Attachment("report.pdf", twoMegabytes), EncoderContext.builder().build())
            }

            // Expected: ~2 MB. Actual on mongodb-driver-kotlin-sync >= 5.1.3: ~20 MB
            assertThat(buffer.size)
                .describedAs("encoded document size")
                .isLessThan(16_000_000)
        }
    }
}

The test:
- passes on 4.11.2 and fails from 4.11.3 onward,
- passes on 5.1.2 and fails from 5.1.3 onward.
Impact
- Silent regression: existing code that stores ByteArray fields in data classes breaks at runtime with no compile-time indication
- Any binary content larger than ~1.6 MB stored via a data class field will throw BsonMaximumSizeExceededException
- I fear the old encoding (BSON Binary) and the new encoding (BSON Array) are not round-trip compatible: data written before the upgrade cannot be read back correctly after upgrading
Version information
- Introduced in: 4.11.3 and 5.1.3 (commit 4df1108, PR #1457)
- Confirmed working: 4.11.2 and 5.1.2
- Confirmed broken: 5.8.0 (latest at time of writing)
- Not mentioned in the official release notes: https://www.mongodb.com/docs/drivers/java/sync/current/reference/release-notes/
- is caused by: JAVA-5122 Cannot serialize Array property (Closed)