[JAVA-2716] Bulk write messages can exceed maxMessageBytes of 48000000 Created: 26/Dec/17  Updated: 28/Oct/23  Resolved: 27/Dec/17

Status: Closed
Project: Java Driver
Component/s: Write Operations
Affects Version/s: 3.6.0
Fix Version/s: 3.6.2

Type: Bug Priority: Major - P3
Reporter: Yuriy [X] Assignee: Jeffrey Yemin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

On sharded cluster of MongoDB, version 3.6.1~rc1, we start to encounter connection drops from java services to mongos on bulk inserts. It happens on rare sets of data that we insert. At the moment we're trying to generate set of data to get drops, can't give work data because of NDA.
We use latest java driver 3.6.1 and sessions.
It happens independent on connection compression.

Looks like sometimes message length of batch is slightly bigger than max allowed - 48000000, and it leads to connection to server closing with error. Message size is different. Max that we've seen with these error was 48000021.

I'm not sure if it was on 3.6.0 version, because we coudn't use it because of SERVER-32372 bug.

Log from mongos (first batch was sent ok and on second it drops):

2017-12-25T19:14:49.608+0300 D SHARDING [conn5] Starting execution of write batch of size 44029 for database.collection
2017-12-25T19:14:49.763+0300 D SHARDING [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:49.877+0300 D EXECUTOR [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 100 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:49.991+0300 D ASIO     [conn5] warning: log line attempted (15501kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 100 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:50.606+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 15136, opTime: { ts: Timestamp(1514218490, 15136), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') }
2017-12-25T19:14:50.738+0300 D SHARDING [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:50.853+0300 D EXECUTOR [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 104 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:50.969+0300 D ASIO     [conn5] warning: log line attempted (15527kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 104 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:51.601+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 15053, opTime: { ts: Timestamp(1514218491, 15053), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') }
2017-12-25T19:14:51.728+0300 D SHARDING [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... Sending write batch to sh1-rs1: { insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:51.826+0300 D EXECUTOR [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... Scheduling remote command request: RemoteCommand 108 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:51.923+0300 D ASIO     [conn5] warning: log line attempted (14320kB) over max size (10kB), printing beginning and end ... startCommand: RemoteCommand 108 -- target:sh1-rs1-2:27041 db:database cmd:{ insert: "collection", bypassDocumentValidation: false, ordered: true, documents: [ ... ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("cca374f0-ec56-4c75-b4ce-d1fd597fd562"), uid: BinData(0, 4E66BD8E1B485D93D098867F94DCC1E595ADC4580AAD3E1CDAFE7858E889C8E6) } }
2017-12-25T19:14:52.573+0300 D SHARDING [conn5] Write results received from sh1-rs1-2:27041: { ok: 1, n: 13840, opTime: { ts: Timestamp(1514218492, 46224), t: 2 }, electionId: ObjectId('7fffffff0000000000000002') }
2017-12-25T19:14:52.574+0300 D SHARDING [conn5] Finished execution of write batch for database.collection
2017-12-25T19:14:52.580+0300 D SHARDING [conn5] Command end db: database msg id: 182
2017-12-25T19:14:53.621+0300 I NETWORK  [conn5] recv(): message msgLen 48000014 is invalid. Min 16 Max: 48000000
2017-12-25T19:14:53.621+0300 I NETWORK  [conn5] Error receiving request from client: ProtocolError: recv(): message msgLen 48000014 is invalid. Min 16 Max: 48000000. Ending connection from 10.0.0.22:56414 (connection id: 5)
2017-12-25T19:14:53.622+0300 I NETWORK  [conn5] end connection 10.0.0.22:56414 (4 connections now open)



 Comments   
Comment by Githook User [ 27/Dec/17 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'username': 'jyemin', 'name': 'Jeff Yemin'}

Message: JAVA-2716: Use the message start position, rather that the command start position,
as the basis for determining whether the max message size has been exceeded.
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/317c43174a8cece045ed32289569ddbfc1bf9fe5

Comment by Githook User [ 27/Dec/17 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'username': 'jyemin', 'name': 'Jeff Yemin'}

Message: JAVA-2716: Use the message start position, rather that the command start position,
as the basis for determining whether the max message size has been exceeded.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/07eb5e4dadaef50eb9b24a0c468837ee3b911545

Comment by Jeffrey Yemin [ 26/Dec/17 ]

Hi Ubus, thanks for the report. We moved this issue to the JAVA project as it's likely an issue with the driver: the driver should not be sending a message that exceeds 48MB. We will take a look and get back to you if we have further questions about how to reproduce the bug.

Generated at Thu Feb 08 08:57:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.