[KAFKA-127] Kafka Source connector handling documents greater than 16MB BSON Created: 10/Jul/20  Updated: 16/Jul/20  Resolved: 16/Jul/20

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: 1.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sabari Gandhi Assignee: Ross Lawley
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Kafka Connector: 1.1
MongoDB: 3.6


Attachments: Java Source File MongoTest.java, Text File mongo_1.1.txt, Text File mongo_1.2.txt
Issue Links:
Duplicate
duplicates KAFKA-105 Support errors.tolerance Closed

 Description   

Change stream response documents must adhere to the 16MB BSON document size limit: https://docs.mongodb.com/manual/administration/change-streams-production-recommendations/
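
For context, the 16MB limit applies to the entire change event document, and with change.stream.full.document=updateLookup (the setting used in this setup, per the comments below) the server embeds the current version of the document in every update event, which is how an event can outgrow the limit. A minimal driver-level sketch of such a stream follows (connection string and namespace are placeholders; this is not the connector's internal code):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import com.mongodb.client.model.changestream.FullDocument;
import org.bson.Document;

public class WatchExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // UPDATE_LOOKUP asks the server to embed the current version of the
            // document in each update event; if the resulting change event
            // exceeds the 16MB BSON limit, the server fails with error 10334.
            for (ChangeStreamDocument<Document> event : client
                    .getDatabase("test")
                    .getCollection("docs")
                    .watch()
                    .fullDocument(FullDocument.UPDATE_LOOKUP)) {
                System.out.println(event.getOperationType() + ": " + event.getDocumentKey());
            }
        }
    }
}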

Currently the Kafka connector:

1) Does not raise an error or exception, only logs an INFO message:

INFO An exception occurred when trying to get the next item from the changestream. (com.mongodb.kafka.connect.source.MongoSourceTask)
kfc-mongodb-7bbc79cb64-hdddg kfc-mongodb 2020-07-09T16:29:47.376320977Z com.mongodb.MongoCommandException: Command failed with error 10334 (Location10334): 'BSONObj size: 19376544 (0x127A9A0) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: BinData(0, 825F0745F700000005461E5F69640031035342000004BC005A100458C65EB8491243558BB64BDAC2914E5204), _typeBits: BinData(0, 02) }' on server db-core-1.ebs.us-east-1.stage-us.int.evbg.io:29001. The full response is {"operationTime": {"$timestamp": {"t": 1594312187, "i": 87}}, "ok": 0.0, "errmsg": "BSONObj size: 19376544 (0x127A9A0) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: BinData(0, 825F0745F700000005461E5F69640031035342000004BC005A100458C65EB8491243558BB64BDAC2914E5204), _typeBits: BinData(0, 02) }", "code": 10334, "codeName": "Location10334", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1594312187, "i": 87}}, "signature": {"hash": {"$binary": "WPjr3+K8zh6NDOpkTmK6rNvmM4M=", "$type": "00"}, "keyId": {"$numberLong": "6828579125364523010"}}}}

2) The task status is still running, but the connector fails to process the change stream:

kfc-mongodb-7bbc79cb64-hdddg kfc-mongodb 2020-07-09T15:59:57.191098599Z [2020-07-09 15:59:57,190] INFO Failed to resume change stream: BSONObj size: 19224166 (0x1255666) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: BinData(0, 825F073D080000000E461E5F69640031035342000004BC005A100458C65EB8491243558BB64BDAC2914E5204), _typeBits: BinData(0, 02) } 10334 (com.mongodb.kafka.connect.source.MongoSourceTask)

 

Regarding error handling, this seems related to https://jira.mongodb.org/browse/KAFKA-89

Is there a way to make the connector resilient to these issues, with improved error handling and task status? Thanks in advance.



 Comments   
Comment by Ross Lawley [ 16/Jul/20 ]

Thanks for the ticket.

I'm going to mark this as a duplicate of KAFKA-105 and will ensure that the scenario is tested when errors.tolerance support is added.
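
For anyone hitting this before that lands: the settings in question are the standard Kafka Connect error-handling properties (a sketch only; whether the source task honors them for this failure is exactly what KAFKA-105 tracks):

errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true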

Comment by Sabari Gandhi [ 15/Jul/20 ]

I was able to reproduce the issue and have attached logs for both 1.1 and 1.2.

Setup:

  • I used the Docker setup available in the GitHub example. I want the full document when an update is made, so I set change.stream.full.document=updateLookup.
  • I have attached the MongoTest file, which creates a document and keeps updating it in a loop, so you will hit the error after some time (a sketch of the shape of that test follows this list). Logs for both 1.1 and 1.2 are attached.
  • The connector status will still be running, but it fails to resume the change stream.
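
A minimal sketch of what such a test might look like (a hypothetical reconstruction; the attached MongoTest.java is authoritative, and the URI and namespace here are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;
import org.bson.types.ObjectId;

public class MongoTest {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("docs");
            ObjectId id = new ObjectId();
            coll.insertOne(new Document("_id", id).append("payload", "seed"));
            String chunk = "x".repeat(1024 * 1024); // ~1MB per update (Java 11+)
            // Each update adds another ~1MB field. With updateLookup enabled,
            // the change event carries the whole document, so the event crosses
            // the 16MB BSON limit before the loop completes.
            for (int i = 0; i < 20; i++) {
                coll.updateOne(Filters.eq("_id", id), Updates.set("field" + i, chunk));
            }
        }
    }
}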

 

As the 1.2 logs show, support for errors.tolerance will ensure the error is logged and the connector stays functional in such scenarios. There is an open ticket for that: https://jira.mongodb.org/browse/KAFKA-105. Thanks!

 

 

Comment by Sabari Gandhi [ 13/Jul/20 ]

Scenario / Use case:

  • In my setup, I want the full document when an update is made, so I have change.stream.full.document=updateLookup (see the sketch after this list).
  • When a document is updated, in this case appended with additional fields/information, the BSON size grows with each update until it reaches the threshold.
  • Since the setup watches the change stream, I get the above exception once that threshold is crossed.
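
For completeness, the relevant source connector settings look roughly like this (a sketch; connection URI and namespace are placeholders):

connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://localhost:27017
database=test
collection=docs
change.stream.full.document=updateLookup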

I am trying to reproduce the issue locally and will update with additional details. Thanks!
 
