[KAFKA-219] Source Connector unable to recover from broken change stream due to event > 16MB Created: 27/Apr/21  Updated: 28/Oct/23  Resolved: 09/Jul/21

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: 1.6.0

Type: Improvement Priority: Unknown
Reporter: Branden Makana Assignee: Ross Lawley
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If a change stream produces a change event that exceeds the 16MB BSON document limit, the MongoDB connector gets stuck in a failure loop and cannot recover unless its offsets are deleted. The new DLQ feature does not help, because the failure happens inside MongoDB: no "bad event" is ever received that could be sent to the DLQ.

Connector config: 

 

"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "dlq",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true"
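The connector's own warning message (quoted in the log output in this ticket) lists setting a new `offset.partition.name` as one way to restart the change stream without the stored resume token. A minimal Python sketch of preparing such a config follows. The `connection.uri`, `database`, and `collection` values and the helper name `with_new_offset_partition` are illustrative assumptions, not taken from the reporter's setup, and restarting this way can lose events, as the warning itself notes.

```python
import json

# Error-handling settings exactly as listed in this ticket.
ERROR_HANDLING = {
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "dlq",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true",
}


def with_new_offset_partition(config, partition_name):
    """Return a copy of config that stores offsets under a new partition name.

    Hypothetical helper: it only edits the config dictionary and does not
    talk to Kafka Connect.
    """
    updated = dict(config)
    updated["offset.partition.name"] = partition_name
    return updated


# Placeholder connection details (assumptions for illustration only).
base_config = {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://mongo:27097",
    "database": "example_db",
    "collection": "the_collection",
    **ERROR_HANDLING,
}

recovered_config = with_new_offset_partition(base_config, "recovery-2021-04-27")
print(json.dumps(recovered_config, indent=2))
```

The resulting JSON would then be submitted to Kafka Connect in whatever way the deployment normally updates connector configs.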
 

Example output when a "too large" event is encountered:

 

[2021-04-27 03:35:42,633] INFO An exception occurred when trying to get the next item from the Change Stream: Query failed with error code 10334 and error message 'BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "826087866E000000682B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3466353736383061003C5F70726F664B65792E5F..." }' on server mongo:27097 (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,107] INFO Watching for collection changes on 'the_collection' (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,108] INFO Resuming the change stream after the previous offset: {"_data": "826087866E0000005D2B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3338323730313430003C5F70726F664B65792E5F73003C70726F66696F003C5F6964003C316634386C6472633875347265383030000004"} (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:44,696] WARN Failed to resume change stream: BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "826087866E000000682B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3466353736383061003C5F70726F664B65792E5F..." } 10334

=====================================================================================
If the resume token is no longer available then there is the potential for data loss.
Saved resume tokens are managed by Kafka and stored with the offset data.

To restart the change stream with no resume token either: 
  * Create a new partition name using the `offset.partition.name` configuration.
  * Set `errors.tolerance=all` and ignore the erroring resume token. 
  * Manually remove the old offset from its configured storage.

Resetting the offset will allow for the connector to be resume from the latest resume
token. Using `copy.existing=true` ensures that all data will be outputted by the
connector but it will duplicate existing data.
=====================================================================================
 (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,093] INFO Watching for collection changes on 'the_collection' (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,094] INFO Resuming the change stream after the previous offset: {"_data": "826087866E0000005D2B022C0100296E5A100411C5B2DDA8794637A32244ED8485B866463C5F70726F664B65792E5F616363744964003C3338323730313430003C5F70726F664B65792E5F73003C70726F66696F003C5F6964003C316634386C6472633875347265383030000004"} (com.mongodb.kafka.connect.source.MongoSourceTask)
[2021-04-27 03:35:49,684] WARN Failed to resume change stream: BSONObj size: 20793516 (0x13D48AC) is invalid. Size must be betwe... (repeats until killed) 
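
The failure loop above can be recognised from the logs alone. The following Python sketch, which is not part of the connector, extracts the reported BSON size and limit from a log line and counts repeated resume failures. The function names and the `threshold` default are assumptions chosen for illustration.

```python
import re

# Matches the server error text quoted in the log lines above.
BSON_SIZE_ERROR = re.compile(
    r"BSONObj size: (?P<size>\d+) \(0x[0-9A-Fa-f]+\) is invalid\. "
    r"Size must be between 0 and (?P<limit>\d+)"
)


def parse_oversize_error(line):
    """Return size, limit and excess in bytes, or None if the line has no such error."""
    match = BSON_SIZE_ERROR.search(line)
    if match is None:
        return None
    size = int(match.group("size"))
    limit = int(match.group("limit"))
    return {"size": size, "limit": limit, "excess": size - limit}


def looks_stuck(lines, threshold=2):
    """Heuristic: True once `threshold` resume failures carry an oversize error."""
    failures = 0
    for line in lines:
        if "Failed to resume change stream" not in line:
            continue
        if parse_oversize_error(line) is not None:
            failures += 1
    return failures >= threshold


sample = (
    "[2021-04-27 03:35:44,696] WARN Failed to resume change stream: "
    "BSONObj size: 20793516 (0x13D48AC) is invalid. "
    "Size must be between 0 and 16793600(16MB)"
)
info = parse_oversize_error(sample)
print(info)
print(looks_stuck([sample, sample]))
```

For the event in this ticket the reported size is 20793516 bytes against a limit of 16793600 bytes, so the event is 3999916 bytes over the limit.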



 Comments   
Comment by Githook User [ 09/Jul/21 ]

Author:

{'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com', 'username': 'rozza'}

Message: Fixed Source Connector recovery during getMore call (#79)

KAFKA-230 KAFKA-219
Branch: master
https://github.com/mongodb/mongo-kafka/commit/2f1b034437cb140d8d30682169fbb633c98c5623

Comment by Ross Lawley [ 07/Jul/21 ]

PR: https://github.com/mongodb/mongo-kafka/pull/79

Comment by Ross Lawley [ 11/May/21 ]

Marking for fixing in the 1.6.0 release.
