[KAFKA-76] Reuse the postBatchResumeToken Created: 04/Dec/19  Updated: 28/Oct/23  Resolved: 15/Sep/20

Status: Closed
Project: Kafka Connector
Component/s: Source
Affects Version/s: None
Fix Version/s: 1.3.0

Type: Improvement Priority: Major - P3
Reporter: Davenson Lombard Assignee: Ross Lawley
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by KAFKA-96 Source Connector: The resume token UU... Closed
is duplicated by KAFKA-93 Connector issue with high load: Query... Closed
Related
is related to KAFKA-176 Improve heartbeat usability Closed
Documentation Changes: Needed
Documentation Changes Summary:

Added two new configurations:

heartbeat.interval.ms=0

The length of time, in milliseconds, between heartbeat messages sent to record the post batch resume token when no source records have been published. Improves the resumability of the connector for low-volume namespaces. Use 0 to disable.

heartbeat.topic.name="__mongodb_heartbeats"
The name of the topic to publish heartbeats to. Defaults to '__mongodb_heartbeats'.

Note: By default this feature is off; setting `heartbeat.interval.ms` to a positive value turns it on. If no source records have been published within `heartbeat.interval.ms`, the post batch resume token is sent to the heartbeat topic instead. Messages on the heartbeat topic have to be consumed so that the latest offset (the post batch resume token) is available.
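
For example, a source connector properties file with heartbeats enabled might look like the following (the connection details, namespace, and interval are illustrative, not defaults):

    connection.uri=mongodb://localhost:27017
    database=inventory
    collection=orders
    # Publish a heartbeat record if no change events arrive for 10 seconds.
    heartbeat.interval.ms=10000
    heartbeat.topic.name=__mongodb_heartbeats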


 Description   

The latest Kafka connector uses the MongoDB Java driver 3.11 and thus takes advantage of SERVER-35740 (the high water mark token, postBatchResumeToken). Under the hood, though, it is the resumeToken (offset) and the corresponding event (matching the change stream filter) that are published to topics. If the connector crashes, the offset from the last published SourceRecord is used as the resumeToken upon restart.

In certain situations, such as when the connector is listening to a dormant database or collection, the resumeToken may have fallen off the oplog by the time the connector restarts. Saving the postBatchResumeToken reduces the likelihood of such a failure.
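
For context, here is a minimal sketch (not the connector's actual code) of how the Java driver 3.11+ exposes the two tokens; the connection details and names are illustrative:

    import com.mongodb.client.MongoChangeStreamCursor;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.changestream.ChangeStreamDocument;
    import org.bson.BsonDocument;
    import org.bson.Document;

    public class ResumeTokenSketch {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> coll =
                        client.getDatabase("test").getCollection("events");
                try (MongoChangeStreamCursor<ChangeStreamDocument<Document>> cursor =
                             coll.watch().cursor()) {
                    while (true) {
                        ChangeStreamDocument<Document> event = cursor.tryNext();
                        if (event != null) {
                            // Per-event token: what the connector stores as the
                            // source offset of each published SourceRecord.
                            BsonDocument eventToken = event.getResumeToken();
                            System.out.println("event token: " + eventToken);
                        } else {
                            // Cursor-level token: advances via SERVER-35740 even
                            // when no events match, so it stays inside the oplog.
                            BsonDocument postBatchToken = cursor.getResumeToken();
                            System.out.println("post batch token: " + postBatchToken);
                        }
                    }
                }
            }
        }
    }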



 Comments   
Comment by Githook User [ 15/Sep/20 ]

Author:

{'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com', 'username': 'rozza'}

Message: Added PostBatchResumeToken Support

KAFKA-76
Branch: master
https://github.com/mongodb/mongo-kafka/commit/f5b15a93851d36ff2ca80a2ca642231ac3121c31

Comment by Ross Lawley [ 03/Sep/20 ]

PR: https://github.com/mongodb/mongo-kafka/pull/32

Comment by Ross Lawley [ 05/Dec/19 ]

The issue is that the Kafka Connect API doesn't support saving offsets without data: offsets can only be stored alongside a SourceRecord that is scheduled to be published onto a topic. When polling returns no data from the change stream, there is nothing that can be published.

So we can't store the postBatchResumeToken when polling the cursor if there is no data, and I'm not aware of anything in the Kafka Connect API that would support saving offsets without data.
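
For reference, the fix that shipped in 1.3.0 works around this limitation by wrapping the post batch resume token in a SourceRecord targeted at the heartbeat topic, so the offset is committed through the standard Kafka Connect path. A minimal sketch, assuming hypothetical helper and parameter names (the real implementation is in the commit linked above):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical helper: builds a heartbeat record that carries only the offset.
    public final class HeartbeatSketch {
        static SourceRecord heartbeat(Map<String, ?> sourcePartition,
                                      String postBatchResumeTokenJson,
                                      String heartbeatTopic) {
            // The offset map is what Kafka Connect persists; the record value
            // can be null because only the committed offset matters on restart.
            Map<String, ?> sourceOffset =
                    Collections.singletonMap("_id", postBatchResumeTokenJson);
            return new SourceRecord(sourcePartition, sourceOffset, heartbeatTopic,
                    Schema.OPTIONAL_STRING_SCHEMA, null);
        }
    }

Routing the token through a record keeps the standard offset-commit machinery intact, which is why the documentation note above says the heartbeat topic has to be consumed for the latest offset to be available.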
