[KAFKA-330] Source Connector ChangeStream support start at specified time Created: 20/Sep/22  Updated: 12/May/23  Resolved: 15/Nov/22

Status: Closed
Project: Kafka Connector
Component/s: Source
Affects Version/s: 1.8.0
Fix Version/s: 1.9.0

Type: Improvement Priority: Major - P3
Reporter: Jiabao Sun Assignee: Valentin Kavalenka
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Problem/Incident
causes KAFKA-370 Fix resource management in MongoSourc... Closed
Related
related to SPARK-378 Support configurable startAtOperation... Closed
is related to KAFKA-234 Clean up MongoSourceTask Backlog
Quarter: FY23Q4
Documentation Changes: Needed

 Description   

In some scenarios we want to open a change stream starting from a specified point in time.

A simple approach is to open the change stream with startAtOperationTime, obtain the first resumeToken, and cache it whenever we need to start the change stream at a specified time.

Do we plan to support this feature? I'm willing to take this ticket.
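
The approach described above could be sketched roughly as follows. This is only an illustration of the decision logic, not the connector's implementation: the function name, the option keys, and the idea of passing the cached token as a plain dict are all assumptions made for the example. A real client would wrap the seconds value in a BSON Timestamp before handing it to the driver.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def change_stream_start_options(
    cached_resume_token: Optional[Dict[str, Any]],
    start_at: datetime,
) -> Dict[str, Any]:
    """Choose how to open the change stream (hypothetical helper)."""
    if cached_resume_token is not None:
        # A token was cached from an earlier event, so resume right after it
        # instead of going back to the configured start time.
        return {"resume_after": cached_resume_token}
    if start_at.tzinfo is None:
        # Refuse naive datetimes so the start time is never guessed
        # from the local time zone.
        raise ValueError("start_at must be timezone-aware")
    # First start: no token yet, so begin at the requested operation time,
    # expressed as whole seconds since the Unix epoch.
    return {"start_at_operation_time_seconds": int(start_at.timestamp())}


# First start: nothing cached yet.
first = change_stream_start_options(
    None, datetime(2022, 9, 20, tzinfo=timezone.utc)
)
# Later restart: the cached token takes precedence over the start time.
later = change_stream_start_options(
    {"_data": "example-token"}, datetime(2022, 9, 20, tzinfo=timezone.utc)
)
```

Under this sketch the specified time only matters on the very first start; once a resume token has been cached, a restart continues from the token.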



 Comments   
Comment by Githook User [ 15/Nov/22 ]

Author:

{'name': 'Valentin Kovalenko', 'email': 'valentin.kovalenko@mongodb.com', 'username': 'stIncMale'}

Message: Add the new `startup.mode.timestamp.start.at.operation.time` config property (#125)

KAFKA-330
Branch: master
https://github.com/mongodb/mongo-kafka/commit/40ae109680c3ec1af5c03850b76154da91d87d02
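
For illustration, a source connector configuration using the new property might be assembled like this. Only the `startup.mode.timestamp.start.at.operation.time` name comes from the commit message above. The `startup.mode` value, the ISO-8601 value format, the connection URI, and the database and collection names are assumptions or placeholders and should be checked against the connector documentation for 1.9.0.

```python
import json
from datetime import datetime, timezone


def source_connector_config(start_at: datetime) -> dict:
    """Build an example source connector config (hypothetical helper)."""
    if start_at.tzinfo is None:
        raise ValueError("start_at must be timezone-aware")
    # Assumed value format: ISO-8601 in UTC with one-second precision.
    start = start_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",  # placeholder
        "database": "exampleDb",  # placeholder
        "collection": "exampleCollection",  # placeholder
        "startup.mode": "timestamp",  # assumed companion setting
        "startup.mode.timestamp.start.at.operation.time": start,
    }


config = source_connector_config(
    datetime(2022, 9, 20, 12, 0, 0, tzinfo=timezone.utc)
)
# Render as JSON, the shape typically posted to the Kafka Connect REST API.
print(json.dumps({"name": "example-source", "config": config}, indent=2))
```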

Comment by Jiabao Sun [ 26/Sep/22 ]

Hi Robert, do you have some suggestions?

Comment by Jiabao Sun [ 24/Sep/22 ]

Thanks Robert for the reply.

In some streaming computing scenarios, users may be more concerned about data changes within a time window, rather than requiring a complete data copy (copy.existing=false). If we can explicitly specify the change stream start time, it will be of great help.

I think we can enable this feature when copy.existing=false; it will not affect data integrity.

Comment by Robert Walters [ 23/Sep/22 ]

jiabao.sun@xtransfer.cn Hi, thank you for your request. Can you elaborate more on your use case? There is some concern about data integrity if the connector can be restarted and jump around between change stream resume tokens.

Generated at Thu Feb 08 09:06:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.