[KAFKA-362] Source Connector - Exactly Once Semantics (KIP-618) Created: 29/Mar/23  Updated: 27/Oct/23  Resolved: 20/Apr/23

Status: Closed
Project: Kafka Connector
Component/s: Source
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Robin Fehr Assignee: Robert Walters
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We're wondering whether it would be feasible for the source connector to support the properties introduced by [KIP-618|https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors], which shipped in Kafka 3.3:

exactly.once.support

transaction.boundary

offsets.storage.topic

transaction.boundary.interval.ms

or whether that for some reason wouldn't work.
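
For context, a minimal sketch of how these KIP-618 settings fit together on a Kafka 3.3+ Connect cluster, assuming a distributed worker; the topic name and interval value below are illustrative placeholders, not settings taken from this ticket:

{code}
# Worker configuration (e.g. connect-distributed.properties).
# Roll out as "preparing" on every worker first, then switch to "enabled".
exactly.once.source.support=enabled

# Per-connector configuration.
# "required" makes the connector fail at startup if exactly-once
# delivery cannot be guaranteed; "requested" would merely opt in.
exactly.once.support=required
# Commit a producer transaction on a fixed timer instead of per poll() batch.
transaction.boundary=interval
transaction.boundary.interval.ms=60000
# Optional connector-specific offsets topic (illustrative name).
offsets.storage.topic=mongo-source-offsets
{code}

Note that the worker-level flag alone isn't enough: per KIP-618, a connector advertises support by implementing SourceConnector#exactlyOnceSupport(), which is effectively what this ticket asks of the MongoDB source connector.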



 Comments   
Comment by PM Bot [ 20/Apr/23 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by PM Bot [ 12/Apr/23 ]

Hey robin.fehr@dormakaba.com, we need additional details to investigate the problem. If this is still an issue for you, please provide the requested information.

Comment by Robin Fehr [ 04/Apr/23 ]

robert.walters@mongodb.com

We currently use the connector in a use case where we write to a MongoDB collection in an append-only fashion (e.g. logs, event stores, etc.).

We then attach a source connector to those events and forward them to a Kafka topic, from which different microservices consume via the consumer API. These microservices process the events and, within a transaction, write the resulting data along with the offset to the database.

If the connector supported exactly-once, the microservices could rely on the offset and deduplicate by checking the offset per partition, rather than persisting partial event data (e.g. IDs) to the database and comparing incoming events against the already-processed, indexed IDs. That would make exactly-once much easier from a consumer-API user's perspective: storing and comparing offsets per partition is far simpler than, for example, pulling off deduplication with a capped collection in MongoDB (assuming the particular microservice uses MongoDB). Depending on the number of partitions, it can also be hard to keep the IDs in memory, which brings cache-invalidation complexity. That burden goes away immediately if we can rely on the offset; a sketch of this offset-based pattern follows below.
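
A minimal sketch of the offset-based deduplication described above, assuming MongoDB running as a replica set (transactions require one) and a consumer with auto-commit disabled; all topic, database, and collection names are illustrative, and a production version would also need to handle rebalances and retries:

{code:java}
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.bson.Document;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OffsetDedupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "event-processor");
        props.put("enable.auto.commit", "false"); // offsets are tracked in MongoDB instead
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            MongoCollection<Document> events = mongo.getDatabase("app").getCollection("events");
            MongoCollection<Document> offsets = mongo.getDatabase("app").getCollection("offsets");
            consumer.subscribe(List.of("events-topic"));

            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    String partitionKey = record.topic() + "-" + record.partition();

                    // Duplicate delivery: skip anything at or below the last applied offset.
                    Document stored = offsets.find(Filters.eq("_id", partitionKey)).first();
                    if (stored != null && record.offset() <= stored.getLong("offset")) {
                        continue;
                    }

                    // Apply the event and advance the stored offset in one transaction,
                    // so a crash can never persist one without the other.
                    try (ClientSession session = mongo.startSession()) {
                        session.withTransaction(() -> {
                            events.insertOne(session, new Document("payload", record.value()));
                            offsets.replaceOne(session, Filters.eq("_id", partitionKey),
                                    new Document("_id", partitionKey).append("offset", record.offset()),
                                    new ReplaceOptions().upsert(true));
                            return null;
                        });
                    }
                }
            }
        }
    }
}
{code}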

Comment by Robert Walters [ 04/Apr/23 ]

robin.fehr@dormakaba.com Can you describe the use case where you would need exactly-once? Would this be for a Mongo-to-Mongo replication scenario, or something else?
