[KAFKA-342] clusterTime field is missing in copy-existing events Created: 18/Nov/22  Updated: 27/Oct/23  Resolved: 06/Dec/22

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alon Prantsipal Assignee: Robert Walters
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Insert events that originate from the copy-existing process lack the clusterTime field that all other change events (those that do not originate from copy-existing) carry.

I think this field should be included in such events for the following reasons:

  1. It is important metadata that identifies when the event originated, and omitting it doesn't make much sense in general. The event already has fields that identify it as originating from a copy-existing process, so it will be clear which event time the value refers to.
  2. Without it, it's impossible to have idempotency upstream when copy-existing runs more than once.
    For example, suppose copy-existing sends an insert event for _id=1, and further events for _id=1 may or may not follow from the change stream. If the copy-existing process later has to be re-run, because the last offset was lost or for any other reason, it is impossible to tell which of the two copy-existing insert events is the latest one, or whether it comes before or after the other events for _id=1 that originated from the change stream.
    So currently, in such scenarios, we have to rely on the order in which events are published, which is not a best practice in my opinion.
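
To illustrate the idempotency point, here is a minimal Python sketch of how a consumer could resolve the latest event per _id if copy-existing events carried a clusterTime. It assumes the requested behaviour, which the connector does not provide today. The `Event` class, the `latest_per_key` function, the `source` labels, and the sample timestamps are all hypothetical names and values chosen for illustration; the cluster time is modelled as a (seconds, increment) pair in the style of a BSON Timestamp.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# (seconds since epoch, increment within that second), modelled on a BSON Timestamp.
ClusterTime = Tuple[int, int]


@dataclass(frozen=True)
class Event:
    """Hypothetical consumer-side view of an event; not the connector's schema."""
    doc_id: int
    cluster_time: ClusterTime  # assumed to be present on copy-existing events too
    source: str                # "copy_existing" or "change_stream" (illustrative labels)
    payload: dict


def latest_per_key(events: Iterable[Event]) -> Dict[int, Event]:
    """Keep, for each _id, the event with the highest cluster time.

    The result depends only on cluster_time, not on the order in which
    events arrive, so replaying a copy-existing run is harmless.
    """
    latest: Dict[int, Event] = {}
    for event in events:
        current = latest.get(event.doc_id)
        if current is None or event.cluster_time > current.cluster_time:
            latest[event.doc_id] = event
    return latest


# Two copy-existing runs for _id=1 with a change stream event in between.
events = [
    Event(1, (1668760000, 1), "copy_existing", {"v": "first copy"}),
    Event(1, (1668760500, 3), "change_stream", {"v": "update"}),
    Event(1, (1668761000, 1), "copy_existing", {"v": "second copy"}),
]

in_order = latest_per_key(events)
out_of_order = latest_per_key(reversed(events))
```

With a comparable time on every event, both calls select the same event for _id=1, so the consumer no longer has to depend on publishing order.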


 Comments   
Comment by PM Bot [ 06/Dec/22 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Robert Walters [ 21/Nov/22 ]

Hi alonp@accessfintech.com, the copy existing process does not include any metadata for clusterTime. That said, in the next version of the Kafka Connector we are adding support for start at operation time, that is, the ability to start the source connector at a certain time in the oplog (setting it to the value 0 will start at the beginning of the oplog). This option will include the clusterTime as it is reflected in the change stream event metadata. Version 1.9 should be out by mid-to-end of December.

 
