Details
Improvement
Resolution: Gone away
Major - P3
Description
Insert events that originate from the copy-existing process don't have a clusterTime field, unlike all other change events, which don't originate from copy-existing.
I think this field should be included in such events for the following reasons:
- It's important metadata that identifies when the event originated, and omitting it doesn't make much sense in general. The event already has fields that identify it as originating from the copy-existing process, so it will be clear which event time the value refers to.
- Without it, it's impossible to have idempotency upstream in case of multiple copy-existing runs.
For example, suppose copy-existing sends an insert event for _id=1, and the change stream may or may not emit additional events for _id=1 afterwards. If the copy-existing process eventually has to be re-run, because the last offset was lost or for any other reason, then it will be impossible to tell which of the two insert events from the copy-existing process is the latest one, and whether it comes before or after any other events for _id=1 that originated from the change stream.
So currently, in such scenarios, we have to rely on event publishing order, which is not a best practice in my opinion.
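To illustrate the idempotency problem, here is a minimal sketch of a consumer that keeps only the newest event per _id. The event shape is an assumption made for illustration: a plain dict with "_id", "doc", and an optional "clusterTime" given as a (seconds, increment) tuple. The apply_event helper is hypothetical and not part of any connector API. When clusterTime is missing, the consumer has nothing to compare and can only fall back to arrival order.

```python
def apply_event(state: dict, event: dict) -> bool:
    """Apply an event to `state` unless it is provably older than what is stored.

    Hypothetical helper: `event` is a simplified dict with "_id", "doc" and an
    optional "clusterTime" represented as a (seconds, increment) tuple.
    Returns True if the event was applied, False if it was skipped as stale.
    """
    key = event["_id"]
    incoming = event.get("clusterTime")
    current = state.get(key)

    # Events can only be ordered when both of them carry a clusterTime.
    if current is not None and incoming is not None:
        stored = current["clusterTime"]
        if stored is not None and incoming <= stored:
            return False  # stale or duplicate event, safe to ignore

    # Otherwise fall back to arrival order: the last event seen wins.
    state[key] = {"doc": event["doc"], "clusterTime": incoming}
    return True


state = {}
# A change stream event with a clusterTime is applied.
apply_event(state, {"_id": 1, "doc": {"v": "new"}, "clusterTime": (200, 1)})
# An older event that does carry a clusterTime is correctly skipped.
apply_event(state, {"_id": 1, "doc": {"v": "old"}, "clusterTime": (100, 1)})
# A re-run copy-existing insert without clusterTime cannot be ordered,
# so it overwrites the newer document.
apply_event(state, {"_id": 1, "doc": {"v": "copy"}})
```

In this sketch the final state for _id=1 holds the copy-existing document, even though a newer change stream event had already been applied. If copy-existing events carried a clusterTime, the same comparison could decide whether to keep or skip them.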