[KAFKA-117] Is it possible to implement copying new collections if pipeline has been changed? Created: 18/Jun/20 Updated: 02/Jun/22 Resolved: 10/Aug/20 |
|
| Status: | Closed |
| Project: | Kafka Connector |
| Component/s: | Source |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Andrey B | Assignee: | Ross Lawley |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Use case workflow: create a connector with config:
After some time, update the pipeline in the connector's config to:
Desired result after restart:
What do you think about it? |
| Comments |
| Comment by Andrey B [ 24/Aug/20 ] |
|
I created a separate ticket for the last question. |
| Comment by Andrey B [ 10/Aug/20 ] |
I agree.
This approach could lead to data gaps.
What do you think about explicitly configuring which collections should be copied? I'm not talking about saving state or checking whether there are new collections that should be copied, just a config property that defines which collections should be copied when the connector first starts.
Andrey |
| Comment by Ross Lawley [ 10/Aug/20 ] |
|
I think at the moment reconfiguring the connector requires too much state to be stored. If you wish to add a new collection and copy the existing data over, the process should be something like: 1) Add a new connector to copy and monitor the new collection. That is probably more efficient than running lots of change stream cursors and connectors, and it allows for growth in the number of collections being watched and copied. I'm going to close this ticket for now as "Won't Fix"; however, should more people require this functionality and comment on this ticket, we can always reopen it in the future. Ross |
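The workaround described above, registering a separate connector for the new collection, might look roughly like the following sketch. It only builds the JSON payload that the Kafka Connect REST API accepts at `POST /connectors` and does not send it. The connector name, connection URI, database, and collection are hypothetical placeholders, and the property names (`connection.uri`, `database`, `collection`, `copy.existing`) are assumed to match the MongoDB source connector version in use.

```python
import json


def build_new_collection_connector(name, connection_uri, database, collection):
    """Build a Kafka Connect registration payload for one extra collection.

    Assumption: property names follow the MongoDB Kafka source connector
    documentation; every value passed in is an illustrative placeholder.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": connection_uri,
            "database": database,
            "collection": collection,
            # Copy the data already in the collection before streaming changes.
            "copy.existing": "true",
        },
    }


payload = build_new_collection_connector(
    name="mongo-source-new-collection",          # hypothetical connector name
    connection_uri="mongodb://localhost:27017",  # hypothetical URI
    database="mydb",                             # hypothetical database
    collection="new_collection",                 # hypothetical collection
)
print(json.dumps(payload, indent=2))
```

The existing connector keeps its own configuration and stored offsets untouched, which is why this route avoids the extra state tracking mentioned in the comment.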
| Comment by Andrey B [ 08/Aug/20 ] |
|
Hi again, Ross. A little bit more about my case: What do you think about the copy.existing.collections or copy.existing.collection.regex parameters?
It would be great to define explicitly which collections should be copied. |
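To make the proposal concrete, here is a minimal sketch of how a regex-based selection could behave. The parameter `copy.existing.collection.regex` is only the name suggested in this ticket, not an existing connector option, and the full-match semantics and collection names below are assumptions made for illustration.

```python
import re


def select_collections_to_copy(collection_names, pattern):
    """Return the collections whose whole name matches the regex.

    Hypothetical semantics for the proposed copy.existing.collection.regex
    setting; the connector itself is not involved here.
    """
    compiled = re.compile(pattern)
    return [name for name in collection_names if compiled.fullmatch(name)]


watched = ["orders", "orders_archive", "customers"]  # hypothetical collections
print(select_collections_to_copy(watched, r"orders.*"))
# prints ['orders', 'orders_archive']
```

Only the matching collections would be copied at startup, while the change stream pipeline would still decide which collections are watched afterwards.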
| Comment by Andrey B [ 30/Jun/20 ] |
|
Hi Ross, thanks for the reply.
Maybe it's better to add new parameters to the connector config, like copy.existing.collections or copy.existing.collection.regex.
I thought about some special Kafka topic.
I agree it's a much simpler solution. Andrey |
| Comment by Ross Lawley [ 30/Jun/20 ] |
|
Thanks for the ticket. Because the pipeline can contain any valid pipeline operation, it would be hard to determine whether any new collections exist. There is also the question of where to keep the metadata about what has already been seen / processed. Given the level of complexity this would add, I think registering a new connector instance is potentially the simplest solution. Ross |