[KAFKA-272] Allow connector to only do Copy Existing and no Change Streams
Created: 16/Dec/21  Updated: 27/Oct/23  Resolved: 05/Jan/22

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Unknown
Reporter: Daniel Barreto Assignee: Robert Walters
Resolution: Works as Designed Votes: 0
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We have to read a large amount of historical data (2+ TB), and it is taking too long because of CPU limitations: we can't get the connector to use all 4 cores. We are therefore thinking about running multiple instances of the connector, each importing a different portion of the historical data, with only one instance handling the change stream. However, we can't find a clean way to configure this, because it seems every instance automatically switches to the change stream once it finishes its portion of the Copy Existing phase. Thanks for any advice or help you can give us with this.



 Comments   
Comment by Robert Walters [ 05/Jan/22 ]

Resolved by customer

Comment by Daniel Barreto [ 05/Jan/22 ]

Yes, we figured it out. Thank you!

Comment by Robert Walters [ 04/Jan/22 ]

daniel@haystack.tv You can configure multiple instances of the connector, each with its own copy.existing pipeline filter that selects a portion of the data: https://docs.mongodb.com/kafka-connector/master/source-connector/usage-examples/copy-existing-data/#filter-data. For example, connector 1 could have a filter where state='ny', connector 2 a filter where state='ma', and so on. I'm not sure whether your data can be split up this way, but this is one option. Would this work for you?
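A minimal sketch of the partitioning idea from the comment above, written in Python. It builds one Kafka Connect configuration per partition value and puts a `$match` stage in each instance's `copy.existing.pipeline`. Assumptions: the connection URI, database, collection, and connector names are hypothetical placeholders; the property names (`copy.existing`, `copy.existing.pipeline`) are the ones used by connector versions from around the time of this issue, and newer releases may expose this behavior under different settings, so check the documentation for your version.

```python
import json

# Hypothetical connection details; replace with your own deployment's values.
CONNECTION_URI = "mongodb://localhost:27017"
DATABASE = "mydb"        # hypothetical database name
COLLECTION = "events"    # hypothetical collection name


def build_copy_existing_config(name: str, state: str) -> dict:
    """Return a Kafka Connect config for one MongoDB source connector instance
    whose copy-existing phase only reads documents with the given `state`."""
    # copy.existing.pipeline takes a JSON array of aggregation stages, passed as a string.
    copy_pipeline = [{"$match": {"state": state}}]
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": CONNECTION_URI,
            "database": DATABASE,
            "collection": COLLECTION,
            "copy.existing": "true",
            "copy.existing.pipeline": json.dumps(copy_pipeline),
        },
    }


def build_partitioned_configs(states: list[str]) -> list[dict]:
    """Build one connector instance per partition value, e.g. state='ny', state='ma'."""
    return [build_copy_existing_config(f"mongo-source-{s}", s) for s in states]


if __name__ == "__main__":
    for cfg in build_partitioned_configs(["ny", "ma"]):
        # Each dict has the {"name", "config"} shape accepted by POST /connectors
        # on the Kafka Connect REST API.
        print(json.dumps(cfg, indent=2))
```

For this to import the full history exactly once, the filters should be disjoint and together cover every document: any document that matches no filter is never copied, and one that matches two filters is copied twice. Note that this only partitions the copy phase. As the description points out, each instance still moves on to a change stream once its copy finishes.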

Comment by Esha Bhargava [ 21/Dec/21 ]

daniel@haystack.tv Thank you for reporting the issue! We'll look into it and get back to you soon.

Generated at Thu Feb 08 09:05:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.