Description
We are trying to copy existing data from huge collections (around 6 million documents). Our requirement is that we only need a specific subset of the data, not all of it, so in the configuration we provide a pipeline similar to:
"pipeline": "[
|
{ $project: { "updateDescription":0 } },
|
{ $match: {"fullDocument.createdDate":{ "$gt": ISODate("2019-03-31T13:44:54.791Z"), "$lt": ISODate("2020-07-23T13:44:54.791Z")} } }
|
]".
|
The MongoDB logs show that the lookup is very expensive. From the connector code, it appears that the connector scans the entire collection and only then applies the filter: https://github.com/mongodb/mongo-kafka/blob/master/src/main/java/com/mongodb/kafka/connect/source/MongoCopyDataManager.java#L147 The user-provided pipeline stages are appended at the end of the internally built pipeline, so the copy reads the entire collection before the $match is applied (see the sketch below). Is there an option or a way to add the provided pipeline configuration at the beginning of the list?
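For illustration, here is a minimal Java sketch of how the copy pipeline seems to be assembled, based on our reading of the line linked above. The class and method names here are hypothetical (not the connector's actual API), and the internal $replaceRoot stage is simplified; the point is only the ordering: the internal stage comes first, the configured stages are appended after it.

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

import org.bson.Document;
import org.bson.conversions.Bson;

public class CopyPipelineSketch {

    // Hypothetical stand-in for reading the connector's "pipeline" setting.
    static Optional<List<Bson>> getConfiguredPipeline() {
        List<Bson> configured = new ArrayList<>();
        configured.add(Document.parse("{ $project: { updateDescription: 0 } }"));
        configured.add(Document.parse(
            "{ $match: { 'fullDocument.createdDate': {"
            + " $gt: { $date: '2019-03-31T13:44:54.791Z' },"
            + " $lt: { $date: '2020-07-23T13:44:54.791Z' } } } }"));
        return Optional.of(configured);
    }

    static List<Bson> createCopyPipeline() {
        List<Bson> pipeline = new ArrayList<>();
        // Internal stage added first: reshapes every source document into a
        // change-stream-style insert event (simplified; the real stage sets
        // more fields). Because this runs before any user-supplied stage,
        // the whole collection is scanned.
        pipeline.add(Document.parse(
            "{ $replaceRoot: { newRoot: {"
            + " operationType: 'insert',"
            + " documentKey: { _id: '$_id' },"
            + " fullDocument: '$$ROOT' } } }"));
        // The configured pipeline is appended AFTER the internal stage, so
        // the $match filters the already-wrapped documents and cannot use an
        // index on createdDate to narrow the initial scan.
        getConfiguredPipeline().ifPresent(pipeline::addAll);
        return pipeline;
    }

    public static void main(String[] args) {
        createCopyPipeline().forEach(System.out::println);
    }
}

If the configured stages could instead be placed before the internal stage, the $match on createdDate would run against the raw documents and could potentially use an index, avoiding the full collection scan.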
Also, please let us know of any other configuration options that would make copying existing data more efficient. Thanks.
Issue Links
- is duplicated by KAFKA-150: When copy exist, support config pipeline before default replaceRoot pipeline (Closed)