Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 1.3.0
Affects Version/s: 1.2.0
Component/s: Source
Labels:
None
Environment:
Kafka Connector: 1.2.0
MongoDb version: 3.6.17

Documentation Changes:
Needed
Documentation Changes Summary:


Added a new configuration:

copy.existing.pipeline=[{"$match": {"closed": "false"}}]

An inline JSON array with objects describing the pipeline operations to run when copying existing data. This can improve the use of indexes by the copying manager and make copying more efficient.

Use this setting if the `pipeline` configuration filters collection data, to speed up the copying process.
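As a rough sketch, the date filter from this issue's report could be expressed for both settings as shown below. This assumes that `copy.existing.pipeline` runs against the collection's own documents, so the field is addressed as `createdDate` rather than `fullDocument.createdDate`, and that dates are written in Extended JSON `$date` form. An index on `createdDate` is also an assumption; without one the match would still scan the collection.

```properties
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
copy.existing=true

# Filters change stream events, so the field sits under fullDocument.
pipeline=[{"$match": {"fullDocument.createdDate": {"$gt": {"$date": "2019-03-31T13:44:54.791Z"}, "$lt": {"$date": "2020-07-23T13:44:54.791Z"}}}}]

# Assumption: filters the raw collection documents during the copy phase,
# so the same field is addressed directly and can use an index on createdDate.
copy.existing.pipeline=[{"$match": {"createdDate": {"$gt": {"$date": "2019-03-31T13:44:54.791Z"}, "$lt": {"$date": "2020-07-23T13:44:54.791Z"}}}}]
```

Keeping the two filters equivalent means the copied data and the subsequent change stream events cover the same subset of documents.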


Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

We are trying to copy existing data from huge collections (around 6 million documents). We need only a specific subset of the data, not all of it, so in the configuration we provide a pipeline similar to:

"pipeline": "[
  { $project: { "updateDescription":0 } }, 
  { $match: {"fullDocument.createdDate":{ "$gt": ISODate("2019-03-31T13:44:54.791Z"), "$lt": ISODate("2020-07-23T13:44:54.791Z")} } } 
]".

The MongoDB logs show that the lookup is very expensive. Judging from the connector code (https://github.com/mongodb/mongo-kafka/blob/master/src/main/java/com/mongodb/kafka/connect/source/MongoCopyDataManager.java#L147), the pipeline configuration is appended at the end of the aggregation, so the copy reads the entire collection and only then applies the filter. Is there an option, or some other way, to place the provided pipeline configuration at the beginning of the list?

Also, please let us know of any other configuration options that would make copying existing data more efficient. Thanks.

is duplicated by

KAFKA-150 When copy exist, support config pipeline before default replaceRoot pipeline

Closed

Assignee: Ross Lawley
Reporter: Sabari Gandhi
Votes: 1
Watchers: 4

Created: Aug 03 2020 07:57:42 PM UTC
Updated: Oct 28 2023 10:46:03 AM UTC
Resolved: Sep 08 2020 03:49:53 PM UTC
