[KAFKA-179] copy.existing does not copy all of the existing documents
Created: 03/Dec/20  Updated: 27/Oct/23  Resolved: 22/Dec/20

Status: Closed
Project: Kafka Connector
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Robert Walters
Assignee: Ross Lawley
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Given a database and collection with existing data, when copy.existing is set to true, only 10-20% of the existing documents are sourced from MongoDB into the Kafka topic when the connector is first configured.

Steps to reproduce:
1. Insert at least 1000 documents into the Stocks.StockData collection.
2. Configure the Kafka source connector as follows:

"tasks.max": "1",
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"publish.full.document.only": false,
"copy.existing": true,
"connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
"topic.prefix": "stockdata",
"database": "Stocks",
"collection": "StockData"

3. View the messages on the Kafka topic. Only about 100 show up, even after the connector has been left running for a long time.
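
For anyone retrying the repro, the settings from step 2 could be submitted to a Kafka Connect worker through its REST API (POST /connectors) roughly as sketched below. This is only an illustration, not the exact procedure used for this report: the connector name is hypothetical (the ticket does not give one), and the boolean settings are written as strings on the assumption that Connect treats all config values as strings.

```python
import json
import urllib.request

# Hypothetical connector name; the ticket does not specify one.
CONNECTOR_NAME = "mongo-source-stockdata"


def build_payload(name=CONNECTOR_NAME):
    """Wrap the step 2 settings in the {"name", "config"} body used by POST /connectors."""
    config = {
        "tasks.max": "1",
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.storage.StringConverter",
        # Booleans are sent as strings (an assumption, see the note above).
        "publish.full.document.only": "false",
        "copy.existing": "true",
        "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
        "topic.prefix": "stockdata",
        "database": "Stocks",
        "collection": "StockData",
    }
    return {"name": name, "config": config}


def register_connector(connect_url, payload):
    """POST the payload to a Kafka Connect worker and return the HTTP status code."""
    request = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as register_connector("http://localhost:8083", build_payload()) would create the connector, assuming the worker's REST interface listens on port 8083 of the local machine; that address is an assumption and not part of the original report.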



 Comments   
Comment by Backlog - Core Eng Program Management Team [ 22/Dec/20 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Ross Lawley [ 07/Dec/20 ]

Hi robert.walters,

I've not been able to reproduce this issue locally. My test environment is: https://github.com/rozza/mongo-kafka-docker/tree/k176

In one terminal tab, run ./run.sh to set up the Kafka and MongoDB environment.
In another terminal tab, run ./k176.sh to insert 1000 documents into the test.testCopyExisting namespace, add the connector, and then output all messages on the mongo.test.testCopyExisting topic. There you can see that all 1000 messages are added to the topic.

Can you confirm that the test Docker scenario works for you? If it does not, I'll need the MongoDB and Kafka logs to see if I can replicate the issue.

Ross

Generated at Thu Feb 08 09:05:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.